Understanding Crashes and Crash Logs

Session 414 WWDC 2018

Sudden app crashes are a source of bad user experience and app review rejections. Learn how crash logs can be analyzed, what information they contain and how to diagnose the causes of crashes, including hard-to-reproduce memory corruptions and multithreading issues.

[ Music ]

Oh yeah, good morning, thank you for coming.

I'm going to be joined by some very bright people in a minute.

We have some very cool stuff prepared for you today.

So let me start first by saying if you never write code that crashes this session is not for you.

This is a talk for the rest of us, for those of us who make mistakes.

Today we're going to talk about the tools and techniques that you can use to stay on top of your crashes if they're affecting your users.

In particular I'm going to cover the fundamentals of crashes, why do they happen, what do they look like.

Then I'm going to show you some of the tools that we have to access crash logs when they're occurring in the wild.

Then Greg is going to come up and give you some more detail on how to read the content of a crash log.

And then he's going to go in depth in how to read a tricky memory issue with a crash log.

And then Kuba will show you how to catch threading races early, these threading races lead to crashes, and these crashes are very hard to reproduce.

So first we should define it, what is a crash?

A crash is a sudden termination of your app when it attempts to do something that is not allowed.

So what's not allowed?

Well, sometimes it's impossible for the CPU to execute code, the CPU won't divide by zero.

Or sometimes it's the operating system that's enforcing some policy.

The operating system will preserve the user experience by killing your app if it's taking too long to launch or it's using too much memory.

Sometimes the programming language you're using is trying to prevent a failure and will trigger a crash.

A Swift array and NSArray will halt your process if you attempt to go outside of your array bounds.

Or sometimes it's you, the developer that's trying to prevent a failure.

You may have an API where you assert is a parameter is non nil and that's perfectly all right.

So stop me if you've seen this before, this is what it looks like in Xcode in the debugger when the debugger is attached to your app and has paused the process just before your app is terminated.

And let's take a closer look at this backtrace on the left.

You can see here how the app was started by the operating system and while we're paused we can see how the main function was called and functions are calling other functions within your app and eventually we get to this point where we hit a point where in your code where the only option is to crash, something has gone wrong.

And ultimately the debugger receives a signal that this app is about to crash and pauses the app.

Now sometimes you're not always conveniently attached with the debugger like we are here.

When you're not attached with the debugger the operating system will capture this backtrace in plaintext and save it out to disk in a human readable crash log.

Now the truth is one a release build of your app crashes the log doesn't actually look this pretty.

What's actually written out is a list of binary names and addresses.

This is a snippet from a unsymbolicated crash log.

Now thankfully Xcode takes care of symbolicating crash logs so that what you'll see are those pretty function names, file names and line numbers that you're familiar with.

So there are a number of ways to access these crash logs.

I'd like first to talk about how you can access those crash logs from your beta testers on TestFlight and your customers on the App Store.

You can download these crash logs using a feature in Xcode called the Crashes Organizer, this is what it looks like a beautiful dark mode.

And let's take a tour through this UI.

On the left, you can see all of your apps distributed over TestFlight and the App Store and we support all of our platforms including watchOS and app extensions.

On the right, for a given crash point you can see the number of unique devices affected and we group crash logs by similar issue, by similar crash point and we rank them in the source list by the number of unique devices affected.

And you can page through a sampling of the individual logs down below.

And when you click on this button you can open the crash log in your project, in the debug navigator and see it alongside your source code.

This is very cool if you haven't seen it before, we'll see that in a bit.

And in the detailed view of course we show you a fully symbolicated backtrace with the crash point highlighted.

So now that we have the lay of the land let's play with it.

So I have Xcode open here and I'm going to open up the Organizer window and you can see that I've selected the Crashes tab, that's the second tab here and I've selected this chocolate chip app that Kuba and I are working on.

And I've uploaded this build to TestFlight and you can see I'm looking at build 5 right now and a number of testers are reporting crashes, so that's not too good.

But I've worked on a couple of these crashes you can see, but this first one I haven't worked on yet so let's try to resolve that.

So this is affecting 242 devices and I can see the backtrace that was captured from the time that the app crashed and the crash point highlighted.

Now I'm not quite sure what's going on yet but I bet if I open this crash log in our source code I can reason about what's going on.

So I'm going to click the Open in Project button, select the project that matches build 5 of my app and what I do, what you can see is that this crash log has opened up in the debug navigator as if this app has just crashed.

And you can see here that we're stopped on this fatal error.

So a good question to ask myself is, is this a valid use of fatal error, I only want to be crashing if it's absolutely necessary.

So I can see that this is an initializer of an enum from an int and that enum can only be zero or one, if it's not I'll hit this fatal error.

So I think that this makes sense, this should only be crashing if there's some kind of misuse by the programmer.

So I can see the caller of this initializer if I navigate up the call stack here and I can see it's this tableView delegate method.

This method is asking for a title for a header in a given section number.

So it must be that that section number is not zero or one.

So I think I have a better understanding of what's going on, but let's try to reproduce this issue in the app and see if we can learn more.

So I've hit play.

And Chocolate Chip is a recipe app I store all of my favorite recipes and I've been testing with this whipped cream recipe and you can see everything's fine, I can see my list of ingredients, my list of steps.

These are the two recipe sections, that's ingredients that's section zero and steps that's section one.

And if I click on another recipe we've hit a crash and what I can see is that we're stopped on the same fatal error and the backtrace looks very similar to the crash log that we've been looking at.

So that's a very good sign that we're looking at the same issue.

So I'm going to clear this crash log by hitting delete and let's look at this debug session.

So in this fatal error I can see that the message is nicely printing out that the section number is eight.

So that's the reason we're crashing it's not zero or one.

And what's occurring to me now is that this is my fault, when I implemented this class I implemented another delegate method called numberOfSections.

And numberOfSections is saying how many headers it should look for and what I'm returning here is the number of ingredients and the ingredients count is eight.

But believe it or not I have a clever way to solve this.

I know what I want to be returning is the number of cases in this recipe section enum and I know that in Swift 4.2 some new functionality was added by the Swift Open Source Community, thank you very much, a protocol called CaseIterable.

If I have my recipe section conformed to CaseIterable I can reimplement this number of sections to return the count of all the cases in that recipe section enum.

And that way what I'll be returning is two, I'll be returning the number of accurate sections.

This is going to work out just great.

So now if I check out this chocolate chip cookies recipe no crash, I see all of my ingredients and my steps.

I did a very good job, I'm very pleased with myself.

And I can go back to the Organizer and mark this issue as resolved, step away from the computer and get back to my baking.

[ Applause ]

Okay so you've just see how you can use the Crashes Organizer to download crash logs from TestFlight, open the log in your source code, and resolve an issue.

So what do you need to do to get started?

Very simple, your customers if they opt into sharing with third-party developers this just works, their crash logs are uploaded automatically.

All you need to do is sign into Xcode with your Apple ID.

When you upload your app you should include symbols so that you get server-side symbolication of your crash logs.

And open up the Organizer window to the Crashes tab to start viewing those crashes.

Okay so we have covered viewing crashes in the Organizer.

But if you're not distributing over TestFlight or the App Store there are a couple of other options for you.

There is the devices window.

When you have a device connected you can click this View Logs button and will show you all the logs that are saved on that device and these logs are symbolicated using the local symbol information on your Mac.

When you run your [inaudible] tests with Xcode, Xcode Server or Xcode Build the test results bundle will include any of the crash logs that come from your app that are written out during the execution of that test run and that's very convenient and these crash logs are also symbolicated.

You can use the Mac Console app to view any crash logs from your Mac or from the Simulator.

And on the device under Settings, Privacy, Analytics, Analytics Data you can see all of the logs that are saved to disk and your users can share a log directly from this screen.

Okay so to ensure that symbolication just works I have three important best practices for you.

Number one, if you use the Crashes Organizer upload your symbols with your app.

This is the default, this will ensure that server-side symbolication works, it's very easy.

Number two, be sure to save your app archives.

Your archive contains a copy of your debug symbols, your .

Xcode uses Spotlight to find these dSYMs and to perform local symbolication when it's necessary automatically.

And if you upload an app that contains bitcode you should use the Archives Organizer Download Debug Symbols button to download any dSYMs that come from a store-side bitcode compilation.

Okay we've covered all the tools that we offer for accessing crash logs when they occur in the field.

Now to give you an in-depth guide into reading the content of a crash log please give a warm welcome to the helping, friendly Greg Parker.

[ Applause ]

Thank you, Chris.

So we've just seen how you can use Xcode to find crashes and to examine them in the Xcode tools in the debugger.

But the crash log file contains more information, it contains much more information than just the stack trace.

It is frequently useful to look at that extra information in order to debug your problem.

So how can you get the full text of a crash log?

Here's our Xcode Organizer, if we bring up the contextual menu there's a Show In Finder button.

The Show In Finder button will bring up a text file which we can open up in the Console app or your favorite text editor, it looks something like this.

So what's in this file, let's take a look.

The top of the file starts with some summary information, this contains your app name, the version number, the operating system version it was running on, and the date and time of the crash.

Below that we have the reason for the crash.

This was the specific error, the specific signal that the operating system sent to kill the process.

We can also see some logging information, the application-specific information section.

This section will include console logs in some cases, if you have an unhandled exception it may include the exception backtrace.

This section is not always available, on iOS devices it is often hidden for personal privacy reasons.

But on Simulator to macOS this section can include information that is useful.

Below that we have the thread stacks, these were the backtraces of all the threads that were running at the time of the crash.

One of them is marked as the crash thread and we also have whatever other threads were running at the time that the process died.

Below that we have some low-level information, we have the register state of the thread that crashed and we have the binary images that were loaded into the process.

This is the application executable and all the other libraries.

And Xcode uses this for symbolication in order to look up the symbols, the files and line number information for the stack traces.

So that's the content of a crash log file.

So how do we debug this, how do we read this, what do we look at?

We start with the crash reason, the exception type.

In this case, the exception type is an EXC bad instruction exception, the SIGILL signal was the illegal instruction signal.

That means the CPU was trying to execute an instruction that does not exist or is invalid for some reason and that's why the process died.

We can also look at the thread that crashed, what code was running at the time of the crash.

Here we see the fatal error message function in the Swift runtime.

No guesses as to what the fatal error message function does.

The error message in this case is included in the application-specific information, so we can see what the Swift runtime printed as the process exited.

So let's take a closer look at that stack trace.

We saw the fatal error message function and that was called by a function in our code.

We have a recipe class, an image function being called and that function in turn called fatal error message as a result of some error.

Because this is a symbolicated stack trace with full debug information we have a file and line number in our code of what the crash was.

So we can take a look at that line, we can open up our project, this is RecipeImage.swift, line 26 was the one that was marked in the crash.

And those of you who are experienced Swift programmers will have a pretty good guess as to why this line might crash.

We have a force unwrap operator.

We have a function, The UIImage Constructor which returns an optional value.

If the optional value is nil the force unwrap operator will halt the process, generate a crash log and exit.

If we remember the application-specific information it included the error message that the Swift runtime prints as this error check fails and it says, unexpectedly found nil while unwrapping an optional value.

So that's good, that's consistent with the code.

We have a force unwrap operator on line 26, we have an error message in the crash log that says we were unwrapping an optional value.

This all makes sense as a consistent story for what caused this crash.

So a force unwrap failure is an example of a precondition or an assertion in the code.

Preconditions and assertions are error checks that deliberately stop the process when an error occurs.

Some examples of this are of course the force unwrap of an optional that we just saw.

The Swift runtime will assert that the optional is not nil and crash if it is.

We have out-of-bounds Swift.Array access is another example.

If you access an array and your index is out-of-bounds the Swift runtime will fail, interim check it will fail a precondition and halt the process.

Swift arithmetic overflow also contains assertions, if you're adding two numbers together and the result is too large for an integer variable there is a precondition for that, the process will halt.

Uncaught exceptions are frequently caused by preconditions in the code.

There are many error checks where if the precondition fails it will throw an exception and if the exception is not caught the uncaught exception will cause a crash log.

And of course you can write assertions and preconditions in your own code if you have errors that you want to crash the process and generate a crash log in response.

Another example of a crash log is a case where the operating system kills your process from the outside.

An example of this are watchdog events, such as timeouts.

If your application takes too long to do something the operating system may detect that, kill the process of generate a particular crash log as it does so.

Environmental conditions can also cause the operating system to halt the process.

If the device is overheating the operating system will kill processes that are using too much CPU.

If the device is running out of memory the operating system will kill processes that are using lots of memory.

Another case is an invalid code signature.

The operating system enforces that code be signed and if a signature is invalid or the code is unsigned the operating system will kill the process and generate a particular type of crash log.

These terminations by the operating system can be found in the Devices window in Xcode, they can be found on the macOS console.

They do not always appear in the Xcode Organizer, so be careful of that.

In Apple's developer documentation we have a technote that describes many different signatures and structures of crash logs like these particular examples, what they look like, how you can recognize them, and it goes into much more detail than we have time for here.

But let's look at one example, here's another crash log file and again in order to understand the crash log we start with the crash reason.

In this case the crash reason is in EXC crash exception with the SIGKILL signal.

The SIGKILL signal is commonly used when the operating system wants to stop your process.

It sends the SIGKILL signal, the SIGKILL signal cannot be handled, it cannot be caught by your process, it will die in response to this signal.

We can also see the reason the operating system sent the signal in the crash log.

In this case we have a termination reason with a code ate bad food.

If you look in the developer technote I mentioned earlier, it will describe what ate bad food means.

And we have a textual description that says exhausted real clock time allowance of 19.95 seconds.

So if we combine this information with the information in the technote it will tell us our application took too long to launch.

We had 20 seconds to launch, we weren't done in that much time, the operating system killed the process.

Below we can see the crash logs at the time the process died.

It's possible that those crash logs were the code that took too long, maybe were stuck in an infinite loop, maybe were stuck waiting for network I/O and that's why we took too long to launch.

Or on the other hand, maybe this code is innocent and there's something earlier in the launch process that was too slow and that's why the process died.

So launch timeouts, how can you avoid them.

We want you to avoid them, launch timeouts are a common reason for app rejection during Apple's app review.

So how do you avoid this?

Well test your app of course.

But there's a catch, the launch timeout watchdog is disabled in the Simulator and it's disabled in the debugger so if you're doing all your testing in the Simulator and the debugger you'll never see watchdog timeouts.

So when you test your app be sure to test without the debugger.

If you're a macOS app launch your app in the Finder.

If you're an iOS app run in TestFlight or launch your app using the iOS App Launcher.

All of these will run your app outside the debugger and the launch timeouts will be enabled and enforced.

When you test, test on a real device of course, test outside of Simulators.

And test your devices with older hardware, whatever's the oldest hardware you want your app to support.

If you test only on newer hardware you might find that your app launches fast enough on a faster device, but slower devices might take too much time.

So let's talk about another class of errors, let's talk about memory errors and what they look like in crash logs.

When I say memory error I mean cases like reference counting of an object being over-released or using an object after it has been freed or a buffer overflow where you have a byte array or another C array and you access that array out-of-bounds.

So let's look at another crash log and spoiler alert this one is a memory error.

We start again with the exception type.

This is an EXC bad access exception, the SEG violation signal.

This is typically caused by a memory error.

The bad access exception means one of two things, either we were writing to memory that is read-only or we were reading from memory that does not exist at all.

Either of those will cause a bad access exception and the process will stop.

We see here the address that we were accessing at the time of the crash.

We can look at the stack trace, this is the function that performed the bad access of course.

This is the objc release function, it's part of the implementation of reference counting in Objective-C and some Swift objects.

So that again sounds like it's likely to have been a memory error that caused the bug.

So what code caused objc release?

We can look at the rest of the stack trace.

We have the object dispose function, this is a function in the Objective-C runtime that is used to deallocate objects.

The object dispose function called a function called ivar destroyer on one of our classes, our LoginViewController class.

The ivar destroyer function is part of Swift code, this is a function that cleans up the properties, that cleans up the ivar storage of an object as it is deallocated.

So that gives us part of the story as to what caused this crash.

We were deallocating an object of our login view controller class.

That class in its d init code was trying to clean up its properties and its ivars and while releasing one of those properties we crashed.

So that gives us a little bit of detail as to what went wrong.

Can we do better, is there more information in the crash log that will tell us more about what happened?

We can look at the invalid address itself, sometimes the actual bad address value will contain useful information in it.

And this particular bad address I can tell you looks like a use after free.

Well how do I know that?

Partly it's just long experience, when you read enough crash logs you start to learn patterns of what the bad values are.

This particular bad value looks very much like the address range for the malloc memory allocator which we happen to have available in this crash log.

So we have the address range that was used by the memory allocator and our invalid address looks like it's inside the malloc range, but it's been shifted by 4 bits, it's been rotated by 4 bits.

So it looks like it was a valid malloc address rotated.

That is a clue from the memory allocator itself.

Let me show you why that is.

Here's what our object looked like when it was still valid.

An object starts with an isa field, the isa field points to the object's class.

This is how Objective-C objects are laid out, this is how some swift objects are laid out.

So what does the objc release function do?

It reads the isa field and then dreferences the isa field so it can get to the class object and perform method lookups.

Ordinarily of course this works, this is how it's supposed to work.

What happens if our object has already been freed.

When the free function deletes an object it inserts it into a free list of other dead objects.

And it writes a free list pointer to the next object in the list where the isa field used to be.

With one slight twist, it does not write a pointer into that field it writes a rotated pointer into that field.

It wants to make sure that the value written there is not a valid memory address precisely so that bad use of the object will crash.

So when objc release goes to read the isa field it instead gets a rotated free list pointer.

When it dreferences the rotated free list pointer it crashes.

The memory allocator did that for us, it deliberately rotated that pointer to make sure we would crash if we tried to use it again.

So that is the signature we see in this crash log.

We had the invalid address field looks like a pointer in the malloc region but rotated the same way that malloc rotates its free list pointers.

So that's a strong sign that whatever object we are trying to release at this point in the code has already been deallocated, that's the memory error that occurred.

So that's additional detail of what our story is.

We have our object being deallocated, we're cleaning up its ivars, one of those ivars was already a freed object and that's what caused our crash.

Can we do better?

Can we figure out which object was being released by objc release?

Ordinarily, the function calling objc release would give us a clue as to what that was.

But the problem with the ivar destroyer function is it is a compiler generated function.

We didn't write a function called ivar destroyer, which means there's no filename or line number associated with this point in the crash, we don't know which of our properties was being deallocated at that point.

Here's our class, we have three properties in this class, we have a username, a database, and an array of views.

And at this point in our story we don't know which of these objects was the one being released, it could've been any of them.

Can we do better, can we work out which of those objects was the one being released from the information in the crash log because of course if we can't reproduce it in the debugger the crash log is all we have to go on?

In this case we can do better.

We can see a +42 where the file and line number would've been.

And the +42 is our clue because the +42 is an offset in the assembly code of the function.

We can disassemble the ivar destroyer function, look at the code and work out which property was being accessed at offset 42.

So how does that work?

We go into the debugger, into the debugger console.

We can run lldb at a terminal.

We can run lldb in the Xcode debug terminal.

The debugger has commands to import a crash log a little bit as if it crashed inside the debugger.

So we run this command to load the crash log interpretation commands and then we run another command to import our crash into the debugger.

So we want three things to make this work.

We want a copy of our crash log on the Mac, we also want a copy of our app, and a copy of the dSYM file that went with the app, all of them matching this crash log, all the same version of our app.

So this is why we want you to keep your app archives.

If we have all these files on the Mac, we run the crash log command, lldb uses Spotlight to find the matching executable, find the matching symbols, and load it into the debugger.

So we see the stack trace of our crash thread, we see the file and line number information where available, and now we can go to work.

Now we can find the address of the ivar destroyer function and disassemble it.

This shows us the assembly code of our function.

Now I don't have time to teach you how to read assembly code, but luckily for crash logs you don't actually have to be completely fluent in reading assembly code.

It often is sufficient to be able to skim the assembly code and get a general idea of what is going on, you don't have to understand every single instruction to be able to get useful information out of a crash log.

If we read through this function and we know about the call instruction and the jump instruction which are how you call functions, we can separate this code into three blocks.

We have this top section which is making a function call into a reference count releasing function and this one is releasing our username property.

The next region is releasing the database property.

And the next region is releasing the views property.

So we don't understand what all these instructions mean, but we have a general sense of what each region in the code is doing.

It's a little bit like having a line number associated with the code.

So now we go back to the information in our crash log.

We had the ivar destroyer function +42 calling objc release.

So there is instruction at +42, but there is one more catch.

The catch is in a stack trace the assembly level offset of most stack frames is the return address, it is the instruction after the function call.

So the instruction of the called objc release was the previous instruction, it's this instruction.

If we read this it's the call of objc release which is good, that's consistent with what we saw in the stack trace in the crash log which was a call to objc release at this offset.

And this release function is releasing the database property.

So now we have more detail about what our crash is doing.

We released the username property and that was successful.

We have not yet gotten to the views property, it might be valid, it might be invalid we don't know.

What we do know is we tried to release the database property and that object looked like it was already freed object based on the malloc free list pointer signature.

So that gives us a pretty good story of what caused this crash.

We were freeing a login view controller object and the database property was invalid.

So we haven't actually found a bug yet, none of this code is incorrect, the ivar destroyer function is not wrong, something else was buggy in our code.

But from the crash log we have been able to narrow down where our testing should go, where our attempts to reproduce the bug should go.

We should be exercising this class, we should be exercising the database field, we should be reading the code that uses that database object and try and find the bug.

So what have we just done?

We read a crash log from scratch.

We started with the crash reason, we read the exception type, we understand what the exception type means.

We examined the stack trace of the thread that crashed, understand what it was doing, and what the actual error that failed was.

And we looked for clues elsewhere in the crash log, in this case we used the bad address of the memory error and we used the disassembly of the crashing function.

So memory errors are a wide variety of crashes, there are many different signatures in crash logs that can be caused by memory errors.

Here are some examples for you.

Crashes in the Objective-C message send function or in the reference counting machinery or deallocation machinery in Swift and Objective-C are very commonly caused by memory errors.

Another common memory error symptom is an unrecognized selector exception.

These are often caused when you have an object of some type, a code that is using that object, and then the object gets deallocated and used again.

But instead of getting the malloc free list signature that we saw in that previous crash log, instead a new object is allocated at the same address where the old object used to be.

So when the code tries to use the old object, call a function on the old object, we have a different object of a different type at that same address and it doesn't recognize that function at all and we get an unrecognized selector exception.

One other common symptom of memory errors is an abort inside the memory allocator itself, inside the malloc and free functions.

This is an example of a precondition like we saw earlier, this is a precondition inside the memory allocator.

It may be identifying cases where the heap data structure of the malloc memory itself has been corrupted by a memory error and it halts the process and response.

Or it may be detecting incorrect use of the malloc APIs.

For example, if you free an object twice in a row the malloc allocator can sometimes recognize that as a double free and immediately halt the process.

So let me give you some final tips for analyzing crash logs in general and analyzing memory errors in particular.

In the crash we just looked at we spent most of our time looking at the code that crashed, the specific lines of code that crashed, and the thread that crashed.

It is important to look at the other code in your process related to the code that crashed.

For example, in this crash the ivar destroyer function is not wrong, that's not where the bug is.

The bug is somewhere else, some other code is incorrect.

You should also look at the stack traces in the crash log other than the crashed thread.

The crash log contains all the stack traces in the process and that can contain useful information and clues for helping you puzzle out what the process was doing.

Perhaps the other threads show more details about where in the application it was running, maybe it was performing network code and that's visible on one of the other stack traces.

Or perhaps there's a multithreading error and the other threads may provide clues as to what the thread race was.

You should also look at more than one crash log for a particular crash cause.

The Xcode Organizer helpfully groups your crashes based on where in the code they crashed.

And sometimes there will be multiple crashes at a same crash point, but some of the logs will contain more information than others.

For example, the malloc free list signature that we just saw that might appear in some crash logs but not in others.

So it is useful to browse through multiple crashes in the same crash set to see if some of them have information that is more useful than others.

In addition, the Organizer as a group crashes together will sometimes put crashes with different causes together in the same group.

There may be other threads or the backtrace of a crashed thread that identify to you, to a human eye you will recognize that there are multiple causes of this group of crashes even though the Xcode Organizer put them all together.

If you only looked at one crash log you might not even know that that second crash is occurring until you fix the first one and ship it and start getting crash logs from your users.

Once you've done some analysis of your crash, once you've narrowed down where in your process it might be occurring or which objects it might've been using you can use tools like the Address Sanitizer and the Zombies instrument to try and reproduce the crash.

Because even though we did a good job with the malloc free list crash log of narrowing down what happened it's a lot easier to debug crashes when they occur in a debugger, in a test with Sanitization error messages telling you what went on.

So a moment ago I mentioned looking up multiple stack traces, multiple thread stacks in order to diagnose multithreading errors.

And to talk more about debugging multithreading errors let me introduce Kuba.

[ Applause ]

Thank you.

As Greg mentioned, some memory corruptions can be caused by multithreading issues.

Multithreading bugs are often one of the hardest types of bugs to diagnose and reproduce.

They get especially hard to reproduce because they only cause crashes once in a while, so your code might seem to work fine in 99% of cases.

And these bugs can go unnoticed for a very long time.

Often monitoring bugs cause memory corruptions and what you'll see in crash logs looks just like memory corruptions as well and we've seen examples of those in the previous section.

So when you're dealing with crashes inside the malloc or free or retained counting operations those are typical symptoms of memory corruptions.

There's specific symptoms of multithreading bugs as well.

The crashing thread often contains, sorry the crash log often contains multiple threads that are executing related parts of your code.

So if a particular class or a method shows up in a crash log in multiple threads that's an indication of a potential multithreading bug.

The memory corruptions that are caused by multithreading issues are often very random.

So you might see crashes happening at slightly different lines of code or slightly different addresses.

And as Greg mentioned, you can see those show up as different crash points in Xcode even though it's the same bug.

And also the crashing thread might not actually be the culprit of the bug, so it's important to look at the stack traces of the other threads in a crash log.

So now let's take a closer look at an example of a multithreading bug and I will show you how we can diagnose such a bug using a tool called Thread Sanitizer which is part of Xcode.

So let's take another look at our cookies recipe app that Chris and I made and let's look at some more crash logs that we received from our users.

Let's focus on this second app crasher of our app, this one.

This crash log indicates that there's something wrong going on when we're doing with a class called LazyImageView, which is a class that I wrote and we'll look at it in just a second.

But let's try to understand more from the crash logs first.

Let's look at the entire stack of this thread which I can do by clicking this button right here that also shows all other threads.

And if we focus at the top most frames here we'll see that what's really going on is that the free function is calling abort which indicates a heap corruption, a type of memory error.

If we look at the stack traces of some other threads, like thread 5 here, we'll see that it's executing also some code inside LazyImageView.

Let's look at another crash in this group of crashes and we'll see that this a common theme in all these crash logs.

One thread is reporting a heap corruption when the free function is calling abort and a secondary thread is processing in a very related part of the code, actually in the same class, inside LazyImageView again.

That's most likely not a coincidence, I highly suspect that this is a multithreading issue.

So let's actually take a look at our LazyImageView class.

So I'll click this button right here to jump straight to this line of code and open it in our project.

And you can see the source of LazyImageView right here.

It's a subclass of UIImageView, but it has an extra feature because it loads images lazily and asynchronously.

And we can the logic for this right here in the initializer.

What we do is we dispatch a job to a background queue where we will create the image on a background thread and once that is done we will dispatch back to the main queue to actually display the image on the screen.

The crash log points us to this line of code here where we are accessing an image cache which we are using to make sure that we don't unnecessarily create the same image multiple times.

So maybe there is some bug in the way my cache is implemented.

Let's try to avoid guessing, instead I'll run the app in the Simulator and we will try to reproduce this crash.

And let me close the crash log session first.

All right, so this is our cookies recipe again and you'll notice that if I try to add a new recipe by clicking this + button right here we get asked to select an image for our new recipe.

So this controller which is on the screen right now displays all these images using LazyImageView.

So just showing it up on the screen and scrolling through the content already exercises all of the code inside LazyImageView, but I don't see any crashes.

Unfortunately, it's a very common problem with multithreading bugs, they are notoriously hard to reproduce.

So even when you repeatedly test the code that has such a bug you might not actually see a crush.

Let's try actually doing that, let's try closing and opening this controller a couple times and let's see if maybe eventually we will be lucky and we will be able to trigger this crash.

All right here we go, the debugger has stopped because it has crashed, but even when you actually do catch this crash in the debugger it doesn't really help us.

All the debugger provides is that this is some sort of XC bad excess but it does not what caused the corruption or why did it actually happen.

Luckily Xcode has a tool that's perfect just for this situation and it's called Thread Sanitizer and that's what I'm going to use here.

So let's open up the scheme editor of our project which I can do by clicking the name of our app here and choosing Edit Scheme.

And if I then switch over to the tab called Diagnostics you'll see that we have several runtime diagnostic tools here, like Address Sanitizer which is great at finding buffer overflows.

Let me choose Thread Sanitizer and I'll also select Pause on Issues which means that the debugger will break every time that Sanitizer detects a bug.

Let's run the app in the Simulator with Thread Sanitizer enabled and let's see what happens now if I try to add a new recipe.

If I click the + button now you'll see that he app gets immediately stopped because Thread Sanitizer found the bug.

And notice that I didn't have to try multiple times, Thread Sanitizer reproduces multithreading issues extremely reliably.

So let's look at some details of this bug.

We see that it's a Swift access race.

And if we look at the left side into the debug navigator we get even some more details about this bug.

We see that there's two accesses performed by two different threads, here is thread 2 and thread 4, that are both trying to access the same memory location at the same time, which is not allowed.

So if we take a look at these two lines of code that are forming the race we see that they are both accessing the image cache again.

So since this is a data structure that is shared between multiple threads as we see here it needs to be a thread data structure.

So let's look at how it's implemented, let's jump to storage here which is what we're using and let's see if it actually is thread safe.

So here's the source of image cache, right here at the top of the file and we can immediately spot what's wrong.

This is just a plain Swift dictionary so that's not good.

Swift dictionaries are not thread safe by default.

So if we want to share a mutable Swift dictionary between multiple threads we have to protect it with synchronization, which means we have to make sure that only one thread accesses it at a time.

So now let's actually work on fixing this problem and making the class thread safe and I'll do that in two steps.

First, I'll refactor this code a little bit so that we have more control over storage and then in a second step I will use a dispatch queue to actually make this class thread safe.

So first of all what I don't like here is that storage is declared as a public variable.

That means that potentially any code inside my app can be accessing it.

And it will be really hard to make sure that all the code in my app does it correctly, so let's change it to private instead.

And let me introduce a different way of accessing image cache and I'll do that by introducing a subscript.

That means that users of image cache can use brackets to load and store data from the cache.

So subscripting needs a getter, like this one and a setter.

And for now let's just implement these by directly accessing the underlying storage.

And to make the rest of this file actually build I need to update the users, so instead of accessing the storage property we should be using brackets and index into image cache directly like this.

So if I hit Build Now you'll see that the code will compile fine now, but I didn't really fix any bugs yet.

But I did achieve something, I now have direct control over all the code that can access storage.

It's either going to be code in this getter or the setter, no other code in my app has access to it.

So I'm in a very good position to actually fix this Swift access race.

And let me do that by using a dispatch queue.

So let's create a new private variable called queue and let's assign a new dispatch queue into it.

Dispatch queues are serial by default, so this one is also serial, which means that it will only allow one piece of code to execute inside that queue at a time.

So that's perfect, that's exactly what we need here.

How do we actually execute code inside a dispatch queue?

We can use queue.sync and whatever code I move into queue.sync will be executed inside that serial queue and only one at a time.

And I can return a value here because I need to return something from the getter.

And I can fix the same thing in the setter as well.

And if I move this line of code into this queue.sync it will be executed as part of that dispatch queue.

So this way the code is now thread safe because every single line of code that accesses storage is always executed inside a serial dispatch queue, which means it will only be executed one at a time and is [inaudible] thread safe.

Now it might be tempting to only use the synchronization in the setter where we are modifying the storage and avoiding it in the getter like this.

But that's not correct, that can still cause memory corruptions and still cause crashes.

Let me actually prove that to you by running this version of the code in the Simulator and let's see if the Sanitizer finds this little more subtle bug now.

As I expected it does.

We really have to protect both the getter and the setter with the synchronization.

So let me run my app one last time in the Simulator and you will see it if I try to a new recipe this time.

The controller loads up just fine and we don't get any more warnings because the class is now correctly thread safe.

So I can actually go back to our Organizer window and mark this crash as resolved because we have found, identified, and fixed this bug.

[ Applause ]

All right so what we've just seen is that I've started with a set of crash logs that all have symptoms of a multithreading bug.

And then I've used this tool called Thread Sanitizer to identify and eventually fix this bug.

Thread Sanitizer not only detects multithreading issues, but it also makes them reproduce much more reliably.

Notice that in the demo I didn't have to invoke the controller multiple times.

This tool works on macOS and in the Simulator, but just like all other runtime diagnostic tools it only finds bugs in codes that you exercise by actually running it.

So you should keep that in mind and make sure that your automated or manual testing uses Thread Sanitizer, especially on any code that is using threading or GCD.

And if you'd like to know more I recommend that you watch a video from my WWDC 2016 session called Thread Sanitizer and Static Analysis where we have introduced this tool and we also talked about how it works under the hood.

As a reminder, Thread Sanitizer can be found in the Scheme Editor.

So if you go to Product, Scheme and Edit Scheme that will bring up the Scheme Editor.

And then we can switch over to the Diagnostics tab where you can find Thread Sanitizer among the other runtime diagnostic tools.

I have one more debugging tip that I would like to share with you that's useful when dealing with multithreading.

When you are creating dispatch queues you can provide a custom label in the initializer.

You can assign a custom name to an operation queue and if you're working with threads you can use custom names with threads as well.

These names and labels are displayed in the debugger but they also appear in some types of crash logs and that can help you often narrow down possible causes of multithreading bugs.

Okay so to stay on top of crashes I have three takeaway points for you.

Number one, always test your apps on real devices before submitting them to the App Store, that will help you avoid getting rejected in App Review.

Number two, when you do get crashes from your users you should always try to reproduce them.

Look at the crash logs and look at the stack traces and try to figure out which parts of your app do you need to exercise to trigger a crash or to try to reproduce the crash.

And finally, for crashes that are hard to reproduce I recommend using bug finding tools, such as Address Sanitizer or Thread Sanitizer which work for editing memory corruption bugs and multithreading problems.

So now let's recap what we've learned today.

Chris has showed us how we can and should use the Organizer window in Xcode to get statistics and also details about crash logs.

Greg showed us how to read and analyze the text of crash logs.

In many cases, they can be reproduced like if you are dealing with app lunch timeouts.

Then we've crash log crashes that are hard to reproduce because they happen randomly like memory corruptions.

And we have mentioned what signs they leave in crash logs.

And finally, I've shown how bug finding tools like the Sanitizers can help you reproduce issues that are memory corruptions and threading issues and I recommend that you use these tools as well.

For more information please visit our session's webpage where we will also include links to the technotes that we mentioned and also the other documents that provides debugging tips that are helpful when dealing with crashes.

And I'd like to remind you that there is a crash logs lab starting right after this session at 12 p.m. in Technology Lab 8, so please stop by if you have any questions about crashes and crash logs.

And enjoy the rest of WWDC, thank you very much.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US