Erik Neuenschwander: Hi everyone!
Welcome to Performance Optimization on iPhone OS.
My name is Erik Neuenschwander. I manage one of the software performance teams on iOS, and heading across the stage is Ben Weintraub, who is with me today to do a bunch of demos.
Ben's a Performance Engineer for us.
So, thank you all for coming.
We're really excited to be here and have a great hour ahead for you.
I hope you are here already because you think performance is important or maybe you wandered into the wrong talk in which case take a seat and we're going to spend the next hour trying to convince you that performance is important.
And one of those reasons is that performance is a key aspect of App Store reviews.
If you think about when an application is slow or just not fun to use that's going to drive down your reviews, that drives down sales and we want you to make money and that's why we're here today to tell you how to get superb performance in your app.
Luckily you already have the tools, in the form of Instruments and Xcode, to get good performance, and you already have the skills just with your software development background.
And so, what Ben and I are going to do today is focus on some common cases where performance can be an issue and give you some clear strategies to get good performance in those.
So we'll start by kind of the most important thing which is just talking about how to test and how to measure in performance scenarios then we'll spend a lot of time with three key scenarios, namely launches, scrolling, and keeping your memory footprint low.
Lastly, because we know that you have things to do other than performance in your development we're going to talk about how to prioritize performance issues.
So let me start with measuring performance.
Probably the most important thing I can say is when you're dealing with performance issues you need to measure first.
Measuring will give you an idea of where you can most efficiently put your time in to improve your app's performance.
And measuring doesn't have to be hard.
You can do it just through manual testing of your application.
Use it. Find scenarios you are unhappy with and start working on them.
But new in iOS 4 we also offer automated testing.
And automated testing can give you more repeatable results and let you execute your test cases more efficiently.
But whether you are collecting your data manually or automatically you have to measure numbers and it may seem kind of daunting because everything affects performance, the CPU, GPU, disk, network latency, it can seem overwhelming.
But you can take a step back and just recognize that trying to guess where your performance issues lie is overrated.
You really just need to focus on a scenario that you find to be bad and then measure each of those components in turn looking for a bottleneck.
When you think you've found it, you make a change and then of course you retest to see if you've improved the scenario.
In the end, all you're trying to do is get something which is going to feel right.
So to gather data you have a lot of options.
One of them is just logging.
NSLog is probably a method you are already familiar with.
You can just take a time stamp at the beginning of some activity and then at the end write out how long it took.
You can write that out to syslog, which you can view through the Organizer in Xcode, or to a file that you collect in some other way, whatever is going to work for you.
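As a rough illustration of the timestamp approach just described, here is a minimal sketch in Python rather than iOS code. The logger name `perf`, the helper `timed`, and the sleep that stands in for real work are all assumptions made for the example.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("perf")  # hypothetical logger name


@contextmanager
def timed(label, results=None):
    """Log how many wall-clock seconds the enclosed activity took."""
    start = time.monotonic()              # time stamp at the beginning
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        log.info("%s took %.3f s", label, elapsed)  # write it out at the end
        if results is not None:
            results.append((label, elapsed))


durations = []
with timed("load data set", durations):
    time.sleep(0.05)                      # stand-in for the activity being measured
```

Keeping the measurements in a list as well as in the log makes it easy to save them to a file for later comparison.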
But you also have more sophisticated tools like Instruments, which we'll spend a lot of time on today, and also the simulator, which I'm sure you're familiar with.
But the simulator is maybe not always the most appropriate choice when it comes to performance.
You've no doubt used it for prototyping your interfaces and features, but you should consider that the simulator uses the Mac hardware: the Mac CPU, GPU, disk, and fast Ethernet. That's going to give you an unrealistic idea of how your application is going to perform on the device.
Now one exception to that is when you're dealing with memory issues.
And there actually the simulator can be good.
It's great for finding memory leaks and looking at your footprint, and in fact the desktop offers in some cases more features, like zombie detection, compared to what you get on the device.
But what you need to remember is that the device is the final arbiter of performance.
Your customers will be running your application on a device and so that's where you need to be doing your testing.
In fact you should do all of your speed related testing on the device.
And when you fix memory issues that you find in the simulator make sure those are playing out as you expect on the device.
That's really what's important.
So, lastly, I want to say that you should measure early and measure often with the idea of trying to collect numbers when you have a scenario that you like.
When you get a good result you want to record that as a baseline and just have those numbers in your back pocket so that later on when you discover that some scenario has regressed you can go back and look at those numbers and get an idea of where the problem might be.
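One way to keep those baseline numbers useful is to compare new measurements against them automatically. The sketch below is only an illustration in Python: the function `find_regressions`, the 10 percent tolerance, and all of the timings are hypothetical.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return scenarios whose current time exceeds the baseline by more than `tolerance`."""
    regressions = {}
    for scenario, base_s in baseline.items():
        now_s = current.get(scenario)
        if now_s is not None and now_s > base_s * (1.0 + tolerance):
            regressions[scenario] = (base_s, now_s)
    return regressions


# Hypothetical numbers, in seconds.
baseline = {"launch": 1.2, "open composition": 0.4}
current = {"launch": 2.9, "open composition": 0.41}
slower = find_regressions(baseline, current)  # only "launch" is flagged here
```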
I talked a minute ago about logging and logging is great but of course doing anything has some performance hit and logging is something that doesn't benefit your customers at all.
So you want to turn off or otherwise remove the logging from the apps that you submit to put up on the App Store.
I talked about testing on the device.
It's the final arbiter so you should really test on every device you are going to support.
And that kind of sounds like I'm telling you to buy one of everything we make which would be great but for a lot of you that's probably unrealistic.
And so at a minimum you want to test on the oldest device that you are planning to support.
That's likely to be the slowest and today for most of you that would be the iPhone 3G.
So now let's hit those key scenarios starting with speedy launches.
Launch is a very important performance scenario.
If you think about it, first of all, when a user buys your application, the icon appears on the screen.
They tap that icon to launch.
This is the hello.
This is the out of the box experience for your application.
So you want the user to have a good experience from the get go.
Or if you think about when somebody who has your app says, "Hey, hey look at this", and they reach out, they show their phone to their friend, they're going to launch your application.
And again that's the first thing a potential customer is going to see.
So even aside from that, launch is a very common scenario.
For non-multitasking devices every time a user switches away to do something else it's going to quit your application.
So, when they tap on that icon again it's going to be a launch scenario.
On devices which support multitasking, instead of launch it's more often a resume, a transition out of the background state, but you'll find that in your code there's a lot of shared work between launch and resume scenarios.
So everything Ben and I talk about today will still apply to resume as well.
Lastly, there is a stick.
If your application is too slow then the operating system will actually terminate it.
And that's to keep the system responsiveness up.
If you think about it we don't want a device where the user is reaching out tapping, nothing is happening, it seems hung.
And so if an application is behaving too slowly the OS will actually terminate it.
And we do that with the system service we call Watchdog.
So Watchdog is constantly looking at an application and measuring the "wall clock" time to reach certain checkpoints or gates.
And "wall clock" time if you are not familiar with the term is just seconds ticking by on the clock on the wall.
It's not CPU time or anything fancy.
It's literally just the time that your user is waiting for something to happen.
So these values that you see on screen, they're subject to change, right.
Ideally you want to keep these things as short as possible because all it is, is the user waiting.
But when we're talking about launch your application has up to 20 seconds to be able to return from applicationDidFinishLaunching.
It's quite a long time.
You really want to be far below that.
But that is the upper limit.
For Resume and also for Suspend there's less work to do so that time out goes down to 10 seconds and Quit is actually the shortest time out because well you should already be saving out your state on a regular basis anyway so there should be very little to do when you actually quit the application.
Also new in iOS 4, there's the task completion multitasking scenario, and for that, say uploading photos to a social networking site or something, you get 10 minutes.
And if you watched the multitasking talks you can also find out more about how to handle and avoid that time out, suspending your background operation gracefully.
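To make the idea of staying far below those limits concrete, the following Python sketch computes how much of a time out is still unused for a measured duration. It uses only the values stated in the talk, which the speaker notes are subject to change; the Quit value is not given, so it is left out. The function name and the 5-second measurement are hypothetical.

```python
# Limits as stated in the talk, in seconds (subject to change).
WATCHDOG_LIMITS_S = {
    "launch": 20.0,
    "resume": 10.0,
    "suspend": 10.0,
    "background task": 600.0,  # 10 minutes
}


def watchdog_headroom(scenario, measured_s):
    """Fraction of the limit still unused; negative means the limit was exceeded."""
    limit_s = WATCHDOG_LIMITS_S[scenario]
    return (limit_s - measured_s) / limit_s


headroom = watchdog_headroom("launch", 5.0)  # hypothetical 5-second launch
```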
So, when you want to collect the data to figure out how close you are to those numbers, figure out how to get it down, you want to make sure that you are testing with a realistic data set.
Your application may launch or resume really, really quick with no data.
So, instead you want to think about a user who's been using your application for 6 months and has a lot of bookmarks or photos or clips or whatever it is that's your application's data set and create a stable realistic data set that you can use for that.
To collect the data you will use the Time Profiler instrument.
Time Profiler works with iOS 4 devices and it collects back traces at regular intervals showing at that instant in time what your application is doing.
You can then look at them in aggregate and get a pretty complete picture of where the execution time of your app is.
And you are really looking for two things primarily.
First, if we're talking about launch, you want to look for work that you just don't need to do: work that you can take out of launch and defer until slightly later, or maybe even do on demand, waiting until the user does something that requests the work you're currently doing during launch.
So move that out of the launch path.
But the other work that you'll see is work that you look at that function and you say, I got to call that function as part of launch.
That's necessary work.
And then you want to sort by running time and look for the thing which is taking the most time.
Because that's where there's the most upside for your effort to make your app fast.
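The sampling idea can be sketched in a few lines. This is a simplified Python model of aggregating periodic back traces and sorting by running time, not how Time Profiler is actually implemented; the function names in the sample data and the 100 ms interval are hypothetical.

```python
from collections import Counter


def inclusive_times(samples, interval_ms):
    """samples: one call stack (outermost frame first) per sampling tick.

    Returns (function, estimated_ms) pairs sorted by time, heaviest first.
    """
    counts = Counter()
    for stack in samples:
        for function in set(stack):   # count a function once per sample, even if recursive
            counts[function] += 1
    return sorted(
        ((name, n * interval_ms) for name, n in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical launch trace sampled every 100 ms.
samples = (
    [["main", "viewDidLoad", "generateThumbnails"]] * 28
    + [["main", "viewDidLoad"]] * 2
    + [["main", "layoutSubviews"]] * 7
)
report = inclusive_times(samples, interval_ms=100)
```

Reading the report from the top shows where an optimization effort has the most upside.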
So, to show you using the Time Profiler instrument to get a demo application running faster we'll turn you over to Ben.
Ben Weintraub: Alright, thanks Erik.
So in order to show you guys what we're going to be doing with instruments, we've created a sample demo application here.
So let me just show you the app quickly first so you get a feel for what it does.
So, basically you select photos from your photo roll or from the camera.
And then for each of those photos you can create a composition.
So if I select one of these I get this nice Andy Warhol style composition and for each one of these tiles I can adjust the threshold here and then change these colors.
And that's about all the app does.
It's pretty simple.
OK, so now let's take a look at how it launches on the device.
I have an iPhone 3GS here.
I'm just going to go ahead and launch the app.
Again I'm using a realistic data set with a number of compositions in there to simulate what it would be like if the user had been using your app for a while.
So, you can see that that takes quite a while to launch even on an iPhone 3GS.
So, in order to figure out where all that time is going we're going to switch back to instruments here and we're going to use the Time Profiler instrument that Erik was talking about.
So, if you've used Shark or the CPU Sampler instrument in the past, Time Profiler is similar in concept, but it's lower overhead and it's now the preferred way of looking at CPU bound operations.
The first thing I'm going to do is select the app that I want to have Instruments launch for me, because we're looking at launch times.
I'm going to have Instruments launch the application on my behalf and then it'll start collecting data as soon as it launches.
Let's go ahead and watch.
Alright, so you can see now as Instruments is working there is data being populated into this Call Tree View down here and then it looks like we're finished launching now.
So I'm going to stop the trace.
OK so let me expand this Timeline view out a little bit so you can see what's going on a little better.
Alright, so the first thing we notice is in this Timeline view the purple bars are showing us an approximate amount of CPU utilization over time.
And if I go to the very end here, where we stop burning CPU, it's about 3.7 seconds after the launch.
So I can use the Call Tree View down here to try and figure out where that time is actually going.
So, the first thing I'm going to do: if you take a look at these check boxes over here, there's one checked right now that's called Invert Call Tree.
Can you see that a little better?
I actually want to see the methods in the order that they were called.
So, I'm going to uncheck that box, and now you see I have a more reasonable back trace here.
But most of these symbols are not things that I immediately recognize, these are system libraries that end up being called through to get to my code.
So the next thing I'm going to do is check this box that says hide system libraries.
OK, so what that will do is filter the Call Tree View down to only the stack frames that are from my application itself.
So it looks like I'm spending about 2.8 seconds of CPU time under my root view controller's viewDidLoad method and specifically almost all that time is under this generate composition thumbnails call.
So let's switch over to Xcode and take a look at what that code is doing.
Alright, so here's my viewDidLoad.
At the end I'm just calling generate composition thumbnails, and all that's doing is iterating over each of the compositions from my data set and generating a thumbnail for each one.
So, and this is necessary in order to show those thumbnails in the table view that we saw before.
So, even though I do want to eventually show those, this isn't something that I want to block the entire launch of the application on.
So, in order to have my app launch a little bit more quickly and be responsive immediately, I'm actually going to put this work onto a background thread using a technology that's available now in iOS 4, which is Grand Central Dispatch.
So, the first thing I want to do is get something up on the screen right away as soon as the user launches the application.
So in order to accomplish that I'm going to create a set of placeholder images here and those placeholder images are going to stand in for my thumbnails while I'm generating the thumbnails.
Alright, so that's great.
The next thing I want to do is actually get this call into a background thread.
So, I'm going to do that by wrapping it in a call to dispatch_async, which is an API from Grand Central Dispatch.
And I'm going to put it on a low priority background thread because I really don't want it to interfere with the responsiveness of my application.
So I'm just passing in a block here that calls generate composition thumbnails.
Now, in order to make this method work from a background thread there's one other thing I need to change.
So this thumbnails array is now accessed from two different threads and so I need to serialize those accesses in some way and I can do that using a lock or again I could use Grand Central Dispatch.
So that's actually what I'm going to do in this case.
So I'm just going to replace the body of this for loop.
Alright, so now you see that I'm still doing the heavy lifting here, which is generating the thumbnail, on the background thread, but when it comes time to do my updates to the UI, I call dispatch_get_main_queue in order to send the work inside of this block back over to the main thread.
So that should make sure that there's no synchronization issues or anything like that.
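The shape of that change can be modelled outside of iOS. The Python sketch below is an analogy only, not Grand Central Dispatch: a worker thread plays the role of the low priority background queue, and a `queue.Queue` drained by the calling thread plays the role of the main queue. All names here are hypothetical.

```python
import queue
import threading


def generate_thumbnail(composition):
    return f"thumb:{composition}"      # stand-in for the expensive rendering work


def launch(compositions):
    """Return placeholders right away and start thumbnail generation in the background."""
    thumbnails = ["placeholder"] * len(compositions)
    main_queue = queue.Queue()         # results to be applied on the "main thread"

    def worker():
        for index, composition in enumerate(compositions):
            main_queue.put((index, generate_thumbnail(composition)))
        main_queue.put(None)           # sentinel: no more work

    threading.Thread(target=worker, daemon=True).start()
    return thumbnails, main_queue


def drain(thumbnails, main_queue):
    """Run on the main thread: the only place the shared list is mutated."""
    while (item := main_queue.get()) is not None:
        index, thumbnail = item
        thumbnails[index] = thumbnail


thumbs, pending = launch(["a", "b", "c"])
```

Because only `drain` writes to the list, and it runs on one thread, accesses to the shared array are serialized without an explicit lock, which is the same design choice the demo makes by sending updates to the main queue.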
Alright, so now let's switch back over to the device and take a look at the effects of my changes.
So, I have a version of the app with these changes ready to go.
So let me just launch it.
Alright, so that was much faster.
So the final thing we should do though is make sure that we can actually quantify that change in Instruments.
So, I'm going to select the modified version of my app.
And again have Instruments launch it for me.
[ Pause ]
Alright, so you can see that we're still spinning the CPU for quite a bit of time here, and that's expected actually, because we still need to do that work of generating the thumbnails, but the difference is that now it's on a background thread, and we can see that easily in Instruments if we check this Separate by Thread check box here.
So, I'm going to go ahead and do that.
OK, so now if we look at our main thread and expand that out, we can see that viewDidLoad is only taking about 185 milliseconds of CPU time, and all of the real heavy work is happening on this background thread that we created using Grand Central Dispatch.
Alright, so that's an example of how you can use Time Profiler to help speed up the launch of your application.
So, back to you Erik.
[ Applause ]
Erik Neuenschwander: Thanks Ben.
So, to get your launches going fast you first have to remember that the system Watchdog is out there but you really want to be well below those levels I was talking about a minute ago.
What you can do is collect a trace using the Time Profiler instrument like you saw Ben do and in his case he did less work by deferring the work out of startup and that's one way that you'll commonly be able to solve that problem.
But sometimes there's work that you have to do, and that operation may be slow, and you want to make sure that, whichever way you perform it, you never block on those slow operations.
In particular, never do networking on your main thread.
If you think about it you don't control how quickly that server is going to respond so this is a prime candidate for moving it off to some non blocking thread.
But if you're doing a lot of work, you need to optimize those time consuming activities. I talked about a realistic data set, but you should think about what realistic is.
If your data set is going to keep growing endlessly over time, eventually it will get slow.
So think about the data set that you are looking at on launch and think of some way to make sure that that size always remains constrained.
And then like you saw Ben do, collect a new trace after you've made a change and quantify your results.
So that's speedy launches.
Let's talk about scrolling next.
And scrolling is another really important scenario.
I mean how many of you have used UITableView, right?
It's a very popular class.
And this is because you want to show large amounts of data on the iPhone, iPad, and iPod touch.
It's a great way to do it.
But because we have this direct manipulation UI when a user reaches out and wants to scroll through that you want that to seem like they're actually manipulating those cells.
And if it stutters that will break that kind of seamlessness that you want in your application and that's going to create a bad scenario for the user.
So the way that we measure if scrolling is going well or not is frames per second.
Which is abbreviated as FPS and we pronounce that fips.
And if you're looking for a good FPS number the magic number is 60.
60 FPS is completely smooth.
You can use the Core Animation Instrument to collect that FPS data.
It does measurement of FPS in real time.
And if you have any animation which goes on for longer than a second it's very, very easy.
You just look at the number that is presented to you and that's your FPS count.
The Core Animation Instrument can also work for subsecond animations but then you have to do a little bit of math.
If you think about a 0.3 second animation which only draws 18 frames well then Core Animation is going to report 18 FPS.
That's all that happened during that second.
But if you think to yourself, well, alright, 18 times 10 divided by 3, aha, that's 60 FPS.
So, you can kind of go through that but if you have an opportunity to make your animations longer at least for testing it will save you that math.
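The subsecond arithmetic can be written down as a tiny helper. This Python sketch assumes the whole animation falls inside a single one-second measurement window; the function name is hypothetical.

```python
def effective_fps(reported_fps, animation_seconds):
    """Scale a one-second frame count to the rate during a shorter animation."""
    if not 0 < animation_seconds <= 1:
        raise ValueError("expected an animation of at most one second")
    return reported_fps / animation_seconds


fps = effective_fps(18, 0.3)  # roughly 60, the same as 18 * 10 / 3
```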
In addition to doing FPS measurement Core Animation also has a set of check boxes that can show you visual cues about how rendering is happening.
And in particular one of those is Color Blended Layers.
And we'll show that today.
But there are many others, and there have been some great talks on Instruments specifically, and I'd refer you to those to learn more in depth about the Core Animation instrument.
But to show you collecting some FPS data with that same demo app, I'll send it back over to Ben.
Ben Weintraub: So, here's our application and so if we scroll through these we can see that the scrolling is not too great.
It's pretty chunky and it can certainly be improved upon.
Alright, so going back to Instruments now, we're going to use the Core Animation Instrument as Erik mentioned to sample the frames per second that we're getting out of this application.
So I'm going to just select that application as our target and actually with the Core Animation because we already have the app running we don't need to have Instruments launch it for us so we can use this attach to process feature and I'll just start recording.
Alright, and now all I'm going to do is just scroll down through that list that you saw previously.
Scroll to the bottom and scroll back up to the top.
Maybe do a little bit more so we get some more data here.
So let's stop the trace now.
Alright, so if you look over this frames per second column here you can see that the numbers we're getting for FPS are not that great, when we're scrolling here.
We're certainly nowhere near the 60 frames per second that Erik was talking about as what should be your goal.
So in order to diagnose why that's happening you have a number of different tools available to you but one of the most common problems with scrolling performance is if you're creating a lot of objects and then throwing those objects away immediately.
So, we can use the Allocations Instrument in order to diagnose a problem like this.
So, I'm going to start a new trace with the Allocations Instrument and again select my app and then I'm just going to start recording.
So, the allocations instrument is going to collect back traces of every allocation that happens inside of my application.
So that's anytime that you alloc/init a new object or when you call malloc directly or calloc or any of those other functions.
And then it's going to aggregate all that information together in this nice statistics view for us.
So, what I'm going to do now is scroll through a bunch of the table view cells that I've got here, just scroll to the bottom of my table view, and you'll notice if you try this out on your device that the performance of your application will be somewhat degraded while you do this, and that's OK.
It's actually to be expected because of the amount of data that the Allocations Instrument is collecting.
Like I said it's getting back traces from every allocation inside of your app so that's quite a bit of data.
Alright, so I'm going to just stop the trace now.
OK. So the first thing that I want to do is restrict the portion of the timeline that I'm looking at to only show the area where I was scrolling.
So, if you recall we waited until this big spike here sort of went away.
So, I'm going to move the cursor to about the point where I think we started scrolling and now I'm going to use the inspection range tool to have Instruments only look at this portion of the timeline that's highlighted in blue.
Alright, so now if I look in the statistics view down here each one of these lines represents one category or one type of object that I maybe created in my application and Instruments is giving me some statistics about each type of object that I created.
So in this case what I'm looking for are objects that I create and then throw away immediately.
So those we call transitory objects, because they have a short lifetime.
So, I'm going to sort by the number of transitory objects here.
And if I look up at the top I see a bunch of malloc allocations.
I see some pretty generic looking things: CALayer, CFBasicHash.
None of these really means a whole lot to me to begin with.
So what I'm actually going to do is use the search functionality in Instruments to search for a class from my application.
So because my application is called compositions I have a class in here called composition table view cell.
And if I take a look at this particular class I see that I have 93 transitory instances of that class.
So, what that probably means is that every time I need to bring a new table view cell onto the screen I'm creating a new one just for that purpose.
So if I actually get rid of this search and then look at the objects adjacent to that composition table view cell class I see that I have 93 of a bunch of different objects that look like they might be associated like UIButton, UIView, UITableViewLabel for instance.
So these things are probably being created along with my UITableView cells.
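The notion of a transitory object can be illustrated with a small model. The Python sketch below is not how the Allocations Instrument works internally; it only shows the bookkeeping idea of counting, per category, objects that were both created and destroyed inside the inspected range. The event format and function name are assumptions.

```python
from collections import Counter


def transitory_counts(events):
    """events: ordered ("alloc" | "free", object_id, category) tuples from one time range.

    Returns (transitory, still_living) counters keyed by category.
    """
    live = {}
    transitory = Counter()
    for kind, object_id, category in events:
        if kind == "alloc":
            live[object_id] = category
        elif kind == "free" and object_id in live:
            transitory[live.pop(object_id)] += 1   # created and destroyed in range
    return transitory, Counter(live.values())


# Hypothetical trace: two cells come and go, one label stays alive.
events = [
    ("alloc", 1, "CompositionTableViewCell"),
    ("alloc", 2, "UILabel"),
    ("free", 1, "CompositionTableViewCell"),
    ("alloc", 3, "CompositionTableViewCell"),
    ("free", 3, "CompositionTableViewCell"),
]
transitory, still_living = transitory_counts(events)
```

A high transitory count for a cell class, with matching counts for its subviews, is the pattern the demo spots.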
So, Erik's going to talk about an API that we have that will actually help you avoid this problem.
Erik Neuenschwander: So I'm going to show you actually visually what Ben will show you in code in just a minute which is making use of what we call cell reuse.
So, the way the application is behaving now and you see that with all the transitory cells getting created is that we create them, they scroll onto the screen, they scroll off the screen exactly as you'd expect and then they get deleted.
And then they have to get recreated, they come back on the screen and so we want to avoid doing all those allocations because that's contending with the scrolling and giving the poor FPS that you're seeing.
So if you use cell reuse, which is an API that Ben will show shortly, these cells still have to get created.
There's no avoiding that.
They scroll on, they scroll off but then they get recycled, because it's just more of the same that's going on to the device.
So after you've created that initial set of cells you actually have a steady state as they come on to and off of the screen.
So that's a little graphical showing of that.
I'm going to send you back to Ben to show you both how to do cell reuse and then some other tricks to get us some good FPS numbers.
Ben Weintraub: Alright, thanks Erik.
So, if we take a look at our tableView:cellForRowAtIndexPath: method here, we can see that right now, every time we call it, we're creating a new autoreleased composition table view cell, which is a subclass of UITableViewCell that's custom to our app.
And so instead of doing that we're going to go ahead and use the API that Erik mentioned.
So, let me show you how that looks.
Alright, so there's a couple of things here.
The first thing to note is this reuse identifier.
So this is just an arbitrary string that you can give whatever value you want, but the idea is if you have multiple types of table view cells in your table view, with different layouts for instance, then you can uniquely identify those different types using the reuse identifier.
So the next step is before we create a new table view cell we're going to call this dequeueReusableCellWithIdentifier method on UITableView and that's going to ask the table view whether it has any reusable cells for us to use rather than having to create a new one.
And then if that fails we'll go ahead and create a new table view cell which is OK because we do need to have as many cells as are visible on screen at any given point in time.
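The dequeue-or-create pattern can be modelled in a language-neutral way. The Python sketch below is a toy model, not UIKit: `TableView`, `Cell`, `did_scroll_off`, and the identifier string are hypothetical stand-ins for the real classes and for what the table view does internally when a cell leaves the screen.

```python
class Cell:
    created = 0                                   # how many cells were ever allocated

    def __init__(self, reuse_identifier):
        Cell.created += 1
        self.reuse_identifier = reuse_identifier
        self.text = None


class TableView:
    def __init__(self):
        self._reusable = {}                       # identifier -> cells that scrolled off

    def dequeue_reusable_cell(self, identifier):
        pool = self._reusable.get(identifier)
        return pool.pop() if pool else None       # None means "nothing to reuse yet"

    def did_scroll_off(self, cell):
        self._reusable.setdefault(cell.reuse_identifier, []).append(cell)


def cell_for_row(table, row):
    cell = table.dequeue_reusable_cell("CompositionCell")
    if cell is None:                              # only allocate when the pool is empty
        cell = Cell("CompositionCell")
    cell.text = f"Composition {row}"              # always reconfigure a recycled cell
    return cell


def scroll(table, rows, visible):
    """Scroll from the first row to the last with `visible` cells on screen."""
    on_screen = []
    for row in range(rows):
        if len(on_screen) == visible:
            table.did_scroll_off(on_screen.pop(0))
        on_screen.append(cell_for_row(table, row))
    return on_screen
```

In this model, scrolling through 100 rows with 8 visible at a time allocates only 8 cells; every later row is served from the pool, which is the steady state described earlier.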
OK, so let's take a look at how that looks on the device now.
I have a version of my app with this change.
OK, so as you can see the scrolling is a little bit better but still not as good as it could be probably.
So, in order to figure out why that is, we're going to use one of the other features of the Core Animation Instrument and that's this check box over here that's labeled as Color Blended Layers.
So it's probably easier if I just show you what this looks like on the device.
I'm going to check this check box and then immediately you'll see what the results look like on the device.
OK, so now you can see we have a number of views here that are colored red and some that are colored green.
So, the green views are good in this case, the red views are bad.
The green views mean that those are views that are opaque and red ones are not opaque.
And what those non opaque views or layers mean is that the graphics hardware actually has to do more work in order to blend them with the views that are behind them.
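Why blending costs more can be seen in a simplified model of compositing. The Python sketch below uses single grayscale values and the standard source-over formula; it is only an illustration of the extra per-pixel work, not a description of the actual graphics hardware, and the function name is hypothetical.

```python
def composite(layers):
    """layers: back-to-front list of (gray_value, alpha, is_opaque).

    Returns (final_value, number_of_blend_operations) for one pixel.
    """
    result = 0.0
    blends = 0
    for value, alpha, is_opaque in layers:
        if is_opaque:
            result = value                        # overwrite; what is behind is irrelevant
        else:
            # Source-over blend: must read the value behind and mix it in.
            result = value * alpha + result * (1.0 - alpha)
            blends += 1
    return result, blends


opaque_stack = composite([(1.0, 1.0, True), (0.5, 1.0, True)])
blended_stack = composite([(1.0, 1.0, True), (0.5, 1.0, False)])
```

Both stacks produce the same pixel here, but the second one pays for a blend even though the layer is fully covering, which is why marking such content opaque helps.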
So, this Color Blended Layers check box can be really useful in identifying that.
Alright, so let's switch back over to Xcode and see if we can fix those two instances of non opaque views that we had in our table view cells.
So the first one of them was this thumbnail, so the thumbnails I know are generated inside of this class and specifically in my thumbnail with size method here.
So, in this case, in order to generate those thumbnails with the cropping, I'm calling UIGraphicsBeginImageContext in order to start a new image context, and by default this will give me back an image context that has an alpha channel, which means it will be non opaque.
So in this case I actually don't need that.
And so, I'm going to go ahead and call a different variant of this method.
So, UIGraphicsBeginImageContextWithOptions allows me to pass in a flag here, this YES parameter, that will specify that I want an opaque image context.
And that means that the UIImage that I returned from here will also be opaque.
OK, so that should fix the thumbnails.
And let's take a look at those date labels, which you probably noticed were also not opaque, and so that's in my composition table view cell class.
Alright, so here in the initializer I'm just manually setting them to be non opaque and setting the background colors to nil.
So this may seem a little bit silly but this can actually happen and does happen relatively frequently when you are playing around with different layouts and maybe you wanted them to be nonopaque for one particular layout and then you forgot to switch them back.
Another way it sometimes happens is people think that they need to set the UILabel instances to be non opaque in order to get that nice blue highlight color to show through when you select the table view cell.
And that's not actually the case.
UIKit will handle that for you automatically so you don't need to worry about that.
So I'm going to get rid of these two lines because there's no reason for those labels to be non opaque.
So let's go back over to Instruments now.
And again, I'm going to run a version of my app with these changes and we're going to check the Color Blended Layers check box and see how it looks.
So, let's switch over to the device.
Alright. So, here's a version of the app after we made those transparency related changes.
And I'll just check Color Blended Layers, and now you can see that those thumbnails and the date labels are both green.
So that's great.
So, now that we've made all of these changes we want to try and quantify what the impact of that was.
So, if we go back to Instruments we can use the Core Animation Instrument again to measure the frames per second that we're getting with our modified app.
So in order to do that I need to select the running copy of the app and just hit Record and then all I'm going to do is scroll down to the bottom of this Table View.
And then back up to top again.
[ Pause ]
Alright. So, you notice that we still have a little bit of room for improvement but we're getting into the 50s now in our frames per second.
So that's certainly a great improvement over where we were previously.
Alright. So, back to you, Erik.
Erik Neuenschwander: Thanks, Ben.
That's actually the best behaved that application has been with the FPS.
It's actually a live demo there.
And Ben practiced very well to get a scroll that gives us some good numbers.
So, you see we didn't quite reach 60 there but hopefully you could see the visual improvement when Ben went back to the application.
And also we can see quantitatively that we went from the 20s and 30s up to more in the 50s and so that's a big improvement.
So, when you're thinking about scrolling you need to test scrolling scenarios, and you can do that with manual testing or using automated testing with flick gestures to get scrolling through a data set that's going to give a good scenario.
When you're scrolling, you want to launch the Core Animation Instrument and use it to measure FPS.
And remember 60 FPS is kind of the gold standard; that's what you're shooting for.
Ben made a couple of changes there. First of all, we have that API to reuse cells, and so you can do that using the UITableViewCell method initWithStyle:reuseIdentifier:. You just want to pass in a non-nil reuse identifier, something that identifies the kind of cell that that is.
And then, instead of just unconditionally allocing a cell you want to use the UITableView method dequeueReusableCellWithIdentifier:, passing that same identifier, and if you've already created and stopped using a cell in the past you'll get back that instance and be able to avoid the allocation.
So that's one key thing that you should be doing pretty much constantly whenever you have the same kind of cell that's going past in the list.
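The reuse pattern described above can be modeled outside UIKit. The following is a minimal Python sketch, not the UITableView implementation: `Cell`, `ReuseQueue`, and `scroll` are hypothetical names, and the "CompositionCell" identifier is made up for illustration. It only shows why dequeuing by identifier keeps the number of allocated cells close to the number of visible rows.

```python
class Cell:
    """Stand-in for a table view cell (hypothetical, not a UIKit class)."""

    def __init__(self, reuse_identifier):
        self.reuse_identifier = reuse_identifier
        self.text = None


class ReuseQueue:
    """Keeps idle cells grouped by reuse identifier."""

    def __init__(self):
        self._idle = {}  # reuse identifier -> list of cells no longer on screen

    def dequeue(self, reuse_identifier):
        """Return an idle cell for this identifier, or None if there is none."""
        idle = self._idle.get(reuse_identifier)
        return idle.pop() if idle else None

    def enqueue(self, cell):
        """Hand a cell back once it has scrolled off screen."""
        self._idle.setdefault(cell.reuse_identifier, []).append(cell)


def scroll(row_count, visible_rows):
    """Scroll through row_count rows and return how many cells were allocated."""
    queue = ReuseQueue()
    on_screen = []
    allocated = 0
    for row in range(row_count):
        cell = queue.dequeue("CompositionCell")
        if cell is None:
            # Only allocate when nothing is available for reuse.
            cell = Cell("CompositionCell")
            allocated += 1
        cell.text = f"Row {row}"  # reconfigure the cell for the new row
        on_screen.append(cell)
        if len(on_screen) > visible_rows:
            # The oldest cell scrolled off; make it available for reuse.
            queue.enqueue(on_screen.pop(0))
    return allocated
```

In this model, scrolling through 1000 rows with 10 visible at a time allocates 11 cells rather than 1000.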
But the other thing is to use that Color Blended Layers and if I can make you sit through a rather bad rhyme you want the screen to be green.
Right? You're trying to get as much green as you can on the device because that's when the device is doing as little work as possible.
And that's true even for UILabel; like Ben said, just through testing you can sometimes end up turning off opacity.
Opaque is YES by default.
You should leave it that way wherever you can.
And in fact even for UILabel the system performs some magic on your behalf: even if you have opaque labels, when you select the cell that blue background will still kind of bleed through the label without you setting it to be transparent.
So, there's no need to do that and I hope you keep the screen green and keep your FPS up.
So that's Smooth Scrolling.
So, let me move on to talk about memory footprint.
And keeping your memory footprint low is also important because iOS has no swap.
So you may be familiar from the desktop that when enough memory is needed that it exhausts physical memory, the system will go out to the disk, and that slows things down somewhat but there is at least that escape valve.
And on these devices, we don't have that.
So that means that we can have memory pressure meaning that we're just running out of any free memory available on the system.
And so again, to preserve system stability the OS will step in and it will terminate applications when the device gets under high memory pressure.
The service that does that termination is called Jetsam.
Jetsam is constantly watching memory pressure and it provides instant lightweight termination of applications when memory pressure gets too high.
This becomes even more important in multitasking scenarios.
You think about multitasking we do have more applications that are present in memory so that in general is going to cause more memory pressure on the device.
So there are some capabilities to preserve applications with smaller Footprints longer to keep more of them running.
So it's worth a little bit of care to keep your footprint low because, especially on multitasking devices, that will help your app stay around longer.
The general reason to keep your memory footprint low is that it is a shared resource and so you really want to use as little as possible because that will give the overall best experience for the user.
If you want to think of it just really tersely, it's that you can stay safe from Jetsam if you stay low in terms of your memory usage.
There are three areas that we'd like to suggest you look at as ways to keep your memory usage low.
And the first to talk about is avoidable spikes and then secondly we'll go into leaks.
Leaks is probably the one you're most familiar with, and that third term we'll talk about in some detail because it might be new to you.
And that's abandoned memory.
But let me start off talking about those avoidable spikes.
And this is just a bunch of individual allocations, maybe small and very brief, which are all present simultaneously.
So, you get a spike and if that spike causes memory pressure then even though in the future a millisecond later you might have gotten rid of all of it, the OS can't know that.
And so if memory pressure gets too high you'll be terminated.
So, you want to avoid that, and there are two cases where it's likely to come up for you.
The first is if you're processing large quantities of data.
One example that might come to mind is video playback, but there you get to use the API in the OS, which manages to play the whole video without actually causing a lot of memory use.
But if you're ever processing large quantities of data in some other way, downloading a big XML document or something like that, you want to try to approach it as small individual batches that you can work on in pieces to keep your overall memory footprint low.
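As an illustration of working in small batches, here is a minimal Python sketch. The helper names `batches` and `total_length`, and the idea of summing record lengths, are assumptions made for the example; the point is only that at most one batch of records is held at a time.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def total_length(records, batch_size):
    """Process records batch by batch.

    Returns (total characters seen, largest number of records held at once).
    """
    total = 0
    peak_held = 0
    for batch in batches(records, batch_size):
        peak_held = max(peak_held, len(batch))
        total += sum(len(record) for record in batch)
        # The batch goes out of scope here, before the next one is read.
    return total, peak_held
```

Feeding this a generator of ten 8-character records with a batch size of 4 processes all 80 characters while never holding more than 4 records.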
Another case where memory pressure can come up for you is when you are using a lot of Autoreleased objects; that causes object lifetimes to grow somewhat, and so the key there is to find a way to reduce object lifetimes.
Let me go into Autorelease in a little bit more detail.
And so for some of you, you may think of Autorelease as just a way to avoid retain/release.
You call Autorelease magic happens and you don't have to think about it anymore.
But I'd like to kind of pitch it to you in a little bit of a different way, which is to think about Autorelease as a way to return objects without retaining them.
That way you leave it up to your caller to retain the object only if necessary.
So, when you actually call Autorelease what happens is that instance gets added to the NSAutoreleasePool, and that pool then is going to keep a whole list of these objects and maintain references to them so it can call release, but that release call happens at the next turn of the run loop.
Of course when it calls release, as you probably know, if the retain count drops to zero the object will be deallocated then, but it's in between when you call Autorelease and the turn of the run loop that you can have a bunch of objects whose retain count is 1.
And they will be deallocated when the Autorelease pool gets around to it, but in the meantime memory can actually spike as you have all these objects that are soon to be deallocated.
And so, Autorelease is a common cause of memory spikes.
So, to show you an instrument that can help you identify this and ways to get around it I'll turn over to Ben and the Allocations Instrument.
Ben Weintraub: Alright.
So, you may have noticed previously when we launch our application under the Allocations Instrument that we had this big spike in memory right when we started up.
So, we're going to try and investigate what's going on there and see if we can fix it.
So again, we are using the Allocations Instrument, we're going to launch the right version of it and again, what Instruments is doing now is just collecting back traces for every allocation that happens inside of this app.
And this Timeline view is going to show us a graphical representation of the amount of memory that was used over time.
OK. So, we have this big spike here and then it drops back down as you just saw it do, so let me stop this trace and make this a little bigger so it's easier to see.
OK. So, you can see that over the first couple of seconds of our application's lifetime our memory usage is just growing and growing and then we have this big drop off here.
So, if we want to figure out where all these allocations are coming from again I'm going to use the inspection range tool in order to only look at a portion of the timeline.
So, I've moved my cursor to right near the end of the spike.
Oops, I want the other one actually.
And now I'm selecting just the portion of the timeline that involves that memory spike.
And then, in this case I actually want to look at this Call Trees View.
So, the Call Trees View will show me the call trees under which all these allocations took place.
So, by default, as you'll see the separate by category box is checked here which means that these Call Trees are sorted or bucketed by what type of allocation they were.
I'm just going to turn that off because I want to see them all aggregated together.
And again I'm going to check hide system libraries so I just see my application's code.
Alright. So, now if I expand the heaviest path here, it looks like almost all these allocations are again coming from my thumbnail of size method.
So, let's switch over to Xcode and take a look at that method and see if we can improve upon it at all.
OK. So, looks like we lost our changes from the previous time with dispatch and everything but that's OK.
We can still show what we need to do in order to get around that memory usage problem.
So, what's happening is that for each iteration of this for loop, when I generate these thumbnails, that's causing a bunch of Autoreleased UIImages to be created, and then those UIImages, as Erik said, are all present simultaneously and they don't get released until the Autorelease pool for this thread is popped.
So, what I'm going to do in order to fix this is go ahead and wrap each iteration of this for loop in its own Autorelease pool.
So at the top of the for loop, I'm just going to alloc init a new Autorelease pool and then down at the bottom, I'm just going to call drain on it.
So, one thing that's important to note about this API is that when you call drain, that also releases the NSAutoreleasePool.
It's a common misconception that you need to call drain and then call release.
You don't actually need to do that.
You just call drain and then it'll release for you.
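To see why a pool per loop iteration lowers the peak, here is a toy Python model. It is not NSAutoreleasePool: the `AutoreleasePool` class below is a hypothetical stand-in that simply holds objects until `drain()` is called, and the two temporaries per thumbnail are an assumption made for illustration.

```python
class AutoreleasePool:
    """Toy model: objects added here stay alive until drain() is called."""

    def __init__(self):
        self.pending = []

    def add(self, obj):
        self.pending.append(obj)
        return obj

    def drain(self):
        # Everything the pool was holding is let go at once.
        self.pending.clear()


def peak_pending(thumbnail_count, pool_per_iteration):
    """Return the largest number of temporaries alive at any one time."""
    outer = AutoreleasePool()  # models the pool that is popped only later
    peak = 0
    for index in range(thumbnail_count):
        # Either create a fresh pool for this iteration or fall back to the outer one.
        pool = AutoreleasePool() if pool_per_iteration else outer
        pool.add(f"full-size image {index}")  # assumed temporary
        pool.add(f"scaled image {index}")     # assumed temporary
        peak = max(peak, len(pool.pending))
        if pool_per_iteration:
            pool.drain()  # temporaries from this iteration go away now
    outer.drain()
    return peak
```

With 100 thumbnails, the single outer pool peaks at 200 pending temporaries in this model, while a pool per iteration peaks at 2.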
Alright, so now, we have a modified version of this app on the device.
So, let's go back over to instruments and take a look at how it compares with the original here.
Alright. So, I'll just press Record and start a new trace here.
So, you can see that our memory usage is still growing somewhat when we launched but that's to be expected.
I mean we do need some memory to do our work and it looks like we don't have the same kind of long-term growth that you saw previously.
So, let me stop this trace and one of the nice things about instruments is that you can take multiple runs of the same operation and then compare them side by side.
So, by expanding this disclosure triangle over here, I can now see my previous run in comparison to the current run and I can see that I've reduced my peak memory usage substantially.
So previously, we were up at about 2.8 megabytes and now it looks like our peak memory usage doesn't get more than 1.4 megabytes or so.
So, that's an example of how you can use the allocations instrument to diagnose and help fix memory-related spikes in your application.
So back to you, Erik.
Erik Neuenschwander: Thanks, Ben.
You can really see when you compare between those two runs a big difference between the memory sections so that's a great way, probably one of the clearest ways that you can see your memory usage change.
So, to kind of wrap up Autorelease, you should use Autorelease.
It's a feature but it is a little bit more expensive than your typical retain-release and so, you only want to use Autorelease when it's appropriate and there are really two cases you need to consider for that, your code and API usage.
In your code, you'd like to use Autorelease at framework boundaries within your project.
Basically, if you're ever handing back an object to some other object, that's going to maintain the lifetime of your return value on its own.
If you ever have, say, a member of a class and you're going to maintain that, you're going to allocate it, say, in your init and release it in your dealloc.
This is something where you control the entire lifetime of that object and so, you can just use retain and release to manage it that way and get a little bit more efficiency and some very good control over how long that object will be around.
But the other case is when you're using API, like in Ben's example there.
API will return to you an Autoreleased object and well, there's really nothing you can do about that.
They did it because they wanted you to be able to retain it if necessary but in that case, as Ben showed you, you can use nested Autorelease pools to get some control over that and really shorten down the lifetime so that we were still seeing those individual spikes as the thumbnails loaded but overall, there wasn't that same sort of triangular memory growth, right?
And that'll help keep your overall maximum memory lower and that'll keep you avoiding jetsam and keep your app around.
So, let's talk about that second area to keep your memory usage low and leaks is probably something you're familiar with.
In fact, if you attended the advanced Instruments talk that focused on memory performance, they gave a really great demo of the Leaks instrument.
So, we're not going to do that here but I'll just kind of give you the quick summary which is that well, leaks are just memory that you can't get at anymore.
You have no references to it and so, we have an instrument for that.
It's the Leaks instrument, and if you launch your application with it, it's able to give you all the points at which that memory was allocated.
So, the point where the memory is allocated is never really going to be your bug.
After all, you probably allocated that object for some reason but it does give you context to understand where your problem is likely to happen and you can dig in and actually look at the individual retains and releases and the code where that happens to try to understand where things went awry.
There are two common ways in which people end up with memory leaks.
The first of which is just an unbalanced retain-release.
Typically, well more retains than releases.
But there's actually a subset of that which you can hit in the new Objective-C 2.0 runtime when you're using properties, and that's if you forget to release the value that the property had before retaining the new value that came in.
So there's a little trick: you need to make sure that somebody isn't setting the property to the same value, and again, that other talk, which focuses on the Leaks instrument, can show you very clearly in code how to do that.
But using the Leaks instrument, you can quickly get on top of memory leaks in your app and use that again to keep your memory usage low.
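The property-setter leak mentioned above can be sketched with a toy retain-count model. This is Python, not Objective-C, and `Counted` and `Holder` are hypothetical classes invented for the example; it only illustrates the ordering (retain the new value, release the old one) and the same-value check, not how the other talk presents it.

```python
class Counted:
    """Toy object with a manual retain count; the creator starts with one reference."""

    def __init__(self, name):
        self.name = name
        self.retain_count = 1

    def retain(self):
        self.retain_count += 1
        return self

    def release(self):
        self.retain_count -= 1


class Holder:
    """Owns one value through a retaining setter."""

    def __init__(self):
        self._value = None

    def set_value_leaky(self, new_value):
        # Bug: the previous value is never released, so it can never reach zero.
        self._value = new_value.retain()

    def set_value(self, new_value):
        if new_value is self._value:
            return  # same value: nothing to retain or release
        old_value = self._value
        self._value = new_value.retain() if new_value is not None else None
        if old_value is not None:
            old_value.release()  # balance the retain taken when it was set
```

In this model, replacing a value through `set_value` brings the old object's count down to zero once its creator has also released it, while `set_value_leaky` leaves the old object stuck at a count of 1.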
So let me talk about that third one which is abandoned memory and this may be a new term to you but abandoned memory is almost like a leak but not quite.
It's memory which you still have an active reference to.
So you can still access it but at this point, it's left over.
It's memory that you are actually never going to choose to access again, and so therefore, you might as well release it, free it up and get it out of your application's memory space.
So the allocations instrument is the right tool for the job here and it offers an additional feature called Heapshot.
Using Heapshot, you can take a snapshot of your whole heap and then run through a set of operations and take a second snapshot and then compare the heap between those two snapshots.
And what you're looking for are differences that you don't expect.
You don't really expect that that object was still hanging around in that second Heapshot and you can then figure out how to free it and keep your memory usage low.
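The comparison between two snapshots can be pictured with a small Python sketch. The snapshot layout here (a dictionary from an allocation address to a category and size) is an assumption made for illustration and is not the format Instruments uses; `heap_growth` is a hypothetical helper.

```python
from collections import Counter


def heap_growth(baseline, later):
    """Return bytes per category for allocations live in `later` but absent from `baseline`.

    Each snapshot maps an allocation address to a (category, size_in_bytes) pair.
    """
    growth = Counter()
    for address, (category, size) in later.items():
        if address not in baseline:
            growth[category] += size
    return growth


# Hypothetical snapshots taken after running the same scenario once and then twice.
baseline = {0x1000: ("UIImage", 100), 0x2000: ("Non-object", 50)}
later = {
    0x2000: ("Non-object", 50),     # already present in the baseline, so ignored
    0x3000: ("Non-object", 40960),  # new and still alive: worth a closer look
    0x4000: ("NSString", 32),
}
growth = heap_growth(baseline, later)
largest_category, largest_bytes = growth.most_common(1)[0]
```

Sorting the growth by size, as `most_common` does here, mirrors scanning the heap growth column for the entry that stands out.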
So I'm going to send it back to Ben one more time.
We have one more little problem in our demo application and he'll show you how to use Heapshot in the allocations instrument.
Ben Weintraub: Alright, great.
So, we're going to use the allocations instrument again as Erik mentioned and I'm going to select my application and just go ahead and launch it.
So, the best way to go about finding abandoned memory problems is to choose some common user scenario in your application and then to run that scenario once.
Mark the heap using this Mark Heap button here.
And then run the scenario again and mark the heap a second time.
And then, instruments will allow you to look at the deltas between those two operations.
So, let's switch over to the device for a second and I'm just going to show you the operation that I've chosen to do here.
So it's a pretty common one in my app.
I'm just going to select one of these compositions, select the tile, cycle through a few colors here and then go back to the main table view.
So as a user, I wouldn't expect there to be any increased memory usage from just doing that operation once.
So after I've done that once, let's switch back over to instruments and I'm going to mark the heap.
And then I'm just going to do that one more time quickly.
Alright, so I'm just selecting a tile, cycling through these colors, and now I'm back to my table view.
Alright, so now, I'm going to mark the heap a second time.
Alright, so let's stop this trace and make it a little bigger again.
So, you can see these red flags in the timeline here and those represent the points in the timeline when I took those two Heapshots and the first one is labeled as baseline by instruments.
So if I expand that out then each of these lines corresponds again to some type of object in my application's memory space and I could see how many of them were live and how much memory they're taking up.
So that I'm not particularly interested in at the moment because what I really want to see is the delta between these two points in time.
So in order to see that, I want to look at this Heapshot 1 snapshot here so I'm going to click the arrow next to it in order to focus on it.
So now, all of the objects that are listed here are actually objects that were added at some point between, or they were allocated at some point between these two red flags on the timeline and are still alive at the point when I took the second snapshot.
So, it looks like by far the biggest category of allocations I have here is under the non-object category.
So I'm going to expand that out and I'm just going to sort of scan down this heap growth column until I see something that's big and then there's a 40-kilobyte allocation out of a total of about 255 kilobytes of heap growth between those two flags and so that certainly stands out to my eye.
So I'm going to select that allocation and then click on the expanded detail view here.
And what that will do is bring in the backtrace under which this object was allocated.
So if you take a look at this backtrace, you see that we're calling malloc from this tile backing initWithImage method.
So tile backing objects in my application are actually grayscale bitmaps that are used to generate those threshold and colorized tiles that you see in the app.
And I know that I'm doing some caching of those objects in my imageForTile with size method.
So let's head over to Xcode and take a look at the cache strategy that I have.
OK, so here's my imageForTile with size method and this is actually a reasonable place to cache, and the reason for that is when I adjust the threshold, as you saw in the very beginning of the demo, I don't want to have to regenerate the grayscale version of these tiles every time.
So there's some reason to have this cache.
And so what I'm doing is I'm just generating these strings that are cache identifiers and I'm using those as the keys in my cache.
I have just a mutable dictionary here.
And then the values are the tile backing objects themselves which can be quite large because they're again bitmaps.
And so in this case, it looks like the way I'm generating those cache keys is by taking the three color components, the R, G, and B components for the tile color that I'm generating, along with the width and height of the tile that I'm generating.
But as I mentioned before, these are actually grayscale bitmaps.
So the tile backing object for a red tile is going to be exactly the same as the tile backing object for a blue tile and there's no real reason for me to have to cache them separately.
What I end up doing in this cache is just caching the same thing multiple times under different names and this is actually a common problem with caches.
So I'm going to go ahead and get rid of these portions of the cache identifier and change my format strings so I no longer need those.
Alright, so that should help somewhat.
The other thing to keep in mind though is that I'm caching in one of my model objects and these model objects stay around for the lifetime of my application in my case.
And so there's actually no reason to be caching these tile backings once the user has gone back to that main table view screen in my app.
They can just be regenerated the next time that the user selects a different composition.
So, the next thing I want to do is actually add a method that will allow me to flush out this cache entirely.
So, that's pretty simple.
I just added a flush caches method.
All it does is release my cache and set it to nil, and then I need to add that to the header file as well.
Alright, and then the final thing I need to do is actually call this method from my view controller.
So in my compose view controller, which is the view controller that handles the screen with all the tiles on it, I have a view will disappear method, and at the end of that method, I'm just going to go ahead and call flush caches on the current composition and that will make sure that those tile backing objects don't hang around for too long.
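The two cache changes from this demo, dropping the color from the key and flushing when leaving the screen, can be sketched as follows. This is a Python model with hypothetical names (`Composition`, `backing_for_tile`, `flush_caches`, `cached_bytes`), and it assumes one gray byte per pixel; the demo's actual code is not shown in the transcript.

```python
class Composition:
    """Toy model object that caches grayscale tile backings."""

    def __init__(self):
        self._backing_cache = {}
        self.backings_generated = 0

    def _make_backing(self, width, height):
        self.backings_generated += 1
        return bytearray(width * height)  # assumed: one gray byte per pixel

    def backing_for_tile(self, color, width, height):
        # The backing is grayscale, so the color is deliberately left out of
        # the key; a key that included it would cache identical bitmaps twice.
        key = (width, height)
        backing = self._backing_cache.get(key)
        if backing is None:
            backing = self._make_backing(width, height)
            self._backing_cache[key] = backing
        return backing

    def cached_bytes(self):
        return sum(len(backing) for backing in self._backing_cache.values())

    def flush_caches(self):
        # Called when the user leaves the tile screen; backings can be
        # regenerated the next time a composition is opened.
        self._backing_cache.clear()
```

In this model a red tile and a blue tile of the same size share one backing, and `flush_caches` brings the cached byte count back to zero.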
Alright, so we're running a little bit short on time.
So I have a saved version of this trace that I'm just going to go ahead and show you.
Alright. So here I performed the same operation twice and again, we can see this nice side-by-side view with Instruments.
So below, you can see the original version, and you can see that its memory is growing constantly over time, and above, you can see the version with the changes that we just made, and so you can just see visually that we're not growing our memory usage over time as we were in the previous version.
And then the other way to quantify this is, if we look at our original trace, we see that the heap growth for the Heapshot is listed as about 430k.
If we look at our second trace, we can see that the heap growth is listed at about 13k.
So that's a pretty significant improvement.
So that's an example of how you can use the new Heapshot feature that's new in iOS 4 to help diagnose abandoned memory problems.
So back to you Erik.
Erik Neuenschwander: Alright, thanks, Ben.
We had a lot of demos there, so I appreciate you guys not applauding, but that was Ben's last run so let's give him a hand.
[ Applause ]
Alright, we'll just finish up with a couple of comments on that and prioritizing.
So, when we're talking about keeping your memory usage low, I want to remind you about jetsam being out there.
It will kill your application if memory pressure gets too high.
So you can focus on three areas: leaks, abandonments, and spikes.
We talked about all three of those and showed you instruments that you can use to get on top of that.
The Leaks instrument is great for leaks and the Allocations instrument is all-around useful, including that Heapshot feature to show you, as Ben showed you, overactive caches or other ways in which you're abandoning memory which you can reclaim.
A way that you can often end up with spikes is through Autorelease.
So you want to target, limit your use of Autorelease but if you're using an API which is a heavy user of Autorelease objects then you can have nested Autorelease pools and that works out great.
So, we know that you have things to do other than just work on performance in your applications so let's talk a little bit about how to prioritize performance issues relative to the features and bugs and everything else that you have to do.
And so at least on the iOS team, we believe more or less that there can be show stopping performance issues.
Ones where performance is so bad, we won't even ship the release.
And part of how we do that is by establishing goals early on in the release and getting consensus around that so that everybody knows what we're trying to head for.
And so that's certainly something you can do within your development team to get agreement about what performance issues you've really got to fix.
But to prioritize them, you really want to look at it in two dimensions.
First is just the frequency at which the performance problem comes up.
Maybe it's a scenario which is very common, for instance launch or scrolling through your main list of compositions, say, in our demo application.
And so if it's a common scenario that means it's going to hit your user pretty frequently.
You also want to think about not just the scenario but how often it performs poorly.
Maybe it's just slow every so often and that's going to be less severe than if it's slow every time.
And the second dimension, you want to consider is the severity.
If the application is unresponsive for several seconds, well, that's going to be pretty bad, or maybe it's single-digit FPS in an animation and we're here to tell you that that looks awful.
Or it could be that one of those things we talked about, the watchdog or jetsam, is happening to your application and that's bad because to a user, those terminations look exactly like crashes.
Your user can't tell.
And so it may be that if you're getting feedback from your customers that your application crashes a lot and you can't figure out what's going on, it may be that your top crash isn't a crash at all.
It's either one of these watchdogs or these jetsams.
We have one more tool that will work out well for you there and that's iTunes Connect.
You already know about iTunes Connect, it's how you get your applications on the store in the first place.
But I hope you've also noticed that we offer third party crash reports and that crash report gives you, of course, traditional crashes but down at the bottom, there's also this bar that shows you the frequency of different kinds of diagnostic events.
Now, this is picked from just one internal Apple app on one of the QA servers and you can see in this case, crashes are the dominant feature of this application.
93% of the time it's an actual crash but we have 7% where a watchdog is happening and then at least in this case, we have very few or no jetsam events.
Your bar is probably going to look different and if that blue bar is small then you want to dig in to the other two areas.
So for watchdogs, also on that same report page, there's a breakdown of the different kinds of watchdog events that are generated, say launch or quit, and there's that button to the right that says Download Report.
If you click that, you will get reports from your users out in the field and it will have the backtrace of your application.
You'll be able to see what it was so busy doing that the operating system had to come in and watchdog it.
So that can be very helpful for understanding how your application is getting watchdogged and what's going on.
That other class is the jetsam events and there we offer two pieces of data for you to look at.
One of which is the average memory usage of your application at the time it was jettisoned.
So if you see that number and it looks a lot higher than the numbers that you're seeing in your internal testing, that means that maybe you're not using a realistic dataset for your users or that you have some database that's growing without bound and taking up a lot of memory in your application.
So you want to kind of think about the test scenarios that you're doing.
That second number, the largest, gives you an idea of maybe there's some overall memory leak that you just haven't caught in your application.
So you can use this if you see a lot of green in that bar to decide to go investigate the memory usage of your application and as far as your users are concerned, this is the same thing as fixing a crash.
So it's really worth doing.
Let's wrap up and I hope that you either came in believing or you'll leave a believer that performance is critical for your application.
We showed you a lot of different instruments and ways to use them that you can measure and get data to improve performance in your app and I want to remind you one last time that performance testing really needs to be done on the device and ideally, you should do it on the oldest device you're going to support.
Three key areas we talked about: launch, which is very important; scrolling, which happens constantly; and memory usage.
Like I said, your application will get terminated.
So, if you develop clear performance goals then you won't fight at the end of your development cycle about what to fix and please visit iTunes Connect for performance-related reports.
Let me point out three related sessions, we have one later today and also one tomorrow morning that are advanced going into much greater depth about performance issues.
Also, if you're a heavy core data user, you can optimize it.
You can find out about that at the talk and there are three other ones, there are instrument talks which have already happened sadly so those are on video.
There's more information with these evangelists, I hope you know, documentation on the website, a link you can click.