Using Time Profiler in Instruments

Session 418 WWDC 2016

Learn how to make your apps faster and more efficient in this introduction to Time Profiler in Instruments. Walk through how to use Time Profiler to measure your app's performance. Learn how Time Profiler works and can be used to identify problems and verify your fixes. Discover how easy it is to improve your app's power usage and performance by using Instruments throughout your development process.

[ Music ]

[ Applause ]

Good afternoon.

My name is Kris Markel.

I'm a member of Apple's Performance Tools Team.

And today we're going to talk about using Time Profiler in Instruments.

So here's our agenda.

We're going to start with an introduction to profiling.

What it is, and how you get started.

And then we're going to show an example of making an application go faster and do less.

And then finally, ensuring we have really great user responses in our application.

But before we do that, let's talk about why we want to do this.

And the reason we're doing this is because we want to provide a great user experience.

We want our users to love using our applications.

We want them to love our user applications because they're fast.

Cheetah fast.

So it saves them time and they can get more done.

We want to make sure our users have all-day battery life.

So that they're able to use their devices throughout the day.

And you don't want to be that application that's at the top of the battery usage chart.

And finally, we want to make sure we have smooth scrolling, and a really responsive UI in our applications, so users love using it.

And how are we going to do this?

We're going to use a technique called profiling.

Profiling is answering two questions about your application.

How much work is it doing?

And what kind of work is it doing?

And, if you hadn't guessed from the title, we're going to use a tool called Time Profiler to do this.

Time Profiler answers an additional question of when is my app doing that work?

Time Profiler is part of an application called Instruments that is a collection of a wide variety of tools that measure all kinds of aspects of your application.

And how do we get started?

Well, it's really easy.

From Xcode's Product menu, you just go ahead and choose Profile.

So let's go ahead and do that now.

Okay. So first I'm going to show you my application.

Give you an idea of what it's doing.

My application uses the Photo's framework to look at the user's photo library and check for duplicates.

So I'm going to go ahead and do that now.

And it takes a little while when my test data has lots of duplicates.

So it comes up with a pretty big number.

Okay, that's done.

Now I'm going to go ahead and select some photos.

And then let's find a good one.

Here is one.

And then it uses the core image face detection.

So I can add some googly eyes to my face.

And then it uses UI dynamics in CoreMotion to make the googly eyes jump around when you shake the device.

Okay. So that's the application.

Let's go ahead and profile it to see what it's doing.

So from Xcode's Product menu, I'm going to go ahead, and just like I said, choose Profile.

This is going to do a release build of my application, and then launch Instruments so I can start measuring it.

It's important to do profile release builds, because you want all the optimizations that you'll get from the compiler.

So this will take just a moment.

And it's copying the application to the device.

Here we go.

So now Instruments comes up.

And it's going to show me the template chooser.

And this you get an idea of exactly what kind of tools Instruments provides for you.

Today, of course, we want Time Profiler.

So I'm going to go ahead and click Choose.

Now if you've never used Instruments before, this user interface can be a little intimidating.

But getting started is really simple.

All I have to do is go up to this Record button up in the corner, and click it.

That's going to start my application, and start measuring it.

So let's go ahead and do that now.

Okay, so now my application is started.

It is starting.

There we go.

Right? I'm going to do my duplicate checking.

Just one moment.

Okay. And then I'm going to go ahead and do my duplicate checking.

And let's see if we start getting data now.

There we go.

Okay, now that's done.

Now I'm going to select my photos.

And then I'm going to pick a photo.

Add some googly eyes.

Shake it around.

Now let's go ahead and dismiss that photo.

And then I'm going to go ahead and stop the recording.

I'm clicking the Stop button.

And now we can see here, I have a nice, high-level overview of what my app was doing.

Time Profiler gives me two graphs.

This top part of Instruments is known as the Timeline view.

The first graph is my CPU usage.

How much CPU my app is using.

And I can actually hover over it, and see the precise values.

The second graph is what lifecycle state my application is in, and how long it's been in that state.

So that gives me a really good idea of how much work my application is doing, and when it's doing it.

But how do I know what is my application doing?

And to figure that out, we go down to Instrument's Call Tree View.

Which is this part here in the Detail view.

And the Call Tree View represents the collected call stack samples aggregated by weight, and shown in the Detail view.

Now what exactly does this mean?

Well, this is a lot easier to show in pictures, rather than words.

So let's go ahead and look at a diagram.

So here we have time, moving along to the right.

And here is our very simple application.

We have a main method that's calling into method1.

That's calling into method2.

At this point in time, Time Profiler is going to get a call stack from our application, and it looks like this.

This is just like the call stacks you see when you're in the debugger in Xcode.

But Instruments or Time Profiler indents each subsequent line.

And it does that so it can build a tree structure, which we'll see in a moment.

So time continues on.

Oh, and then it records that call stack.

And it records the number of times it's seeing each method or function within that call stack.

So time continues, and we can see method2 exits, method1 calls method3 which exists, calls method2 again.

Then we go ahead, one millisecond has passed.

And Time Profiler's going to take another sample.

And in this case, it looks identical to the first.

So we'll go ahead and update our table.

And we just increment each of those numbers.

And time's going to continue forward again.

And now we see our application is doing something different.

We have main calling directly into method2; calling into method3.

Time Profiler's going to take another sample right here.

And now we can see our different call stack.

It only has two methods in this case.

And then so when we update our table, this represents a branch.

So now we know that sometimes main calls method1, and sometimes it calls method2.

And we know exactly how many times it's done each of those things.

So time continues on, and we collect a couple more samples.

These happen to be identical to our first samples.

And so we're going to update our table appropriately.

So now let's take a look at what this table is actually telling us.

One thing to note is that Time Profiler's not actually measuring duration.

It's not recording when the method starts, and then when it exits and subtracting those two numbers.

It's aggregating the samples into a useful summary.

So when you see time values in Time Profiler, it's not an actual measurement of duration.

That's the number of samples multiplied by the time between samples.

Which is 1 millisecond, in most cases.

This does have some side effects.

And one of them is that Time Profiler doesn't distinguish between long running methods, or much faster methods that are called repetitively.

And we can actually see that in our call tree here of method1 and method2 show up in the same number of samples.

From Time Profiler's perspective, they incurred the same amount of work.

But we know that method1 was actually called twice, and ran for longer, method2 was called four times and was much shorter.

Time Profiler's also really focused on CPU usage, right?

It's answering the question; what work is my application doing?

And another result of this is it doesn't necessarily capture everything.

In fact, what's missing from our table?

There's no method3, right?

If you have really fast functions, or really fast methods that aren't called very often, they won't appear in your call tree.

And this is actually okay.

This is what we want.

Because they're not having an impact in how much work I have to be doing over time.

So now that we have a better understanding of the call tree, let's see how we can use it to make our app faster and do less.

Okay. So looking back at my timeline, I want to take a look at this big, first block of CPU usage here.

I can see that I'm using 100% CPU for a significant amount of time.

So I want to see exactly what work I'm doing there.

I happen to know, that's when I was doing my duplicate checking.

I know I expect that to take a fair amount of work, so I'm not surprised.

But wherever you're using a lot of CPU, there's usually an opportunity to make it faster.

So in order to get a better idea of what's going on here, I'm going to use the trackpad, and I'm just going to pinch to zoom in; un-pinch.

So I can get a much better idea of exactly what's going on in this time period.

And then I'm going to go ahead and click and drag across the Timeline view.

And this is going to apply a filter.

So now my call tree is only going to show data from samples that were collected within that timeframe.

If you notice, now you can see that the call tree is actually showing much less data.

It's only really focused on this period of time.

So now let's take a look at our call tree in more detail.

You'll see a few columns here.

There's Weight and Self Weight, and then Symbol Name.

The weight represents the percentage of samples that that particular portion of the call tree appeared in.

And you'll see next to the weight there's also a time.

And like I said before, that's not duration.

That's the number of samples multiplied by the time between each sample.

And then there's also Self Weight, which is actually the amount of time that was spent within that method itself, and not some other method that it called.

And you can see actually here, these are 0.

If I go ahead and just close this, it's still 0.

It's still 0.

What this is telling me is that, you know, 99% of my samples were in main.

But actually, all of those were also were also within UIApplication main.

And all of those were within whatever UIApplication main calls.

And instead of disclosing these one-by-one, I'm going to use a new feature in Instruments 8.

Where I hold down the Option key, and click the Disclosure triangle.

And this gives me a smart disclosure.

So it's going to just close my call tree until there's actually interesting data in it.

And if we go ahead and scroll down, we can see now we actually have some methods that have some self weight to them.

In fact, if I look at this particular release call, I can see, you know, its weight is 117 milliseconds, and its self weight is the same.

So it's spent all its time in itself.

Didn't call any subsequent functions or methods.

And actually, you know, looking through these, I see a lot of these Objective-C runtime calls here.

And they're actually all have pretty high self weights.

These runtime methods are super highly optimized and run really quickly.

And they're usually the sort of thing that Time Profiler doesn't capture, because they run so quickly.

But since they're showing up, I know that generally they're being called a lot.

So they're showing up in samples.

This is usually the case if you're looping through a lot of Objective-C objects, you'll see this show up in your trace.

So the other thing the call tree shows us is over here on the right.

We can see which library that portion of the call tree came from.

That particular method call.

And here we can see this is coming with the photo's framework.

And we actually work our way up.

Scroll to the right here.

We can see, here's the line of code from my app that's causing all this work to happen.

You know, this particular line has 0 self weight.

It's not doing any of the work itself.

But it's the thing that's triggering all the work.

You know, I'm not going to be able to make Objective-C release call any faster, but I can see if there's something here I can do to maybe call it less.

So I'm going to double click on this line.

And it's going to take me to my code.

And Instruments annotates each line with the number of samples that appeared in.

So looking at this method I can see that most of my time is spent on that red line there.

But if I take a closer look, I actually see something else here.

I have nested four loops.

And in fact, each of those loops is iterating over my entire photos collection.

And this is pretty much a classic example of what's known as N squared behavior.

As I throw more data at this code, it's going to get slower, and slower, and slower.

And since I know it's going to operate over a lot of data, this is bad.

So I want to go ahead and change this.

So I'm going to go into Xcode.

And here's my code here.

A typical way of avoiding nested four loops is use a set-based implementation.

I'm not going to go into details, but I've gone ahead and created a new algorithm for doing my duplicate checking that uses That uses a counted set and another set to find duplicates.

Now let's go ahead and verify that I actually made a difference.

So again I'm going to go to the Product menu.

And I'm going to choose Profile.

It's going to rebuild my application and install it to the device.

And then since Instruments is already running, it's going to go ahead and start the recording right away.

So as it's copying it to the device let's see, starting the profiling.

Here's Instruments.

And my application has started.

Okay, and it's starting recording, I'm going to zoom out.

Okay, I'll zoom out so we can see some data.

And I'm going to go ahead and do my duplicate checking.

Duplicates.

All Photos.

And I can see the data come in.

Oh, I'm actually at minutes.

I thought this was seconds.

Okay. So my duplicate checking actually happened much faster.

In fact, we no longer see that large bar going across the timeline anymore.

So that's a good change.

I want to go ahead and re-measure the rest of my apps.

I'm going to select some photos.

I'm going to add some eyes.

I'm going to shake it.

And then I'm going to dismiss that photo.

So now my application is static.

It shouldn't be doing any work.

But let's go ahead and take a look at what the Time Profiler is telling us.

Let's stop the recording.

It's taking a minute to catch up with the data.

Whoa. Okay, so I can see here at the end of this at the end of the call, there's some CPU usage here.

It's not much, but I'm expecting it to be 0.

So I want go ahead and see what that's doing.

I'm going to zoom in again with the trackpad.

And then I'm going to drag to select to see what's going on here.

And then I'm going to go ahead and option click through here to see what's going on.

And now if you look at my call tree, this time I can see there are a lot of calls in CoreMotion, right?

And I know what my app is doing here, it's using CoreMotion to detect and make the eyes move.

But moving back up the call tree, there's actually none of my code in this call tree.

So there's nothing I'm doing that's directly invoking this.

But something in my app is making CoreMotion do work when it shouldn't be.

And what I know about my app, is I realize that I'm not turning off the CoreMotion monitoring when there are no eyes on the screen.

So let's go ahead and make that change.

We go back to Xcode.

And then I actually have the methods all written.

It's just a matter of calling them.

So let me go ahead and add them here.

And then we stop observing here.

Okay. Now let's measure this and make sure it changed as we expect it to.

And again, Instruments is going to rebuild the application; put it on the device.

And also, since Instruments is already up and running, it'll start recording again.

This time I'm going to zoom out ahead of time.

And I also click outside of the filter, and that removes it.

Okay. And once the application is on the device, Instruments will start recording it once more.

Okay, there we go.

Okay, so this time I'm going to go ahead and just go ahead and add my photos.

And then I'm going to scroll to a photo.

Select it.

Add the eyes.

Do the shaking let's actually zoom this in a little more.

I'm going to choose a new photo so the eyes move off the screen.

And then let's go ahead and stop the recording, and see if it changes as we expected.

And now if I zoom in on the end here once Instruments is done processing the data.

I can see that my CPU usage is now 0 here at the end.

So I'm no longer doing that work when I didn't need to be.

So let's take a look and review what we've done.

So the first thing we did, is we used the trackpad to figure out how much work our app was doing and when it was doing it.

And it was a good way to find areas of interest to focus on.

And the trackpad or the timeline is really trackpad friendly.

But it also works great with a mouse.

You can option-drag, and control-drag to zoom in and out.

So we found an area on the track view that was using a lot of CPU.

And we wanted to make it go faster.

So we went ahead, we zoomed in, and we filtered on that area.

We examined the call tree and we found out where the work was happening in that portion of our app.

And then we walked back up the call tree until we actually encountered our code.

What part of our app is causing this work to happen, even though it's not doing the work itself?

We took a look at our code.

We thought about it.

Scratched our heads.

We ate a sandwich, and we figured out a way to make it faster.

So we went ahead and implemented those changes, and verified that they actually did make it faster.

It was a significant improvement in this case.

And as a result, we saved a bunch of the user's time.

Now when they're using my app to find duplicates, they won't be looking at that spinner forever.

It'll finish much more quickly.

Now in terms of doing less, what did we do?

Again, we focused on an area of the track view, where this time it wasn't using a lot of CPU, but it was using an unexpected amount of CPU.

We zoomed in.

We applied a filter.

We took a look at the call tree, and this time we looked at the frameworks involved again, we saw it was CoreMotion, but none of our code was being evoked.

So this time we had to think about our app in a different way.

What portion of our app is making this work happen?

We went ahead and again, maybe took a little break.

Ate another sandwich.

And figured out how to make our app stop doing that work so it's no longer happening.

We went ahead and reprofiled the app.

Verified that our changes did what we expected them to do.

And actually, as a result of this, we significantly improved the battery life while our app is running.

Any time the CPU can go to 0 and go to sleep, that represents significant savings.

So even though our CPU usage was low, it still has a battery cost.

So now that we've gone faster, and done less, let's talk about making our apps more responsive.

So the way your application works, is the main thread that does all the user interface work.

It's responsible for responding to user input, and then updating your views.

And the way it handles the user input, is that it's got a main thread.

And the main thread has a run loop that's just watching a queue called an Event Queue and waiting for events to appear on it.

When an event appears, it sends it to your UIApplication, or NS application instance.

Which then passes the event down through the responder chain in your application.

And if you application wants to respond to that event, then it goes ahead and invokes your code.

And you can actually see this in a call stack here.

Here's the call stack from my application.

And we can see there, there's the main thread.

It calls the main function which is, you know, where all our applications start.

Now in here, I don't know exactly what these functions do, but they're clearly related to a run loop.

And I can see here, here's a handleEventQueue function.

So it's clearly got an event off the event queue, and it's handling it.

I can see it gets passed into my UIApplication, and then passed through the responder chain until it gets to my code.

And in this case, this is an IB action on a button.

And then my code does its work.

So what does this mean in terms of keeping your apps responsive?

Well, when busy, the main thread can't process the queue.

So it starts to back up your user events back up.

And then as a result, you get stuttering and hiccups.

And in the worst case, your app becomes completely unresponsive.

The user closes it, and maybe deletes it from the device.

So it's really important to keep your main thread free, so it's able to respond to the user input in a very quick manner.

So now with that in our pocket, let's see what we can do with our app in order to make it more responsive.

All right, I'm going to zoom out of my track on the track view this time.

And now the area I want to focus on is this big CPU spike right here.

And here I'm seeing well over 100% CPU.

So there's probably an opportunity for optimization here.

And I happen to know, this is also when I'm adding my eyes to the image.

So again, I'm going to just zoom in using the trackpad.

I'm going to go ahead and click and drag to apply a filter.

Now if we look at the call tree, this time I want to specifically focus on what the main thread is doing.

I can see here that actually, my main thread is showing up in about 63%, 64% of my samples.

And so I know that during this time, it's clearly doing some work.

But I want to know exactly how much work it's doing.

And to do that, I'm going to go ahead and up here in the upper right-hand corner, we have what we call the different strategies and Instruments.

So you can view your data, either per CPU or per thread.

This time I want to look at the data per thread.

And I can actually see, if I make this a little bit bigger, that over here on the left, you can see all the threads in my app.

And the main thread is at the top.

And so I can actually see the CPU usage on just the main thread.

And here, clearly, it's getting up to 100%.

And when it's in that state, it absolutely can't respond to user events.

There's no way it's going to be able to handle things on the queue if it's busy doing the work in my app.

So what work is it doing on the main thread exactly?

This time, I'm going go ahead and click on our extended detail view over here on the right.

And this is going to show me my call stack that's doing the most work on my main thread.

Because my main thread is what's selected over on the left.

So I scroll through here, I can see again here's the run loop stuff.

And then here we actually get to my code.

And this is the code where it's adding the eyes.

And then doing some more add eyes work.

And then here it's calling Find Faces.

So that's the face detection code in my application.

And then we can actually see the next line is actually calling into Core Image.

If I go ahead and click on this in the Detail view, it'll expand the call tree to that line.

Now again, reasoning about my code.

There's no reason for me to do face detection on the main thread.

It's not a responding to user input thing.

And it's not an updating views thing.

So this can easily be moved to a background thread.

So here's my code that does the face detection.

And I can see once the face detection is done, it goes ahead and calls the displayEyes method.

Now an important thing to note here is that displayEyes is actually updating my UI.

So that's work that has to happen on the main thread.

So what I need to do is do my face detection on a background thread.

Switch back to the main thread.

In order to update my UI.

Oh, there we go.

Okay [applause] thank you [applause].

That's not supposed to be the applause line [laughter], but I'll take it.

So here we go.

Here's my code that does this on the background.

I'm going to very quickly talk about this.

So we go ahead and we get one of our background queues.

And we dispatch our find faces code to it.

Then just like I said, we're going to dispatch back to the main queue, and update our UI.

This time to do my profiling, I'm going to hold on the Run menu, and choose Profile.

This will do the same thing.

Hopefully rather quickly.

Oh, I deleted my code.

Let's do that again.

So now since I've already done it, I can just go ahead and is it going to be happy click the Profile button.

There we go.

It's going to build my application.

Install it to the device.

Okay. Now Instruments should show up and start recording my application.

Oh, so I'm going to go ahead and switch back to the Instruments view.

And then let's zoom out so we can see the data coming in.

Okay. Very quickly, let's add some photos to my app.

And then let's do the eye detection.

So there we go.

So there I can see my big CPU spike.

Let's go ahead and stop the recording.

And now let's see if this made the changes I expected.

So again, I'm going to zoom in on this area of the graph using a trackpad.

I'm going to click and drag across the Timeline view to apply a filter.

Now, if we go ahead and look at our call tree this time, actually what I see here now, the main thread is only in 4.8% of my samples.

So that's a really good sign.

I already know my main thread is doing less work.

And those two worker threads are doing much more of the work.

So let's go ahead and switch to the thread strategy.

And now I can see here, my main thread is actually completely free for the vast majority of this time, thank you.

[ Applause ]

So let's go ahead and go back to the slides, and talk about what we've just done.

So in order to make my application more responsive, I've found an area where the CPU was being used a lot.

And I went ahead and zoomed in and applied a filter it's a common theme here.

I focused on the main thread, because I wanted to make sure my app continued to be responsive.

I identified some work that we were doing on the main thread that wasn't UI work.

So I went ahead and moved that to a background thread.

Once again you can probably guess the next step.

I verified my changes.

Made sure the app was behaving as I now expected it to.

And as a result, I got a much better user experience.

My user interface will remain responsive while I'm doing my face detection.

Which is really great.

A couple notes regarding optimization.

The fixes we made today were relatively simple, but they did add complexity to our app.

And in general, the practice of optimization is trading simplicity for performance.

But fortunately, because I did my profiling early in my development cycle I didn't wait to the very end.

I now have time to verify that my app continues to behave correctly.

I can add additional unit tests if I think that's necessary or UI tests.

And focus on where I added complexity, and make sure it still works.

Just before you're about to ship the app is the very last time you want to be adding complexity to it.

So the real takeaway here, is that it pays to profile your app continuously.

Especially after you add new functionality.

And profile early.

And then profile it ongoing on an ongoing basis.

And sometimes, just running time profiling and seeing what you can find, you can find a really big win quickly.

So it pays just to run it and see if there's anything interesting going on.

Some best practices about using Time Profiler.

The general overview here is that you want to be profiling as close as you can to what your customers will be running.

So you always want to profile release builds.

Not only do you want the optimizations from the compiler, but release builds are what your customers are going to be using.

You always want to profile on the device.

The simulator is not going to give you useful data to work with.

Profile on the oldest, slowest devices you intend to support.

A pretty good rule is, if you can make it run well on slower devices; you get it running well on faster devices for free.

And you want all of your users to have a great experience using your app.

Even the ones on slower devices.

And use large data sets where it makes sense.

In my case, you know, sometimes users have tens of thousands or hundreds of thousands or a hundred thousand photos in their photos collection.

It's important that I profile how my app behaves under those circumstances.

So I can make sure it runs well as my customers will be running it.

So, to wrap everything up, if you want your customers to love using your app because it helps save them time.

It helps them achieve all-day battery life.

And your UI is really responsive, then what you need to do is profile early and profile often.

If you want more information, you can go to this URL.

Here are some related oh, here are some related sessions.

Last year we did a session called Profiling in Depth.

Which goes into more detail about the Time Profiler, if you want to know more.

And then this year we did a sequel to that, System Trace in Depth.

Which is how you can use the System Trace instrument in situations where Time Profiler's not helping you.

Thank you, and have a good afternoon.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US