Advanced Performance Analysis with Instruments

Session 309 WWDC 2010

Great Mac and iPhone applications delight users with fast execution and responsive UI. Join the Instruments team as you dive deeply into the latest techniques to improve your application's performance.

Lynne: Good morning, everyone.

My name is Lynne Salameh, and I'd like to welcome you to Advanced Performance Analysis with Instruments.

Before we start, let's take a quick look at today's agenda.

As you can see, we've got a lot of ground to cover, because in today's session we're going to be showing you how to use efficient and advanced profiling tools such as Time Profiling and Data Mining, in order to find out where your code is spending most of its time.

We're also going to be introducing new Instrumentation that will allow you to understand the power characteristics of your iPhone applications and understand how your applications interact with the system as a whole.

So let's start off, let's talk about Time Profiler.

In Time Profiling is a statistical sampling mechanism that was first pioneered by Shark, and is today in Instruments the most efficient sampling mechanism for use on the iPhone and for your Mac applications in order to find out where your applications are spending most of their time.

How does Time Profiler work?

Time Profiler looks at the running threads in your code, and it basically takes samples at regular time intervals.

And shown here are those backtraces that are collected when those samples occur.

So these are example backtraces and we'll be taking a look at them in more detail later.

But one thing to note here is Time Profiler also collects backtraces from your kernel and drivers in order to show you the time spent in that code as well.

So Instruments takes these backtraces and constructs what is called the call tree in order to simplify and make it easier to understand what's going on in your code and make it easier to find where you're spending most of your time.

So I'm going to show you how Instruments construct an inverted call tree, which means the deepest stacked frame is going to be at the root of the tree and we're going to work backwards and view our call history in reverse.

So to start off, let's assume that we're spending 1 millisecond in each of these backtraces.

So we see that we're spending 2 milliseconds in the frame interpolate, and that's our deepest frame.

And then we can work backwards, acquire image we're spending 4 milliseconds there.

But 2 of those milliseconds go ahead and call interpolate, so we have 2 different nodes on our tree, and so forth with cg drawImage and then copyImage, which is actually a frame occurring in our codes, the mipmaps application.

And then finally, main, which is our entry sequence interrupt code.

So this is how Instruments combines backtraces to construct an inverted call tree.

But the way that you view these in Instruments, or how Instruments actually displays these, is using an out find view.

So if we take the center call tree over here, it's actually shown, slide let's move back 1 second.

Yeah, there we go.

It's actually shown in an outline format shown to the right in Instruments.

All right.

So really you should use Time Profiler for your sampling needs because of 3 reasons.

First of all, the profiling is performed from within the kernel.

Which means, Time Profiler has very low impact on the applications you're trying to look at.

And it really collects stricter, more precise timing information about the applications you're trying to target.

Secondly, you can use Time Profiler to look at not only the threads that are running in your applications, but also to target all thread states.

And what this means is that you can also find where your threads are waiting in your, in the code that you've written in addition to when they're running.

And finally, you can use Time Profiler to target all the processes on your system and also to collect kernel fact traces when you're looking at Mac applications.

So we talked about Time Profiler in relation to overhead when recording.

And let's actually talk about overhead when we're dealing with the recording styles that Instruments uses.

So the traditional recording mode of Instruments is called the immediate modes.

And this is very useful because it allows you to correlate events that you see, that you perform, sorry, in your applications.

Like for example, clicking a button with events that you see on a track.

Like a spike on the track a change in the tables while you're recording.

Unfortunately, because Instruments is actually using the same CPU that your app is actually, you know, executing on, and it's using the core, it's really consuming some of the time that could be used by your application.

So this is what we call the observer effect.

Where the timing that we collect is not as precise as it could be, the timing information.

So new in Instruments as part of the iPhone 4 SDK, we've introduced the deferred recording mode.

In the deferred mode, Instruments disables your UI, disables the track and the detail views while you're recording, and it defers all of the data processing towards the end of, until you hit basically stop recording.

And what this does is it minimizes the observer effect, which is what we want.

So how do you access the deferred mode?

You access it from the file menu in Instruments, and you go to Record Options and select Deferred Display.

After you do that, if you hit Start Recording, Instruments will immediately disable the UI and gray it out like you see here, and you start recording in deferred mode.

And the UI will come up once you hit Stop Recording.

All right.

So also new in Instruments as part of the iPhone 4 SDK, we've introduced, we've allowed Time Profiler to collect kernel backtraces for applications running on the Mac.

So this allows you to peek under the covers, so to speak, of what your applications are doing.

For example, if you see that your code is making a lot of system calls and you're spending a lot of time making system calls, only when you uncover the kernel backtraces can you see why these system calls are actually costly and what they're doing.

And this is very useful for optimization of drivers and kernel extensions that you might write.

So, for example, let's take a look at this stack trace we have here on the right.

We can see that, you know, the lowest frame we have is for performing a mem copy.

And only when we uncover the kernels stack traces can we see that some of these mem copies do incur virtual memory faults, where the memory manager pages in a page in order for mem copy to use.

We've also unified the backtraces in the call tree view, so that the tops version, or the tops frames in this stack trace or kernel frames, and the littler ones are user frames.

So to make it easier for you to view your code in a unified fashion.

Additional backtrace improvements, or backtrace viewing improvements, is the introduction of stack compression.

And as you can see here, I can just move the slider to actually simply the stack traces that I see in the external extended detail view, in order to collapse them along framework boundaries and collapse them down to just code executing in My Applications.

So we talked about improvements to stack viewing.

We've also improved the way you view your source code by introducing the disassembly view.

The disassembly view shows you the instructions that your code is using for the ARM and Intel architectures.

And in this case, using the disassembly view can be useful, for example, when you have the bunch of nested function calls, and you see that your code is spending a lot of time on that line with a nested function call.

So only when you navigate to the disassembly view you can figure out precisely which function is actually consuming all the time.

The disassembly view can also show you the effectiveness of your templates expansions and your inline code.

And finally, when every instruction counts, and you're trying to optimize tight loops, the disassembly view lets you do optimization on the smallest level of granularity.

You can access the disassembly view via 2 methods.

So this is the source view that we know and love in Instruments.

And you can see that it's annotated with some profiling information from Time Profiler.

If you actually select the Information button over here, the Instruments will pull up, the instructions corresponding to the code that you see in the source view.

You can also access the disassembly view by selecting the top right button in the corner.

And it will toggle between the source views and disassembly views.

And as you can see, my disassembly actually has my source annotated as well with the profiling information from Time Profiler.

Now new in Instruments as part of Xcode 4 is the availability of the documentation for the in phone ARM disassembler instruction sets, which you can access by selecting this gear and selecting show ISA Reference Guide, which will then take you to the documentation for these instructions.

All right, so we've talked about Time Profiler.

Let's show you Time Profiler in action.

I'm going to switch over to the demo machine.

And here I have Instruments.

And the first thing you see is a template chooser over here.

So if I zoom in I can see that over here on the left-hand side I have several categories.

And I've selected the Mac categories.

And I'm going to select Time Profiler Template to add the Time Profiler Instruments to my trace documents.

I'm also going to target an application that you might have seen during the State of the Union demos called Mipmap.

And I'm going to hit record to bring up that application.

So Mipmap is basically takes a bunch of images.

It renders thumbnails to those images onto the screen.

And then it continues to create scaled down versions of those images, which are called mipmaps.

And what I'm going to do is I'm going to start recording.

And you can see Instruments actually collecting information in the background.

And I'm going to try to scroll while I'm trying to render these thumbnails.

And as you can see my application is actually hung, it starts spinning.

The animations are choppy.

It's not really doing what I want it to do.

And I really want to figure out what's going on in this case.

So it's finished processing the images, and I can just switch back to Instruments and hit Stop Recording and start looking at the data I've collected.

So the first thing I'm going to draw attention to is the configuration view over here.

This basically controls how the settings for your backtraces and your call trees.

And in this case I've selected Invert Call Tree, so that I can see my call tree in inverted fashion.

I'm also going to select Separate by Threads.

And over here to the right you see that I'm spending a lot of time, about 16 seconds in this frame right here, Resample Byte H for CPP vector, which was happening in Core Graphics.

This is not really my code.

And neither is this where I'm spending about 8 seconds, decode MCU optimize.

So let's take a look at, let's investigate this further.

I'm going to zoom out and bring in the extended detail view from the top right corner over here.

It pulls out this drawer.

And the first thing I see when I look at the stack trace, if I zoom in again, is this code the resample by 84CPP Vector is actually getting called from my code in ScaleImage.

I'm kind of, I'm kind of expecting that I'm spending a lot of time scaling my images, in this case, on the separate threads that I'm using to do this process.

The next frame down, what I see here is that a while bunch of code is going on.

Let me zoom in again.

Sorry. There we go.

There's a whole bunch of code that's doing image decompression.

And this is happening in frameworks that are not my code.

And if we look at this part right here it's trying to acquire image to draw that image.

If we scroll back up we're seeing that what we're doing here is attempting to do a lot of work to create and decompress images when we're in the thumbnail view draw rec.

We're trying to render the thumbnails.

And if we actually scroll up all the way down and zoom in on this backtrace, we can see that we're doing a lot of this work, this decompression of images on the main threads.

In fact, I'm going to go ahead and select Uninvert the Call Tree to see my, the call stacks in a top down manner.

And you can see that in this case; oh, let me zoom in again, I'm sorry.

I'm spending; you know, I have all these threads that are supposed to be doing the image creation, the thumbnail creation and initialization, but I'm actually spending all of my time on the main thread.

And there is a disproportionate amount of time spent there in comparison with the other threads.

And this thumbnail view draw rect is appearing as the heaviest stack trace on my main thread.

So let me take a look at my code and see what I'm doing there.

I'm going to bring up Xcode.

And this method right here, the; let me actually scroll up to it.

The Start P Threads Methods, what it's doing is it's doing all the work.

It's creating the thumbnails and going ahead and creating the scale down version of the mipmaps.

And what I had assumed when I first wrote this code, that since this code, this part of the code is executing on a secondary thread, my call to NSImage alloc and it was context of URL, to create the thumbnails, would actually do all the work of the initialization for the images.

But in fact, Time Profiler has told me that all this work was getting done on the main threads, so in fact, this call is actually lazy.

It doesn't do any work till it actually needs to render the images.

So to optimize this, I'm actually going to replace this call with code I know, takes the initialization stage and actually does it on the thread that I intended it do to it on.

So I'm going to delete these two lines of code.

And I'm going to bring in some new ones.

All right.

So instead of using the original call to create the thumbnails, I scale my image down using Scale Image ImageClipFracForResizeMode.

And then create the thumbnails over here with inits with cg image, and then assign them to the mipmaps.thumbnail on the main threads.

So hopefully this should do what I expect it to do.

I'm going to save, hit build, and I'm going to verify that this change actually improved my performance by going back to Time Profiler and running Time Profiler again.

So here's Time Profiler.

So since I've already selected mipmap I'm just going to hit Record again.

And it'll bring back my mipmap application.

And now I'm going to hit Start.

I'm going to scroll.

It seems smooth.

It's actually running through without any choppiness, and it's done in 5 seconds as opposed to 11 when we first started off.

So to verify this actually did change what we wanted it to change, I'm going to go back to Instruments, hit Stop Recording.

And if we zoom in actually you can see that when we were originally spending about 11 seconds on the main thread trying to initialize these images, we're now spending about 3 seconds.

And the time is now more evenly distributed amongst our threads that are rendering the, creating the thumb nails and creating the mipmaps.

All right, so this is how you use Time Profiler to find where your code is spending most of its time, and basically find, try to optimize this.

And now I'm going to switch back to the slides and talk about Data Mining.

So one thing we saw during our demo was that we were looking at a lot of frames and stacks that were not coming from our codes.

There were, you know, operations that were getting performed in Core Graphics, all these image-rendering methods that we have no control over.

So I'm going to tell you about a very useful tool, it allows you to charge the time that you spent in those libraries and in those frames to your code, because they're doing code on your behalf.

And this is called Data Mining.

So Data Mining involves a bunch of operations.

And the first of these is charging a symbol to its caller.

For example in this case, we have an example, sorry, call stack.

Sorry. We have an example call tree.

And we have some example times shown on that call tree.

If I charge acquire image to its caller, I take the time spent there, the 98 milliseconds, and attribute it to its callers.

And I can then combine the far end of my backtrace together to see that I spent 1 of 5 milliseconds in draw image in that part of the call tree.

In fact, you might find it useful to charge an entire library to its callers.

And in this case what you can do is if I charge Core Graphics to its callers, I take all the time spent there and attribute it to its calling frames.

And once again, I can combine the furthest branches of the tree to see that I'm spending 139 milliseconds in acquire image.

A lot of times you might want to take away the time spent in a certain region of your code from the rest of your call tree.

For example, if you're not interested in that region, because you can't really optimize it further.

So this is pruning a symbol in its subtree.

And what you do, if for example, we prune away powerOf2Image is we take the time spent in powerOf2Image in all the frames that it called, and remove it from our tree.

Which means, that instead of 98 milliseconds spent in interpolate, we're now spending 76 milliseconds there.

And finally, you might find some cases where your code is making a lot of calls into frames that are happening in a different library.

And there's a lot of these frames.

Then it calls back into your code.

And really you only care about the boundaries, where you know, the transition happens from your code, into that library, back to your code.

In this case, you should flatten the library to its boundaries.

So if we do that in this case with Core Graphics, we just remove away the extra frames and keep the ones that occur at the transition points.

So that concludes; or that basically tells you how to use Data Mining.

And I recommend you use it to simplify your backtraces and find out precisely where in your code you're spending most of your time.

And with that I'd like to introduce my colleague Chad Woolf.

Chad: All right.

Good morning everyone.

My name is Chad Woolf as Lynne mentioned.

I work with Lynne on the Instruments team.

And this morning I want to talk to you guys about the new iOS 4 and how it uses power.

Now it's hard to talk about mobile devices these days and not talk about the topic of power.

And this is because battery life is such an important part of the user experience these days.

And the way that we get the most battery life on our devices is to use as little power as we can.

Now in order for you guys to help us do that, we need to give you both a way to measure power, and determine where that powers being spent in the device.

So I'm happy to announce in the iPhone SDK 4 we have a new suite of power analysis tools.

We have within Instruments an Energy Usage Instrument, which actually measures energy usage directly from the hardware in your mobile device.

This come right to, as close as we can get to the battery and it measure the energy flow from the battery into the device.

Now our next Instrument is a CPU Activity Instrument.

In which case we're measuring the CPU activity in a given interval.

We have a new Sleep-Wake Instrument, which records our sleep and wake events.

And we have a Display Brightness Instrument, which will record when the display is come on and when it's gone off, and we record the user setting for the display brightness there as well.

We also have a few radio Instruments, which monitor the states of the radios in the iPhone and the iPod.

And we've taken all those Instruments and we've put them into a single energy diagnostics template, which will create a single document for you with all these Instruments populated.

Now one of the things you'll notice when you start doing power measurement on a mobile device is that mobile devices tend to function differently when they're at home versus when they're in the field.

Now in this context what I mean by home is really connected via the USB cable to your host machine in your development environment.

Now when you connect that device up to your host, a couple of things happen differently in the device.

First, the sleep and wake is disabled, so you're not going to see sleep and wake events occurring.

Energy's actually flowing from the host into the battery to charge it.

So any of the measurements that were taken at the battery to determine the energy are going to be fundamentally affected by that.

And finally there's a little bit of data that flows between the device and the host when they're connected, and that can generate some activity in the device that you would normally not see if you were disconnected and in the field.

Now when you do take your device into the field and you test it in the domain that it's going to be tested in, if it's a social networking application maybe a cafe, or if it's a turn-by-turn direction system out, you know, maybe in your car, you're going to notice that you're entering into a more diverse radio environment.

And what I mean by that is that as you move with your device from cell-to-cell on a cellular network, you're going to get different levels of power performance for your radios.

As you move within range of different Wi-Fi hotspots, some of those will be able to make connections to.

Some of those you will not be able to make connections to.

And that will affect how your device is using power as well.

And then furthermore, if you are developing something that uses Core Location, the best way to test your Core Location app is to actually take the device and move it from place to place, so you can see those events coming in and how they interact in your application.

So in the iPhone 4 SDK we have a new way of recording while you're in the field, and that's actually being able to turn on power logging on the device.

So when you're, when you have your device and you want to do the testing of your application, you can go to the settings application.

We have a new developer settings subsection there, and a switch for power logging.

What you'd do is you'd turn that logging switch on and that enables the power logging service.

And the power logging service can record data all day if you like.

It's a very efficient mechanism.

And then at the end of the day you can turn off your power logging, go back to your host machine, connect it up to Instruments, and pull of that data down and analyze it in the Instruments application.

Now of course we always support the standard way of doing recording in Instruments, and that's to have the device connected to your host.

And push in the Record button in Instruments and then you can see the data as it comes in and as you interact with the device.

Now let's talk about that first Instrument.

The first Instrument's the Energy Usage Instrument.

And as I mentioned that is a true power measurement.

We're actually going right to the hardware as close as we can to the battery to get these measurements.

Now the Instrument is taking the measurements it's receiving, and it's converting it to an energy factor between 1 and 20.

Low numbers means lower energy usage, which means you're going to get longer battery life.

And higher numbers means that you're at a higher energy usage and you're going to have shorter battery life.

So how short or how long?

Well if you're able to sustain an energy level of 1, it would take over 20 hours to fully discharge a fully charged battery.

And if you're able to sustain an energy usage level of 20, then you'll be able to discharge a fully charged battery in a little less than an hour.

If you're somewhere in the middle, let's say an energy usage of about 10, that's about 10 hours of projected battery life.

Now because this is being measured from the hardware, the sensors are only currently in the iPhone 3GS, the iPhone 4, iPod touch second and third generations.

And if you have a device that's not on this list, the energy usage Instrument will show zero, but the rest of the Instrumentation will be working just the way it normally would.

So let's talk about the second Instrument in there.

That's the CPU activity Instrument.

Now the CPU activity Instrument is like the activity monitor, but it's a much more compact, lightweight version.

CPU activity actually ends up being a significant portion of your power profile and where you spend a lot of your energy.

And that's because power usage from the CPU subsystem scales at how much work you're actually giving the CPU.

So the more work you give it, the more activity that you show in the tools, the more power you're going to be, the more power it takes to drive that.

So within the Instrument we measure the total activity, and that's the amount of the CPU is active in a given window of time.

But we also break it down into the current foreground applications activity.

We record that.

And we also record the graphics and render server, and the code that's executing in there.

And the audio/video server, otherwise known as media server D as well, and that's listed in the audio processing column here in the Instrument.

Now, the reason that we record these and break these down in this fashion is because the workload of your application tends to move around.

If you're using Core Animation, for example, your foreground activity, your foreground application activity may show a very small number, 1 or 2%.

But you may see the graphics column showing 20 or 30%.

And this occurs when Core Animation is sent a lot of the compositing work over to the graphics and render server.

Now there's also another service for playing movies and audio, and that's access via the AV Foundation, the media player framework, and OpenAL.

And if your applications are doing work with that type of media, you'll see the audio processing column show up with some activity as well.

So in addition to the CPU activity, this Instrument also tracks application state changes.

And it puts flags on the Instruments track view, so you can see when an application has moved to the foreground.

When it's moved to the background, and when it's in the background, if it's become suspended.

And also there's a flag for foreground obscured.

And that's what happens when your foreground application is running and then the lock screen comes down and prevents the user from further interacting with it.

Now moving on, we have 3 radio Instruments.

I have 2 on this slide, which are similar and they're related, so I'll talk about those first.

The Bluetooth and Wi-Fi Instruments are actually tracking the user preference for these radios.

And it's pretty important to understand when you're recording a trace and comparing two traces what the current state of your radio's worth.

So if the user had turned on Bluetooth, your device would be acting differently than it would be if that were disabled.

For example, with Bluetooth enabled, you could be communicating with external devices.

With Wi-Fi enabled, a couple of interesting things happen.

Core Location is able to use the Wi-Fi hotspots that are around you for a more efficient form of location information, so it doesn't have to turn the GPS on.

And if you're in a Wi-Fi hotspot that you're able to connect to, you could be sending your data over that Wi-Fi hotspot in a more efficient method than using the cellular technology.

Now if it's disabled, what you're doing is your forcing all of your data from your application to travel over that cellular network.

So if you're optimizing applications for data transfers over the cellular network, then you should turn off Wi-Fi and you'll see that appear in the Instrument.

Now the third radio is the GPS.

I want to talk about this one a little bit differently, because your application and how you use Core Location can actually drive whether the GPS is on or off.

Now when it's enabled, it is one of the higher power radios on the iPhone, but it is needed for a high resolution and high accuracy when you're using Core Location.

So when you're using Core Location and you have an instance of your CLLocationManager, if you set the desiredAccuracy to a lower value, so that your desire's not to see a meter accuracy, but maybe a kilometer accuracy, then Core Location can bypass turning on the GPS and use potentially the cellular technology or the Wi-Fi.

The same goes for the distanceFilter.

The distanceFilter tells Core Location how often to update your application as you move from place to place.

If you set that appropriately for your application you might not need to turn on GPS as well.

And the last and the most important API that you guys will be interested in on the Core Location or the CLLocationManager object is stopUpdatingLocation and startUpdatingLocation.

As your user moves from the screen that shows the GPS information, the location information that you're trying to look at, to a screen that doesn't.

Let's say a web page, or a text view, or something like that.

You should stopUpdatingLocation.

At that point you will actually disable the GPS if it's turned on.

And another Instrument here we're recording brightness.

Now we're recording the user-selected brightness level.

And when that Instrument shows zero that means the display is off.

The backlight does consume quite a bit of energy.

So when you see that Instrument showing a positive value you'll know that a certain portion of your energy is going to be due to the brightness.

And as you go from trace to trace you want to make sure that the users are using the same brightness, roughly, so that you can have comparable power traces.

Now what's important about this Instrument is that when that screen goes off, you know that the users are no longer interacting with your device.

And this can happen for two reasons.

One it hit the Sleep/Wake button on the top of the phone to turn the display off and put in their pocket.

Or they're not physically interacting with the device and it's gone to idle dim and then turned the screen off.

Now the last Instrument I want to talk about here is the Sleep/Wake Instrument.

And this is one of our most important Instruments, especially today in the world of the multitasking API's and background applications.

In the Sleep and Wake Instrument it's going to help you get the most out of your battery life.

And the reason is, because sleep is the most efficient mode of the phone, and where it spends most of its time.

So in a healthy situation here the sleep should occur about 15 seconds after the screen turns off.

The phone will finish all of its work and then drop down into a sleep mode.

Now if you have a background application that continues to run, that'll obviously push that sleep out.

Now with local notifications and push notifications and location updates, all of these technologies will cause the device to wake up.

So if you're using multitasking APIs and you're using background application type notifications, you're going to want to watch that Sleep/Wake Instrument very carefully to see if those notifications are waking up your machine or your device more than they need to.

All right, so I want to show you all of this in a single demo here.

So what I want to show you is turning on the logging on an actual device.

I want to show you how to import a trace into Instruments.

And then I want to show you, with a trace I've already recorded, a couple of things.

I sent myself in this trace, I sent myself an email, so that generated a push notification.

Then I watched a little bit of video on YouTube, and then I used a few applications that use the GPS.

So I want to show you how to find all of these events in an Instruments trace.

Let me start with a device here.

Now you'll notice here that the device is disconnected.

I don't have this connected.

This is how you're going to want to turn this on in the field.

So let's turn on our phone.

Now here we have our settings application, that's where you're going to find the developer settings.

And if you scroll down here towards the bottom, you're going to see the developer setting.

Now if you get to the point where you don't see the developer setting, there are two possibilities for that.

The first is that the device hasn't been set up for development purposes, and that's done through Xcode.

You usually do that once.

But the other possibility is that you've rebooted the phone and you haven't yet connected it to Xcode or Instrument.

And if that's the case then you won't see the developer panel until you first connect it up again from the reboot into Instruments or Xcode.

Now this is due to a failsafe that we have in developer tools.

If anything were to go wrong while you're in the development mode, simply resetting your device will take all the development tools off of your device, as well as the data.

So you have to be careful when you're recording information here that you don't run the battery completely dry, because when you reboot it unfortunately all the data will be not there anymore.

So the normal mode here would be to go to the developer panel, flip logging on.

So there you go, on.

And then we can turn it off.

Now when you turn it on and off, you're going to be creating a new session.

And so if you've turned it, once you turn it on again, you're going to end up removing that last session.

So be aware of that.

And be careful when you're flipping that switch back and forth because you may lose your data.

So now what I'm going to do is instead of sitting here for 20 minutes recording power while I talk to you guys, I have another device, which I'm going to connect up, which has the power trace already on it here.

So the visual demonstration of me connecting the device.

[ applause ]

Chad: All right.

So let's move over to our demo machine.

[ silence ]

Chad: Okay.

So I'm going to start a new Instruments document here.

In the iPhone section of the template chooser.

Try and zoom in on here for you.

If you scroll down we have a new, here's the energy diagnostics template.

Start with a new document here.

We have all of our Instruments populated.

Now what you want to do is, is the way you choose which device you want to import the document from, you, is a standard for the targets chooser.

That's our iPhone.

Now to import the data under the file menu we have an Import Energy Diagnostics From Device.

We'll do that and the data will come.

Now during this demonstration as I navigate the trace, I'm going to be using this track scale control down here.

I don't know if you guys have had a chance to use that.

But this allows you to zoom out, so you can see all of your data on the same track, or on the same screen, and zoom in for more detail.

So I'll be playing with that, if you see this scale changing that's what I'm doing.

So now I promised you that I'd show you a couple of things, the push notification to start with.

So what does a push notification look like?

Well I can tell you the push notifications actually right here.

I'm going to scroll that over, so you can see Instruments with the actual Instrument name next to it here.

There we go.

Now this is actually the push notification.

The reason I know this is a push notification is because I looked at our Sleep/Wake Instrument, and this dark band means the device is asleep.

The light band means the device woke up and then it went to sleep again some short time after that.

Now, the Display Brightness Instrument is telling me that the display did not come on for that wake up.

So that's indicative of a push notification or some other type of notification going on.

Now since I sent myself a mail, when you get a mail on the phone you have that friendly mail tone that generates that tells you that you have an email waiting.

Maybe it's not friendly for you to receive too much email.

But if we go to the Activity Instrument we can kind of confirm that.

So if I move my inspection head to this block of activity here that I see, the inspection head will line up.

Let's see if I can get it all on one screen here, maybe not for you guys.

But you might notice that when I move the inspection head it will move to the line in the table view of where that activity is.

So if we look at that line in the activity, we will see that we had 13% activity during that interval.

And sure enough the audio processing went to 4.3%.

So that is our push notification.

Now next thing I said is that I had watched a YouTube video.

Now how do I find that in all of this data?

Well, we have these flags on the top here that record the interesting events.

And this one's coming from the Sleep/Wake Instrument telling me that the device has gone through a running state.

And I can navigate the flags here with these arrows.

We'll see that more, that mail came to the foreground with me checking that mail I sent.

Then it goes into the background.

And ah, YouTube is started.

So now I see that YouTube is the foreground running application.

And I can move to the next one and see that where it goes into the background and then becomes eventually suspended.

So I know in this window of activity is my YouTube activity.

Try to fit all that in here.

Scale out.

OK. So now let's take a look at some of that activity.

If we look at the Activity Instrument again here, we'll notice what I was talking about the workload moving around.

And this is what I wanted to show you guys.

If you look here we have 30% activity while we're playing the video.

The foreground app is actually doing .4%.

That's the YouTube application itself.

It's doing very little work.

Basically just maintaining the controls.

Now the bulk of the work is being done in the audio processing daemon, and that's also the media daemon.

So as it's playing the video and running through the codex that workload occurs in the audio processing.

And we're doing a little bit of work in compositing in the graphic process; the graphics and render server.

So as I move out here, now another interesting thing I wanted to show you about this power trace is you might be able to see from the monitor here that the activity levels been fairly consistent.

But I'm going to move along the energy trace up here at the top.

And if you guys want to take a look at that tool tip that shows the energy level at 13 out of 20.

As I move long here at the beginning of this trace I see it's 13, 14, 13, 14, and we continue to move on.

Now we move towards the end of the trace it starts to go down.

We're actually using less power here.

Now we can speculate that that's due to the way that we watch video on the Internet.

And the way that we do that is we start watching the video while we're buffering.

So as we're buffering the end of that movie, we're watching the video.

So you're going to have double activity during that period, now that activity's mostly occurring over the radios.

And when I reach about halfway point in my movie, the radios are going to stop becoming active and you're going to see that energy come down.

Now the other thing I told you is that we're going to have some GPS activity in this trace.

If I zoom out you'll see on this bottom band, which is the GPS that we do have some sporadic activity happening here.

Now the question is why.

Why is that occurring?

Well, we'll the flags again to do some navigation in here around where the GPS is.

I'll zoom in a bit.

And we'll look at these flags.

And the flag that's closest to the GPS turning on is Maps coming into the foreground.

Now this makes a lot of sense, because the Maps application, at least the way I was using it was doing constant location updating.

So you can see for the entire duration that Maps was in the foreground that GPS was active.

Now there's some GPS activity a little bit later, so let's continue to look through our flags here to find out what might have triggered that.

And we see that the next event that happened closest to the GPS turning on is the camera application came to the foreground.

And for the duration of the camera application as it was recording a little bit of video, the GPS came on and off.

And this is due to the feature in the camera application for geotagging.

It uses Core Location, which uses the GPS to find out your information, so it can associate that with the video.

So this is a full power trace, it's about 18 minutes long.

You want to record traces that have similar length so you can try to get the significant amount of data to see if your, how well you're doing and how your applications are interacting with the device.

So I'm going to go back to slides.

Here we go.

Ah, demo. OK, so now one of the things I want to talk about before I move on is that when you measure power the way we measure power, which is very close to the battery, it's both a blessing and a curse.

It's a blessing because we're getting a measurement that shows where all the power is going.

There's nothing that's sneaking by the sensor.

And it's a curse in a way, because that's everything.

including a lot of things that you don't have direct control over.

So when you want to see how your application is specifically impacting the device, you may want to disable as many of the features as you can on the device.

And a few recommendations here are mail fetching.

If you don't want the activity for mail fetching to occur.

Push notifications if you're application's not using those.

And the auto-dimming on the display.

The ambient light sensor inside the device will vary the power being sent to the display, which could widely affect your measurements.

Especially if you're in an office that's changing lighting conditions.

Furthermore, the radios can be disabled.

Wi-Fi, as I mentioned can be disabled.

But also there are settings in the phone where you can turn off the 3G and data connection.

If you want to test the affect that the data's having on your power usage.

And also you can disable the Bluetooth.

Now a smaller point here, but one I felt was still worth mentioning, is that temperature does affect energy usage.

As the device warms up your device is going to, the resistance's internally electronics is going to increase.

It's going to require more power to drive the device.

Up to a certain point, of course, where the device becomes too warm and it start throttling its power consumption back to try to bring that temperature down into a level that's comfortable for the user.

Now unless you're really driving the device hard you're not going to see this.

But if you do see some difficulty; if you have some difficulty explaining some of the measurements that you got, you may want to consider the affect that temperature has on the device.

So in summary, we have a new set of power analysis tools in Instruments.

We have a new template, which takes all of those tools, wraps them together in a nice single document for you.

Now the whole goal of this is to help you guys extend the battery life for the end user.

And so after you've done optimization in the Time Profiler and you have a nice crisp responsive display, you might want to push even further and try to optimize your application just a little bit further to try to reduce that CPU activity that could be contributing to a shorter battery life.

Now also what's new here is that we encourage you guys to do the recording in the field.

This is sort of a new mode for Instruments.

In the past you had to have been connected to the device.

But in order to get the most accurate representation how your device uses power, you need to take that device into the field and do your recording there.

And this is new on iPhone SDK 4.

All right.

Thanks. I hope you guys have fun using that.

It's been a lot of fun to develop.

The next we want to talk about something completely different, and that's re-symbolication.

Now if you've taken a trace sometimes you might end up with a series of what we call missing symbols.

If you see inside the trace numeric addresses rather than the symbol names you're familiar with like PrintF and DrawRec, etc. You'll just see these addresses.

And that's basically Instruments telling you that it doesn't have the information, the symbol information, or rather the way to map addresses onto symbol names.

Now the way we used to fix this before the iPhone SDK 4 is we would rebuild the application, reinstall it on the device, and then rerun our test.

Now why did that work?

Well, the symbol information is actually a byproduct of the build process, and it stays on the host with your application binary.

And every time you rebuild, and now of course what I mean stays on the host, it actually does not get installed on the device for testing.

So every time you get a new build, each binary and symbol information has a UUID, or Universal Unique Identifier that shows how to match the two.

And when we get data from the device into Instruments, the Instruments on the phone sends the UUID of the app being profiled back to Instruments on the host.

And then Instruments on the host goes through the process of locating the binary somewhere on your host with your symbol information to try to find it.

Now if you've done a clean, then you've removed the binary from the host and you've removed the symbol information from the host, and you end up with a lot of these missing symbols.

Additionally, if you try to go off and rebuild the binary and you don't reload it or try to take another sample, then Instruments are still not going to be able to find that symbol information, because every time you build you get a new UUID for that particular binary.

Now for some of you out there this still happens, even if you didn't do a clean.

And this could be due to the way that Instruments looked for that symbol information based on the UUID.

It uses Spotlight to do that.

And Spotlight can be tricked in a few different ways.

First if you've moved all of your binaries and symbol information into a compressed achieve, like a zip file.

So obviously you're not going to find it in there.

But to also Spotlight doesn't index certain types of network shares.

So if you use a build server or a symbol server your Instruments is not going to be able to find your symbol information out there on the server.

Also, Spotlight can be intentionally disabled.

You can tell it not to look in certain places, and that may have been where you decided to put your build folder.

And also if you put your build folder in temp, temp is not indexed in Spotlight.

So all of those can contribute to Instruments not finding that information, and then you have a lot of missing symbols in your document.

But, in the iPhone SDK 4 we have a new option under Instruments to take a trace that you recorded and re-symbolicate it.

Symbolication is effectively where we take our symbol, our addresses and map them back to symbols.

If you go into the file menu you see a re-symbolicate document feature.

In there you'll see this panel that drops down.

It's actually one panel, but I'm showing you two tabs on there.

It allows you to locate the dSYM file.

And that's a file with a .dSYM extension on it that usually gets built next to your binary, and associated back with the binary.

So if you save those aside for your different builds that's how you can reassociate them.

Also you can add a dSYM search path.

So if you are using a symbol server or build server, you can put that path in of the network share in here, and Instruments will, again, find and locate your symbol information.

So once you do that you will see the happy, healthy symbolication and symbol information that you're used to.

And you can get back to work solving your problem.

All right, that's all I have to tell you guys today.

But Lean is going to come back up here and talk to you about Xcode 4 preview and some of the neat new features, including system trace.

[ applause ]

Lynne: Thank you Chad.

So earlier in this session we saw how to use Time Profiler to see where your threads were spending most of their time.

But it's also useful for you to know how your threads interact with other threads on the system.

For example, you might want to know whether your threads are getting preempted by threads trying to service an interrupt.

Or whether your threads are blocking, because they're exhibiting lock tension, and therefore serializing when they're not, you don't want them to.

So, new in Instruments is part of Xcode 4 developer preview.

We've introduced system trace.

System trace records all the transitions that your threads make into the kernel.

And they can do this because of three reasons.

First, you do system calls.

Second, virtual memory operations.

And third, thread context switches.

So before we go ahead and explain how Instruments record this data and what it shows you, let me go ahead and explain a couple of terms.

So if your thread is running in time, it is context switched onto a CPU or a core when it's scheduled on that CPU.

And it gets called to switch out when it's scheduled off of the CPU.

The time which it's on the CPU is called a thread tenure, or a thread interval.

Now during time a thread is on a CPU it can actually make a couple of system calls or virtual memory operations as shown here.

Now some of these system calls might be blocking, and cause the thread to get context switched out.

And other virtual memory operations a system calls can actually execute within the duration of the thread tenure.

So, system trace is a template with three Instruments.

The first of these Instruments is the scheduling Instrument.

And it really records when your threads on CPU, which records when the thread tenures are executing.

And it records why your threads get off the CPU, for example, if they get blocked, if they get preempted by another thread, or if they explicitly yield their quanta to another thread.

And the scheduling Instrument records the thread priorities.

So really a good rule of thumb here is you want your threads to run on their respective cores for as long as possible.

And to help you apply certain rules of thumb that you might like to use when using system trace, new Instruments as part of the Xcode 4 developer preview, we've introduced the trace highlights view.

This is a top-level statistics view that lets you understand what's going on with your threads or processes.

In this case, it's showing me the top five processes that are showing the largest amount of context switches.

So it really helps you understand what's going on and apply the rules of thumb that we're going to be talking about here.

So if your application requires a lot of throughputs, it makes sense to maximize the time your threads are on the CPU, so maximize the thread tenures.

And if your app is looking for responsiveness, you want to minimize the amount of time your thread is on CPUs, so that the thread tenures are short, the thread gets on the CPU, does the work it needs to do, and then gets off to allow for other threads to actually execute.

The second Instrument is the system calls Instruments.

And the System Calls Instrument records all the requests that you make into the kernel so that it does work on your behalf and accesses resources.

So because system Calls access resources, they can actually incur a context switch they can block.

And this might cause unwanted serialization of your otherwise parallel code.

So the rule of thumb is to minimize the number of system calls that you make and to avoid using slow blocking system calls that you might know about.

And once again, you can use the trace highlights view in order to find and see the top statistics for your applications and threads that are making system calls.

The third Instrument is a Virtual Memory Operations Instrument.

And it records when your code accesses new pages of memory.

This can happen in three ways.

Your code can, or Instruments can record when your code first accesses a new page of memory when the memory manager allocates a new page in your code's address phase.

Or if your application is using file maps resident pages, Instruments is going to record when the memory manager makes a copy of a memory page into your processes' address space.

And Instruments will also record page-ins and page-outs from disks, which are the most expensive memory operations.

So those results, you know, you want to try to minimize the amount of paging that your application does.

And to do that generally minimizing the total memory footprint of the applications actually reduces the number of virtual memory faults that it uses due to paging.

You might also want to reuse your memory allocations.

And if your applications are time sensitive, it's good to also preallocate memory and front load all of the memory faults to the initialization stage of your application, as opposed to during the applications execution, which might be costly.

Right, with that let's go to a quick demo system trace.

I'm going to bring up a demo machine over here.

And hit command New to bring up Instruments again.

I'm going to select the system trace template over here.

Double-click it.

And the first thing it does is it adds 3 Instruments onto my trace: scheduling, system calls, and VM operations.

And by default its targeted to profile all processes.

So for this demo I'm going to be using mipmaps again.

And I'm going to be comparing 2 different algorithms for creating the mipmap thumbnails.

And one of these is, takes, well actually let me start recording while I'm talking and explain this as I'm recording.

So what I'm going to do is I'm going to hit Record and Instruments should pull up the deferred mode.

I'm going to start recording.

So the first and faster method of creating these mipmaps is to have a single thread go through the largest to smallest scales of my images, and create the thumbnails and the mipmaps.

So I just hit Stop Recording.

The second method, which I'm going to perform now if I go to Mipmaps Preferences and change this to smallest to largest, is that I'm going to have the threads that I'm using go through all the smallest mipmaps and create those first, and then the next, the largest up, and so forth.

So back in Instruments, I'm going to hit Start Recording again.

And Start over here.

So this second algorithm we notice is slightly slower.

It's going to take more than 5 seconds to complete.

So I'm going to use system trace to find out why this is the case.

All right, so we're done, hit Stop Recording.

First thing we see here is the highlights view.

And the highlights view is some of the information that, you know, is pretty useful.

It's showing me that on my system most of the context switches are happening in mipmap, which I expect, and as I scroll down my system call wait times are attributed to a lot of file and directory operations, like Open, Close, No Cancel.

And a lot of memory faults are happening on behalf of mipmap.

And my process total system calls, a lot of them are happening in mipmap as well.

When you zoom out and take a look at the system call wait time.

I'm going to click on this and it'll take me to my System Calls Instruments.

And if we zoom in we see in the second algorithm we're spending around 230 seconds waiting in our threads, as opposed to 181 milliseconds executing on the CPU.

So we go back to original run and take a look at the first one.

We can see that the first algorithm is actually waiting a less amount of time on, at the threads, so the threads are blocking less.

It's 120 milliseconds waiting, and 150 milliseconds executing on the CPU time.

And we're actually making a lower number of system calls in this case.

So this is an example of how to use system trace, understand why your threads are behaving the way they are, how, and compare algorithms as we did in this case.

So let's switch back to the slides.

So, in closing, what did you learn today?

We learned how to use Time Profiler and Data Mining to find out precisely where in your code you're spending most of your time and resources.

We've also introduced some new Instrumentation that allows you to understand the power characteristics of your iPhone apps and to understand how your application and interacts with the system as a whole.

And I would like to point out that if you would need more information and have any questions, please feel free to contact Michael Jurewitz our Developer Tools Evangelist whose email is shown on this slide.

So I highly recommend attending our sister session, which is in about an hour or so in Presidio.

It's called Advanced Memory Analysis with Instruments, which we'll talk about memory managements and how to use Instruments to find problems with that.

And also, you should definitely attend the Performance Optimization on iPhone talks, and there's two of those, today and tomorrow.

And thank you very much for attending our talk.

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US