Profiling in Depth

Session 412 WWDC 2015

Learn about time profiling down to the disassembly level to help you investigate the minute details of your application that affect its performance and responsiveness.


CHAD WOOLF: Thank you.

Thank you.

I'm Chad Woolf.

KRIS MARKEL: I'm Kris Markel.

CHAD WOOLF: We are performance tools engineers for Apple.

This is session 412.

We'll talking about profiling in depth.

This will talk about the time profiler in Instruments and how to use it to optimize your applications.

Time profiler is place to go when you're trying to figure out where your application is spending the bulk of its time.

An example.

It is useful when trying to discover what the application is doing at runtime.

You want to see who is calling what.

Our session breaks down a bit like this.

We'll talk a bit about the motivation on why we wanted to create a session devoted to the time profiler only.

But the session will revolve around demonstrations and showing you details on how it works and how your applications work below the source level.

Then we'll give you final tips on how to use the time profiler on your own.

Quickly about our motivation, it comes from the creation of Instruments 7 itself.

With Instrument 7 we wanted to go with a new look and feel which meant new artwork but it also meant that for the new feel, we wanted it to be more responsive and we wanted to feel smoother.

We'll do performance optimizations for our UI in general.

We also wanted to try new graphing styles.

These graphing styles were things that we wanted to do in the past but didn't have the performance to in our existing graphic code.

We knew that we would have to focus on the rendering which is a pretty difficult piece of application.

Instruments have to deal with hundreds of thousands, sometimes millions of data points and has to try to crunch that down into a presentation that's both simple and easy to understand.

There is definitely algorithm complexities in there.

What that meant for us, we had to rewrite a significant portion of our application which is the track view which lives up at the top of the app.

Chris and I over winter took our design for the new track view and we started building it up out of a series of prototypes.

We didn't do the prototyping in Instruments itself, we broke it out in a separate application to keep it simple.

This is what one of the last prototypes looked like.

Now while we were prototyping, a thing we did, we set a performance budget.

As we layered on feature after feature we were constantly evaluating our performance of our code against the budget.

When we found we were exceeding the budget we would use the time profiler to figure out where in our application things were going wrong.

Sometimes it was an easy fix but sometimes it wasn't.

Since we were prototyping, even some of the bigger structural changes we had to make to get the performance back on track was still fairly easy.

When we integrated it back into Instruments we use the time profiler again to find the hot spots in our integration points, and after a few iterations, we had a version of Instruments 7 which was meeting our performance goals.

We were thrilled with how the time profiler got us through this winter, that when WWDC 2015 came, we wanted to create a session that talks about time profiler and the problems that it's really good at solving.

We wanted to share our experience when writing the track view.

What we did this year, we created a demonstration application on iOS that looks similar to the first track view prototypes and we set performance targets for ourselves, 100,000 data points is what we wanted to graph, we wanted a perfect 60 frame per second for panning and zooming.

We wanted to make it work on a iPad mini 1st generation.

We picked the iPad mini 1st gen you know what I'm talking about we knew it would work well on all of the other platforms especially the later platforms.

Chris will show you the application, he'll time profile it, and he's going to show you the things that we found when putting this together for you.

KRIS MARKEL: Thank you, Chad.

Here I have the prototype application of an Xcode, and I want to call out a few things.

Our initial implementation we discovered can't handle even close to 100,000 data points.

Initially we're working with 10,000 data points to get going.

Another important point, you should do your time profiling on release builds, you want to take advantage of the compiler optimizations while you're profiling.

I'll go ahead, start profiling the application and to do that I'll, from the product menu, choose profile.

This will build the application, install it on the iPad and bring up the Instrument template user.

Here it is, time profiler is selected for me.

I'll click choose.

Here you will see Instruments in our new track view and we'll get back to that in a minute.

Right now I'll go ahead and start recording, click the record button, we'll show you the app.

I want to emphasize what you see here, this is not the simulator, this is QuickTime mirroring that which is already on the app.

Here's our graph, I'll go ahead and scroll, scrolling isn't bad, it is not bad, I'll zoom out by pinching, it is okay at fist and then starts to stutter, stutter, stutter, that's pretty bad.

Now, finally, I'm going to just scroll back and forth with my finger, I'm actually moving my finger but the display is not updating, it is very, very laggy.

That's very poor performance.

Let's look at what's going on.

Let's go back to Xcode Instruments.

We'll stop the profiler and quickly show you the new track view.

Here we have the CPU usage.

What this is, this is the average CPU usage for specific unit of time.

That time depends on your current zoom level.

You see the different parts as I use my app where it was spending time.

This is scrolling, the zooming out, this is the scrolling back and forth while zoomed out.

A nice thing about the new track view is that I can go ahead and use this pinch gesture to zoom in on a piece of data that I'm interested in.

If you're not using a Trackpad, you can hold down the option key and scroll up and down to zoom in and out.

I want to focus on this particular piece of data, the activity right here.

I was scrolling here.

To do that, I'll apply a filter, that's just a click and drag and it selecting just those specific samples so I can focus on that particular data.

I'll create more space down here.

Down here this is our detailed view.

What this is showing us is how many the percentage of time profiler samples we have inside of a particular function or method and then we have the symbol name.

Here is our percentage and here is our symbols.

Now the first thing you usually do when you time profile is you start to expand these out and looking for kind of a comparing the numbers over here with specific methods over here, function.

Looking for things that kind of stand out as, you know, jump out at you as something that's worthy of investigating.

Another option, if we go over to the inspector pane and click on the extended detail, we see the heaviest stack trace.

This is for the main thread.

This is where focusing here is where I'll get the most bang for my buck when doing a time profiling trying to make performance improvements.

Let's see what's going on.

The main is calling application main, run loop, there is core animation work happening, some there is nothing standing out as something that seems out of the ordinary.

In fact, this is a really common occurrence when profiling.

You look at what the application is doing the most, well it doesn't look like it is doing anything special.

There is nothing no call to compute the 40th Fibonacci number here in here or something.

However, you know, looking at this call stack, this stack trace, there is something I know what my application does.

It is a simple prototype app, what it does, it builds a path and it draws a path.

I can actually see in here that there is a call, a CG context path.

It is not called by my code according to the stack trace.

It is here, it is taking up a fair amount of time.

I'll go ahead, click on that, take a look at that.

If we look back at our call tree I do see something interesting here.

What we have, we have the draw path according to this call tree is called by this draw layer method on UI view.

That's also calling our drawRect for the graph view.

That's taking a fair amount of time.

That's one of the things that the app does.

If I look at the time over here, context draw path is taking up, you know, 55% of the samples but the drawRect, it in very few.

This is interesting, if I double-click on the drawRect method it will take me to the source code.

I see, if you look to the bottom, I actually do call draw path from the drawRect method but it is not showing up in any sample.

Everything in here is an add path.

This is unusual especially since my expectation is my own drawRect method would take a while to run.

It is basically half of what my app does.

Looking at this, I actually notice that while drawRect returns a void and context draw path is the last method called and this is probably a case of what's called Tail Call Elimination.

To tell us more about Tail Call Elimination and how to verify that, back to Chad.


So to explain the situation that Kris is seeing we have to understand how the time profiler sees what's calling what inside of your application.

This is going to get technical, I'll walk through it step by step.

On the left, we have the code for drawRect and on the right, we have what you would imagine the stack to look like for that thread, right before the UIKit calls into the drawRect.

When that call is made to the drawRect it will do what most functions and methods do, to establish their own call frame.

That starts off by first pushing the return address from the link register and the previous value of the frame Pointer on the stack.

Now drawRect knows how to return to its caller and restore the frame Pointer.

Now the next thing that happens, we take the frame Pointer set up to the new base.

Then drawRect will make room for its local variables and the compiler scratch space and that's now we have a frame for drawRect.

Now the code starts to run and we come down to draw path and draw path does the same thing.

It pushes the own frame on to the stack.

The way time profiler works, it uses a service in the kernel that will sample what the CPUs are doing at about 1000x per second.

In this case, if we take a sample, we see that we're running inside of context draw path.

Then the kernal looks at the frame Pointer register to see where the base of that function's frame is and find the return address of who called it.

Now we can see that drawRect called into draw path.

If we want to see who called into drawRect we can use the frame Pointers that were pushed on the stack to find the base of drawRect and then continuously go back through the stack until we hit the bottom.

This is a backtrace.

If we take enough of these and put them in the call tree view you can get a pretty good picture of what's going on inside of your application.

I want to point out, the frame Pointers on the stack are absolutely required.

If you're compiling code with fomit-frame-pointer turn that off to try to do the time type of profiling that we're doing here.

Let's look at the optimize cases.

This is where drawRect was compiled with compiler optimizations enabled.

Same situation, we have a drawRect frame.

We're about to call into draw path.

You notice when draw path returns drawRect is finished.

Doesn't need to do anything.

It is going to return.

The way it returns, it is it pops the stack frame, restores the previous value of the frame Pointer and jumps back to the caller.

The compiler looks at this and says well, why does draw path need anything from drawRect's stack frame.

It doesn't.

Furthermore, why come in to drawRect back to drawRect at all when it is going to return to its caller?

What it does, it rearranges the code like this.

It is going to pop the stack frame, restore the frame Pointer and make a direct call back into draw path meaning that we don't need to jump back to the caller.

That's harder to explain than it is to see.

Let's imagine what it would look like when running this code.

We'll pop the stack frame off to get rid of the local variables.

We'll restore the frame Pointer to the original value and the value of the link register.

Then we'll jump to the beginning of the code for draw path.

Draw path is going to push its own frame back on to the stack using the value that it found in the link register and the frame Pointer.

From draw path's perspective it was called directly from draw layer in context from UIKit.

If we take a time sample at this point, we'll see the exact same story.

Even though that's not the actual call sequence that happened, this is what the time profiler saw.

That's what we ended up seeing in our call trees.

This is called Tail Call Elimination, it is common in highly optimized code and has some benefits.

It saves stack memory.

In the process of saving stack memory, it keeps the caches hot, that reuses the memory, the caches and data.

It has a profound effect on recursive code, especially tail call recursive code, where a function or method calls itself as the last thing and then returns.

Without pushing those frames a Tail Call Elimination inside of a recursive function can make the performance as good as its iterative version, so you don't get the stack growth and you get the high performance as well.

This optimization leave on with highly recursive code.

If you want to turn it off for the sake of profiling to show a cleaner stack trace you can turn it off by going in the build settings of the project and setting the compiler flag, the CFLAGS to FNO-optimize-sibling-calls, turning off the optimization, and unfortunately the performance gains with it, but you'll get a better result in the time profiler.

If you choose to live with it, and you want to identify if a tail call is happening, then what you can do, you can look at the disassembly and call sight of the last call.

If it is a normal call, it is going to use a branch and link family of instructions, that's the first example there.

What that does, it jumps to the new function and saves the return value in the link register.

If it is a tail call case and we're going to jump directly into the new function, it will be a direct branch instruction.

Without the BL.

That could be a call instruction for a branch and link and the branch would be a jump instruction, if you see that, it looks similar.

Now it is up to Kris if he wants to disable the optimization and recompile, or he can carry on, it is your choice.

KRIS MARKEL: I'll look at the disassembly.

In Instruments, upper right-hand corner of the detailed view, there's a button, view disassembly, if I click that, I see the disassembly for the method and we can confirm the call to context add path is a branch and link, the call to context draw path, it is just a simple branch.

I'm confident that this is a case of Tail Call Elimination that 55% that I saw on the call tree that was not attributed to my drawRect should be attributed to the drawRect.

That is good news.

I know now my drawRect is on my heaviest deck frame, my heaviest stack trace and it is consuming 55 to 60% of my time.

This is great.

I know where to optimize.

I optimize drawRect, I'm good to go.

Let's look at this drawRect.

Looking that the drawRect, if I had a table I would flip it.

There is not much to optimize here.

It is hard to think of a simpler drawRect that actually is functional.

We have 4 function calls, context, you know, CG calls, this drawRect really does not do much.

It turns out that this is actually a really common occurrence when doing profiling.

You will take a look at your hot spots and code, there is not much you can do in your code directly to improve your performance.

You know, this junction, what do you do?

You know, other than flipping tables, crying into your pillow at night, what we did was we went through, started to look at the core graphics documentation and other drawing, you know, Cocoa drawing documentation.

We came across this particular property here.

This drawsAsynchrously.

Lo and behold, there was a make my code faster button that was created by an Apple engineer.

This is excellent.

Above that, you see I copied out of the documentation, pasted it in there.

It says a couple of things that are interesting.

It says, first of all, it may improve performance, it may not.

You should always measure.

You know, okay, dad.

Let's measure.

Let's see if this improves in performance.

This time to start the Instruments, I'm just going to do command-I for Instruments.

It will do the same thing.

It will build the application and install on the device, bring up the template chooser.

It takes a moment.

Two moments.

Three moments.

Here we go.

Another shortcut I like to use, if you take a look at the choose button down here.

If I hold down the option button it changes to profile.

What it means, when I click this, the application is going to start recording.

It saves me a step or two.

I'll do that now.

Now the time profiler will come up.

This is measuring the app.

I'll do some quick scrolling back and forth, capturing some data.

I think that's enough.

Let's go ahead, stop the recording.

I'm going to filter to specifically the scrolling data.

If we go ahead down here looking at the detailed view.

This is promising.

I'm actually you can see, there is multiple threads here.

The threads are doing work.

That's really good.

If we go ahead, if I hold down option, click the disclosure triangle I can see what the thread is calling, there is some dispatch calls, some CG calls, that's good.

That's the drawing code.

We'll go ahead, check the other one to see.

Hold down the option key.

Dispatch, CG calls.

This is good.

This is looking promising.

I'm multithreaded, in theory my app is faster.

However, multithreading does not necessarily mean faster.

We should confirm this is actually doing anything for us.

One way to do that, I happen to know this device is two CPUs, if the CPUs operate in parallel at max capacity I should see a 200% CPU usage in my graph up here.

I'm not seeing anything over 100%, that's a bit of a warning sign, it doesn't necessarily mean that they're not both doing work at the same time.

It means that I need to check further.

How do we check further?

Instruments has what we call strategies, it is different ways of partitioning the data to look at them.

There is three of them.

The first one is the Instrument strategies, the default, we're looking at it here.

The second one is the CPU strategy, it shows the data per CPU or CPU relevant data and the final one is the thread strategy.

It shows you details on what each thread is doing.

Let's look at the CPU strategy.

We can see we have each of the CPUs, we can see how much work it is doing.

At the bottom, we see the combined usage.

A nice thing to do here, when I zoom in far enough, the details, what the graph shows me, it will actually change.

Rather than show making the average usage, it will show if the CPU is active or not, it will go from average usage, to either on or off.

Now each CPU shows an on state or off state, whether or not it is working.

What you notice here, the CPUs are never working at the same time, there is no parallelism here.

You know, this is not good.

If we want to feel worse, we look at the thread strategy.

What this is showing us, each icon, it represents a sample the time profiler took, you can click on it.

See the call stack.

Here this is on a background thread and you see the core graphics calls, here is the main thread, you see the main the work we're doing on the main thread.

As you see, if I zoom in to the right level, it is probably here, I scroll around, you see there is not really anywhere where two threads are working at the same time.

It is jumping from one thread to another.

So the drawsAsynchronously, this has not really done anything for us, in theory, it may have slowed us down.

Now not only we doing the same drawing work but also managing, you know, core graphics system managed the threads it is working on, that sort of thing.

That didn't really help.

I'll turn it off.

I'll flip another table I guess.

It is not clear, the magic button didn't help.

What do we do now?

This is a really common occurrence again in time profiling.

You try lots of things, most don't work.

We step back.

What does the app do?

It builds a path and draws a path.

We have seen the draw path code.

Let's think about the build path code.

That's right in here.

What we wanted to do, we wanted to investigate the actual path we're building.

What this code does, it loops the data elements, creating a path and adds a line to that path for each data element.

We want to know how many lines we're adding to the path.

This is something that the time profiler can't tell us.

It can't tell you how long, or how many times a particular method or function has been called.

It doesn't know the difference between a slow function that's called a few times or a fast function that's called a lot.

In this case we resorted to NSLog and we just have a thing, every time we add a path we increment our counter and then we log it when we're done with the loop.

Something important to point out, NSLog, it is not a very fast function.

You don't want it in high performance code.

You probably don't want to use it for anything other than gathering diagnostic information or debugging.

When you're done, delete it from the code.

In this case we just comment it out so you can see it.

What we have found, we're adding 10,000 lines to the point in cases where we did not need to do that.

In fact, there is no way to display 10,000 lines on this device.

Especially when you're zoomed out far enough that all the data fits within 100 screen points.

There is no reason to draw 10,000 lines in there.

We need to draw 100 lines.

It is a lot less work.

We went ahead, we created an implementation that did that.

If multiple data elements, data points are within a single screen point it just finds the max and draws a single line.

If we're using 100 screen points we're creating 100 screen lines.

We'll go ahead and switch to that implementation.

. . We're feeling good about that.

We'll change the element count up to the goal of 100,000 rather than 10,000.

We'll see if that helped us at all.

I'll use command-I to start up the Instruments.

Since Instruments is already open, it is just going to bring it to the foreground and start recording immediately.

Here we go.

A new recording.

We'll go ahead and scroll, the scrolling seems fine.

I'll zoom out.

Zooming performance is much, much better.

It takes longer because I have more data to zoom out.

It is actually looking pretty good.

I'm going to do the swiping back and forth.

It is tracking my finger really well now.

It is actually keeping up with it, doing a fantastic job.

Hooray! All done!

Except not quite.

If we look at the actual amount of CPU we're using while scrolling back and forth, we can see, you know, sometimes it is down to 60%, it is usually in the 70s or 80s.

Technically we're meeting our performance goals.

What are we doing what's the next thing to do with this app or prototype?

We'll add additional features.

We know that we need more headroom than what we have here.

How do we make it faster, how do we make the app better and meet performance goals?

We'll focus on this.

We'll filter to that data.

I'll give myself a little room.

In this case, I'm going to hold down the option key and click main and expand this out.

I can actually go down here and I can see this method here.

You know, now that the drawing of the paths is fast enough, it is the building of the path that becomes the bottleneck.

I want to focus on this method.

I will click the focus button.

That just moves aside everything outside of the method and normalizes our percentages to within this method.

This method is spending 55% of time in the get next element and 10, 11% of time in objc msgSend.

Something that I know, objc msgSend, it is a super fast method.

It is super optimized.

But, if I can get that 10% back, I want it.

If we look inside of our code here we can see it is actually very clear.

Most of our time is spent on getting the next element.

This percentage here, it is a bit higher than in the tree view because it includes the objc msgSend time.

If I get rid of that and make this iterator faster, I hopefully can get the performance boost that I want.

To give us ideas on how much to do that, it is back to Chad.

CHAD WOOLF: Let's talk about objc msgSend a bit.

It implicitly gets inserted by the compiler whenever you use the square bracket notation or whenever you use the dot notation to access a property on an object.

Its purpose is to look up the method implementation for the selector and invoke that method.

That's a long way of saying that's how we do dynamic dispatch in Objective-C.

Objc msgSend is extremely fast and does not push a stack frame.

When you look at your time profiles, you typically won't see its effect.

The times that you do see it would be a perfect example like we have where we see it in our iterator.

What we're doing, we're iterating over 100,000 points and calling it the get next method with a small method body.

Just increments a couple values and returns a structure.

What's happening, all of that overhead time from the Objective-C message send is accumulating into something that's measurable.

Is there a way to get around that overhead?

Not exactly.

Objective-C by design is a dynamic language, you have to make the objc msgSend call when accessing methods for objects and classes.

It does this every time because you can switch out method implementations at runtime.

There is no compile time in Objective-C saying I want to call in this particular method body.

The only exception here, if you do what's called method caching, where you look up the method implementation yourself and you call it through its function Pointer.

In general I don't recommend that you do that.

It is fragile as you can imagine.

In general, in my experience it hasn't given me the kind of performance wins I'm expecting because you got to think about why we're here in the first place.

The reason we're here, the get next element method has that small method body.

Even if you invoke it through the function Pointer you still have to have to marshal the arguments and push the frames on to the stack and pop them when you're done.

That's exactly what you saw in the previous set of slides, that can be a lot of overhead with an increment and then return.

I want to make sure I point it out that method caching is not as fast as inlining, what we really want in this case is that small method body to be inlined.

How do you that in Objective-C?

Well, you have alternatives.

The first one, you could have used C and you could have used structures instead of an iterator, you could pass a C ray into the method for example.

If you want that OO flavor you can use C++.

The way you use C++ in Objective-C, is you rename the file from a .m to a .mm and then you can use C++ syntax.

Because Arc is usually on by default, then you take Objective-C objects and put them in STL containers, and put them in instance variables on your classes and structures.

This is handy and you get the performance benefits of C++ and I have used it extensively in Instruments to get as much speed as I could out of the track view and other critical areas of Instruments.

I know from firsthand experience also there is a major downside to this.

That is that you have to know ahead of time which parts of your code are going to benefit from C++ and which codes will benefit from Objective-C?

Sometimes, until you're doing profiling, you can make a mistake there as we did in the demo case.

We wrote our iterator in Objective-C not realizing it would show up in our time profiles.

Is there a better alternative than what I just mentioned here?

There is. You knew it was coming.

Swift is ideal, because unlike Objective-C it is only dynamic when it notes to be.

If you make sure that the performance critical classes are internal and you use whole module optimization, the compiler or the whole tool chain can determine when there's only one method implementation and inlines it right into the call site giving you some significant wins especially on the iterator case.

Because we're prototyping, rewriting the iterator in the View Controller in Swift is easy.

Kris has done that.

KRIS MARKEL: I have a Swift implementation ready to go, this is an easy port of the Objective-C implementation with a couple of the suggestions they made in this morning's session about improving Swift code, specifically turning on whole module optimization.

Let's profile this.


It will build.

Install to the device.

It should just start profiling.

Okay. I'm going to bring the application forward so you can see.

Here is the scrolling.

Looking good.

Zooming out.

There you go.

Zooming out.

Staying nice and fast.

A lot of data to zoom out of.

Now, if I go ahead, move back and forth, it moves really fast.


KRIS MARKEL: It is very nice.

Thanks. Actually we can go ahead in here and look at what the CPU usage is.

You know, we've got, actually further improvement than what we expected, we're expecting 5, 6% improvement from removing the objc msgSend, this is lower and I can actually if I turn down this disclosure triangle you see the two runs next to each other and you see the previous run in the lower the current run, it is clearly much lower.

Actually if I go ahead, I go in here, I look for my build path method I have to search for it now, oh, that's not how you do a search.

If I hit command-F, it will bring up this dialogue here.

I can type in build path and it shows me my method here.

If we look at this, you see here is my Swift code.

My get next call which is right here, it isn't showing up in any samples.

You know, no samples landed on this.

Why? It is because Swift was able to inline it, whip means that there's no function overhead, no method call overhead whatsoever.

Because the code for the iterator is inline with the rest of the code it makes further improvements explaining the better performance than what we hoped for by skipping the dynamic dispatch.

Chad, anything else to tell these nice people?

CHAD WOOLF: We have 5 minutes!

Of course I do.

Some tips for you to explore Instruments on your own.

The first one to point out, under the recording settings, it's called record waiting threads.

I mentioned that the service and kernel we use samples the active CPUs but if you have threads sitting around, blocked on a lock or waiting for I/O, you check this checkbox, and the service will sample the idle threads as well.

If you have code that's contending over a lock you see the hot pods show up when you enable record waiting threads.

Another thing that I find interesting, in the display settings, in the call tree section, it is invert call tree.

Figuratively what it does, it flips the call tree upside down.

Instead of seeing the leafs at the bottom nodes of the tree, that's functions that don't call into anything, they appear at the top.

If a utility function is being called from 5, 6 places, you invert that call tree to see who is actually calling into that particular function.

It gives you a different perspective on the data in the call tree.

When you right click a node in the call tree, you get a context menu and there is interesting stuff under there as well.

One thing that I use from time to time is the charge to caller, so what you can do, you can charge a function on method to the caller.

You can charge an entire Library of Framework to the caller.

There's an option in there as well to prune a subtree.

You don't want to work on a specific problem at the time, you can prune that out of the data and focus on what you want to focus on.

What did we learn?

Through all of this, the first things I want to remind you about, is to incorporate performance targets early.

If you're doing a big performance rewrite like we were, start with a budget and keep monitoring it because once you start layering a lot of code on top of that, it is harder to change.

Secondly, always measure.

Through all of our demonstrations we were taking time profiles with the time profiler, we used that data to find the hot spots and retuning it, by the end we had a well-performing application.

If you go in blind, depends, you may be lucky, hit it.

I would start with a measurement and go after that first.

In the third one, this is most important to me, I want to encourage you to keep digging.

Some of these performance problems you look at at first may have seemed impassable, you say that's happening in someone else's code or a side effect of the runtime.

The reason we gave you the details on the time profiler, the runtimer, what the disassembly view looks like, is we wanted to show you there is a whole world out there of more details.

By using that, you can solve the performance problems that we were solving today.

I encourage you to keep digging, and with creativity, going out there into sessions like this, I know you guys will be able to fix and hit the performance targets you want.

That just about does it for today.

Stefan Lesser is our dev tools evangelist, contact him if you have any questions.

We related sessions, Energy debugging issues, turns out that if you have efficient code on the CPU it uses less energy.

There was that session on Wednesday.

Also tomorrow, there is performance on iOS and Watch OS, good news, the time profiler can target apps on the watch.

That's a big boon.

Tomorrow we'll be in labs from 9:00 to 2:00.

Enjoy the rest of your conference.


Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US