Optimizing Swift Performance

Session 409 WWDC 2015

Hear from the experts about how you can write faster Swift code and use Instruments to identify performance bottlenecks. Dive deep into specific techniques that will help you produce the most efficient code possible.

[Applause]

Good morning, and welcome to Optimizing Swift Performance.

My name is Nadav, and together with my colleagues, Michael and Joe, I am going to show you how to optimize your Swift programs.

Now, we, the engineers on the Compiler Team, are passionate about making code run fast.

We believe that you can build amazing things when your apps are highly optimized.

And if you feel the same way, then this talk is for you.

Today I'll start by telling you about some of the new compiler optimizations that we have added over the last year.

Later, Michael will describe the underlying implementation of Swift and give you some advice on writing high-performance Swift code.

And finally, Joe will demonstrate how to use Instruments to identify and analyze performance bottlenecks in your Swift code.

So Swift is a flexible and safe programming language with lots of great features, like closures and protocols and generics and, of course, automatic reference counting.

Now, some of you may associate these features with slowness because the program has to do more work to implement these high-level features.

But Swift is a very fast programming language that's compiled to highly optimized native code.

So how did we make Swift fast?

Well, we made Swift fast by implementing compiler optimizations that target all of these high-level features.

These compiler optimizations make sure that the overhead of the high-level features is minimal.

Now, we have lots of compiler optimizations, and we don't have enough time to go over all of them, so I decided to bring you one example of one compiler optimization.

This optimization is called bounds checks elimination.

On the screen, you can see a very simple loop.

This loop encrypts the content of the array by XORing all the elements in the array with the number 13.

It's not a very good encryption.

Reading and writing outside of the bounds of the array is a serious bug and can also have security implications, and Swift is protecting you by adding a little bit of code that checks that you don't read or write outside of the bounds of the array.

Now, the problem is that this check slows your code down.

Another problem is that it blocks other optimizations.

For example, we cannot vectorize this code with this check in place.

So we've implemented a compiler optimization for hoisting this check outside of the loop, making the cost of the check negligible, because instead of checking on each iteration of the loop that we are staying inside the bounds of the array, we are only checking once when we enter the loop.

So this is a very powerful optimization that makes numeric code run faster.
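The slide itself is not part of this transcript, so the following is only a minimal sketch of the kind of loop being described. The function name `encrypt` and the sample data are assumptions made for illustration.

```swift
// Hypothetical reconstruction of the loop from the slide:
// XOR every element of the array with the number 13.
func encrypt(_ data: inout [UInt8]) {
    for i in 0..<data.count {
        // Each subscript access is bounds-checked in principle. Because i
        // always stays inside 0..<data.count, the optimizer may be able to
        // hoist that check out of the loop.
        data[i] ^= 13
    }
}

var message: [UInt8] = [1, 2, 3, 4]
encrypt(&message)
let encrypted = message   // the XORed bytes
encrypt(&message)         // XORing with the same key again restores the input
```

Whether the check is actually hoisted depends on the compiler version and optimization level; the code behaves the same either way.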

Okay. So this was one example of one optimization, and we have lots of optimizations.

And we know that these optimizations work and that they are very effective because we are tracking hundreds of programs and benchmarks, and over the last year, we noticed that these programs became significantly faster.

Every time we added a new optimization, every time we made an improvement to existing optimizations, we noticed that these programs became faster.

Now, it's not going to be very interesting for you to see all of these programs, so I decided to bring you five programs.

The programs that you see on the screen behind me right now are programs from multiple domains.

One is an object-oriented program.

Another one is numeric.

Another one is functional.

And I believe that these programs represent the kind of code that users write today in Swift.

And as you can see, over the last year, these programs became significantly faster, between two and eight times faster, which is great.

Now, these programs are optimized in release mode.

But I know that you also care about the performance of unoptimized programs because you are spending a lot of time writing your code and debugging it and running it in the simulator.

So, these are the same five programs, this time in debug mode.

They are unoptimized.

So you are probably asking yourself, wait, how can improvements to the optimizer improve the performance of unoptimized code?

Right? Well, we made unoptimized code run faster by doing two things.

First of all, we improved the Swift runtime component.

The runtime is responsible for allocating memory, accessing metadata, things like that.

So we optimized that.

And the second thing that we did is that now we are able to optimize the Swift Standard Library better.

The Standard Library is the component that has the implementation of array and dictionary and set.

So by optimizing the Standard Library better, we are able to accelerate the performance of unoptimized programs.

We know that over the last year, the performance of both optimized and unoptimized programs became significantly better.

But to get the full picture, I want to show you a comparison to Objective-C.

So on the screen you can see two very well-known benchmarks.

They are Richards and DeltaBlue, both written in object-oriented style.

And on these benchmarks, Swift is a lot faster than Objective-C.

At this point in the talk, I am not going to tell you why Swift is faster than Objective-C, but I promise you that we will get back to this slide and we will talk about why Swift is faster.

Okay. Now I am going to talk about something different.

I want to talk about a new compiler optimization mode that's called "Whole Module Optimization" that can make your programs run significantly faster.

But before I do that, I would like to talk about the way Xcode compiles files.

So Xcode compiles your files individually.

And this is a good idea because it can compile many files in parallel on multiple cores in your machine.

That's good.

It can also recompile only files that need to be updated.

So that's good.

But the problem is that the optimizer is limited to the scope of one file.

With Whole Module Optimization, the compiler is able to optimize the entire module at once, which is great because it can analyze everything and make aggressive optimizations.

Now, naturally, Whole Module Optimization builds take longer.

But the generated binaries usually run faster.

In Swift 2, we made two major improvements to Whole Module Optimization.

So first, we added new optimizations that rely on Whole Module Optimization mode.

So your programs are likely to run faster.

And second, we were able to parallelize some parts of the compilation pipeline.

So compiling projects in Whole Module Optimization mode should take less time.

On the screen behind me, you can see two programs that became significantly faster with Whole Module Optimization because the compiler was able to make better decisions, it was able to analyze the entire module and make more aggressive optimizations with the information that it had.

In Xcode 7, we've made some changes to the optimization level menu, and now Whole Module Optimization is one of the options that you can select.

And I encourage you to try Whole Module Optimization on your programs.

At this point, I would like to invite Michael on stage to tell you about the underlying implementation of Swift and give you some advice on writing high-performance Swift code.

Thank you.

[Applause]

MICHAEL GOTTESMAN: Thanks, Nadav.

Today I would like to speak to you about three different aspects of the Swift programming language and their performance characteristics.

For each I will give specific techniques that you can use to improve the performance of your app today.

Let's begin by talking about reference counting.

In general, the compiler can eliminate most reference counting overhead without any help.

But sometimes you may still find slowdowns in your code due to reference counting overhead.

Today I'm going to present two techniques that you can use to reduce or even eliminate this overhead.

Let's begin by looking at the basics of reference counting by looking at how reference counting and classes go together.

So here I have a block of code.

It consists of a class C, a function foo that takes in an optional C, and a couple of variable definitions.

Let's walk through the code's execution line by line.

First we begin by allocating a new instance of class C and assigning it to the variable x.

Notice how at the top of the class instance, there is a box with the number 1 in it.

This represents the reference count of the class instance.

Of course, it's 1 because there's only one reference to the class instance currently, namely x.

Then we assign x to the variable y.

This creates a new reference to the class instance, causing us to increment the reference count of the class instance, giving us a reference count of 2.

Then we pass off y to foo, but we don't actually pass off y itself.

Instead, we create a temporary c, and then we assign y to c.

This then acts as a third reference to the class instance, which then causes us to increment the reference count of the class instance once more.

Then when foo exits, c is destroyed, which then causes us to decrement the reference count of the class instance, bringing us to a reference count of 2.

Then finally, we assign nil to y and nil to x, bringing the reference count of our class instance to 0, and then it's deallocated.

Notice how every time we made an assignment, we had to perform a reference counting operation to maintain the reference count of the class instance.

This is important since we always have to maintain memory safety.
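The walkthrough above can be sketched roughly as follows. The slide's exact code is not in the transcript, so the shape of `C` and `foo` is an assumption. The `observer` variable is not part of the talk; it is a weak reference added here only so the moment of deallocation can be observed.

```swift
final class C {}

// foo receives its own temporary reference, c, for the duration of the call.
func foo(_ c: C?) {}

var x: C? = C()          // one strong reference: x
weak var observer = x    // weak: does not keep the instance alive
var y: C? = x            // two strong references: x and y
foo(y)                   // conceptually a third reference while foo runs
y = nil                  // back to one strong reference
let aliveAfterClearingY = observer != nil
x = nil                  // last strong reference gone; instance is deallocated
let aliveAfterClearingX = observer != nil
```

The comments describe the conceptual reference counts from the talk; the optimizer is free to remove retain/release pairs it can prove are unnecessary.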

Now, for those of you who are familiar with Objective-C, of course, nothing new is happening here, with increment and decrement being, respectively, retain and release.

But now I'd like to talk to you about something that's perhaps a bit more exotic, more unfamiliar.

Namely, how structs interact with reference counting.

Let's begin this discussion by looking at a class that doesn't contain any references.

Here I have a class, Point.

Of course, it doesn't contain any references, but it does have two properties in it, x and y, that are both floats.

If I store one of these points in an array, because it's a class, of course, I don't store it directly in the array.

Instead, I store references to the points in the array.

So when I iterate over the array, when I initialize the loop variable p, I am actually creating a new reference to the class instance, meaning that I have to perform a reference count increment.

Then, when p is destroyed at the end of the loop iteration, I then have to decrement that reference count.

In Objective-C, one would oftentimes have to make simple data structures, like Point, a class so you could use data structures from Foundation like NSArray.

Then whenever you manipulated the simple data structure, you would have the overhead of having a class.

In Swift, we can work around this issue by using a struct in this case instead of a class.

So let's make Point a struct.

Immediately, we can store each Point in the array directly, since Swift arrays can store structs directly.

But more importantly, since a struct does not inherently require reference counting and both properties of the struct also don't require reference counting, we can immediately eliminate all the reference counting overhead from the loop.
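A minimal sketch of the two variants, side by side. The name `PointObject` is an assumption, used only so the class and struct versions can coexist in one file; the talk calls both of them Point.

```swift
// Class version: the array stores references to heap-allocated instances.
final class PointObject {
    var x: Float
    var y: Float
    init(x: Float, y: Float) { self.x = x; self.y = y }
}

// Struct version: the array stores the two floats directly.
struct Point {
    var x: Float
    var y: Float
}

let objects = [PointObject(x: 1, y: 2), PointObject(x: 3, y: 4)]
var sumOfObjectXs: Float = 0
for p in objects {        // p is a new reference on each iteration
    sumOfObjectXs += p.x
}

let points = [Point(x: 1, y: 2), Point(x: 3, y: 4)]
var sumOfStructXs: Float = 0
for p in points {         // p is a plain copy; nothing to retain or release
    sumOfStructXs += p.x
}
```

Both loops compute the same result; the difference is only in what the loop variable has to do behind the scenes.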

Let's now consider a slightly more elaborate example of this by considering a struct with a reference inside of it.

While a struct itself does not inherently require reference counting modifications on assignment, like I mentioned before, it does require such modifications if the struct contains a reference.

This is because assigning a struct is equivalent to assigning each one of its properties independently of each other.

So consider the struct Point that we saw previously: it is copied efficiently, and no reference counting is needed when we assign it.

But let's say that one day I'm working on my app and I decide that, well, I would like each one of my Points to be drawn in a different color.

So I add a UIColor property to my struct.

Of course, UIColor being a class, this is actually adding a reference to my struct.

Now, this means that every time I assign this struct, it's equivalent to assigning this UIColor independently of the struct, which means that I have to perform a reference counting modification.

Now, having a struct with one reference in it is not that expensive. I mean, we work with classes all the time, and classes have the same property.

But I would now like to present to you a more extreme example, namely, a struct with many reference counted fields.

Here I have a struct User, and I am using it to model users in an app I am writing.

And each user instance has some data associated with it, namely, three strings: one for the first name of the user, one for the last name of the user, and one for the user's address.

I also have a field for an array and a dictionary that stores app-specific data about the user.

Even though all of these properties are value types, internally, they contain a class which is used to manage the lifetime of their internal data.

So this means that every time I assign one of these structs, every time I pass it off to a function, I actually have to perform five reference counting modifications.
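A sketch of what such a struct might look like. The field names `tags` and `settings` and the sample values are assumptions, since the transcript only says "an array and a dictionary". Note also that in current Swift versions short strings may be stored inline, so not every field is necessarily backed by a reference-counted buffer at run time.

```swift
struct User {
    var firstName: String
    var lastName: String
    var address: String
    var tags: [String]                // hypothetical app-specific array
    var settings: [String: String]    // hypothetical app-specific dictionary
}

let original = User(firstName: "Ada",
                    lastName: "Lovelace",
                    address: "London",
                    tags: ["admin"],
                    settings: ["theme": "dark"])

// Assigning the struct copies all five fields, and each field that owns
// heap storage needs its own reference count update.
var copy = original

// Value semantics: changing the copy leaves the original untouched.
copy.tags.append("beta")
```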

Well, we can work around this by using a wrapper class.

Here again, I have my user struct, but this time, instead of standing on its own, it's contained within a wrapper class.

I can still manipulate the struct using the class reference and, more importantly, if I pass off this reference to a function or I initialize a variable with the reference, I am only performing one reference count increment.

Now, it's important to note that there's been a change in semantics here.

We've changed from using something with value semantics to something with reference semantics.

This may cause unexpected data sharing that may lead to weird results or things that you may not expect.
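The wrapper and the sharing it introduces might look like this. `UserBox` is a hypothetical name, and the `User` struct is trimmed to three fields to keep the sketch short.

```swift
struct User {
    var firstName: String
    var lastName: String
    var address: String
}

// Hypothetical wrapper: one reference stands in for the whole struct.
final class UserBox {
    var user: User
    init(_ user: User) { self.user = user }
}

let a = UserBox(User(firstName: "Ada", lastName: "Lovelace", address: "London"))

// Only one reference count increment, no matter how many fields User has.
let b = a

// Reference semantics: a and b are the same object, so this change is
// visible through both names.
b.user.firstName = "Grace"
```

After the last line, `a.user.firstName` is also "Grace", which is exactly the kind of data sharing the talk warns about.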

But it turns out there is a way that you can have value semantics and still benefit from this optimization.

If you'd like to learn more about this, please go to the Building Better Apps with Value Types in Swift talk tomorrow in Mission at 2:30 p.m. It's going to be a great talk.

I really suggest that you go.

Now that we've talked about reference counting, I'd like to continue by talking a little bit about generics.

Here I have a generic function min.

It's generic over type T that conforms to the Comparable protocol from the Swift Standard Library.

From a source code perspective, this doesn't really look that big.

I mean, it's just three lines.

But in reality, a lot more is going on behind the scenes than one might think.

For instance, the code that's actually emitted here is not these three lines. Again, I am using a pseudo-Swift to represent the code the compiler emits.

Instead, it's this.

First notice that the compiler is using indirection to compare both x and y.

This is because we could be passing in two integers to the min function, or we could be passing in two floats or two strings, or we could be passing in any comparable type.

So the compiler must be correct in all cases and be able to handle any of them.

Additionally, because the compiler can't know if T requires reference counting modifications or not, it must insert additional indirection so the min-T function can handle both types T that require reference counting and those types T that do not.

In the case of an integer, for instance, these are just no-op calls into the Swift runtime.

In both of these cases, the compiler is being conservative since it must be able to handle any type T in this case.
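The slides are not included in the transcript, so here is a rough sketch of the idea. `genericMin` stands in for the talk's min (renamed to avoid clashing with the Standard Library's `min`). `unspecializedMin` is a hypothetical illustration of the indirection, with the comparison passed in as a value and called indirectly; it is not what the compiler literally emits.

```swift
// The three-line generic function from the source code's point of view.
func genericMin<T: Comparable>(_ x: T, _ y: T) -> T {
    return y < x ? y : x
}

// Conceptual illustration only: without knowing T, the comparison has to be
// reached through some form of indirection rather than a direct call.
func unspecializedMin<T>(_ x: T, _ y: T, isLess: (T, T) -> Bool) -> T {
    return isLess(y, x) ? y : x
}

let smallestInt = genericMin(3, 7)
let smallestString = genericMin("pear", "apple")
let viaIndirection = unspecializedMin(3, 7, isLess: { (a: Int, b: Int) in a < b })
```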

Luckily, there is a compiler optimization that can help us here, that can remove this overhead.

This compiler optimization is called generic specialization.

Here I have a function foo that passes two integers to the generic min-T function.

When the compiler performs generic specialization, first it looks at the call to min in foo and sees, oh, there are two integers being passed to the generic min-T function here.

Then since the compiler can see the definition of the generic min-T function, it can clone min-T and specialize this clone function by replacing the generic type T with the specialized type Int.

Then the specialized function is optimized for Int, and all the overhead associated with this function is removed, so the unnecessary reference counting calls are removed, and we can compare the two integers directly.

Finally, the compiler replaces the call to the generic min-T function with a call to the specialized min-Int function, enabling further optimizations.
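Conceptually, the result of these steps resembles the hand-written code below. `genericMin` again stands in for the talk's min, and `minInt` is a hypothetical stand-in for the clone the compiler would generate; real specializations are produced internally and never appear in your source.

```swift
func genericMin<T: Comparable>(_ x: T, _ y: T) -> T {
    return y < x ? y : x
}

// Hand-written stand-in for the specialized clone: T has been replaced by
// Int, so the comparison is a direct integer compare with no indirection.
func minInt(_ x: Int, _ y: Int) -> Int {
    return y < x ? y : x
}

func foo() -> Int {
    let x = 10
    let y = 4
    // After specialization this call behaves as if it were minInt(x, y).
    return genericMin(x, y)
}

let result = foo()
```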

While generic specialization is a very powerful optimization, it does have one limitation, namely, the visibility of the generic definition.

For instance, in this case, the generic definition of the min-T function.

Here we have a function compute which calls a generic min-T function with two integers.

In this case, can we perform generic specialization?

Well, even though the compiler can see that two integers are being passed to the generic min-T function, because we are compiling file1.swift and file2.swift separately, the definitions of functions from file 2 are not visible to the compiler when the compiler is compiling file 1.

So in this case, the compiler cannot see the definition of the generic min-T function when it's compiling file 1, and so we must call the generic min-T function.

But what if we have Whole Module Optimization enabled?

Well, if we have Whole Module Optimization enabled, both file1.swift and file2.swift are compiled together.

This means that definitions from file 1 and file 2 are both visible when you are compiling file 1 and file 2 together.

So basically, this means that the generic min-T function, even though it's in file 2, can be seen when we are compiling file 1.

Thus, we are able to specialize the generic min-T function into min-Int and replace the call to min-T with min-Int.

This is but one case where the power of whole module optimization is apparent.

The only reason the compiler can perform generic specialization in this case is because of the extra information provided to it by having Whole Module Optimization enabled.

Now that I have spoken about generics, I'd like to conclude by talking about dynamic dispatch.

Here I have a class hierarchy for the class Pet.

Notice that Pet has a method noise, a property name, and a method noiseimpl, which is used to implement the method noise.

Also notice it has a subclass of Pet called Dog that overrides noise.

Now consider the function makeNoise.

It's a very simple function that takes an argument p that's an instance of class Pet.

Even though this block of code only involves a small amount of source, again, a lot more is occurring here behind the scenes than one might think.

For instance, the following pseudo-Swift code is not what is actually emitted by the compiler.

Name and noise are not called directly.

Instead, the compiler emits this code.

Notice the indirection here that's used to call name's getter or the method noise.

The compiler must insert this indirection because it cannot know given the current class hierarchy whether or not the property name or the method noise are meant to be overridden by subclasses.

The compiler in this case can only emit direct calls if it can prove that there are no possible overrides by any subclasses of name or noise.

In the case of noise, this is exactly what we want.

We want noise to be able to be overridden by subclasses in this API.

We want to make it so that if I have an instance of Pet that's really a dog, the dog barks when I call noise.

And if I have an instance of Pet that's actually a cat, then when I call noise, we have a meow.

That makes perfect sense.
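The hierarchy described here might look roughly like the following. The method bodies, the return types, and the strings are assumptions; `noiseImpl` corresponds to the transcript's noiseimpl; and Cat is placed in the same file only to keep the sketch self-contained, whereas in the talk it lives in a different module.

```swift
class Pet {
    var name: String
    init(name: String) { self.name = name }

    func noise() -> String { return noiseImpl() }
    func noiseImpl() -> String { return "..." }
}

class Dog: Pet {
    override func noise() -> String { return "Woof" }
}

class Cat: Pet {
    override func noise() -> String { return "Meow" }
}

func makeNoise(_ p: Pet) -> String {
    // Without more information, both the getter for name and the call to
    // noise go through dynamic dispatch, because a subclass could in
    // principle override either of them.
    return "\(p.name) says \(p.noise())"
}
```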

But in the case of name, this is actually undesirable.

This is because in this API, name is never overridden.

It's not necessary to override name.

We can model this by constraining this API's class hierarchy.

There are two Swift language features that I am going to show you today that you can use to constrain your API's class hierarchy.

The first are constraints on inheritance, and the second are constraints on access via access control.

Let's begin by talking about inheritance constraints, namely, the final keyword.

When an API contains a declaration with the final keyword attached, the API is communicating that this declaration will never be overridden by a subclass.

Consider again the makeNoise example.

By default, the compiler must use indirection to call the getter for name.

This is because without more information, it can't know if name is overridden by a subclass.

But we know that in this API, name is never overridden, and we know that in this API, it's not intended for name to be able to be overridden.

So we can enforce this and communicate this by attaching the final keyword to name.

Then the compiler can look at name and realize, oh, this will never be overridden by a subclass, and the dynamic dispatch, the indirection, can be eliminated.
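In code, the change is a single keyword on the property. This is a sketch with assumed method bodies, not the slide's exact code.

```swift
class Pet {
    // final: no subclass may override name, so the compiler is free to
    // access it directly instead of going through dynamic dispatch.
    final var name: String
    init(name: String) { self.name = name }

    // noise stays overridable on purpose.
    func noise() -> String { return "..." }
}

class Dog: Pet {
    override func noise() -> String { return "Woof" }
    // Trying to write `override var name` here would be a compile-time error.
}

func makeNoise(_ p: Pet) -> String {
    return "\(p.name) says \(p.noise())"
}
```

The observable behavior is unchanged; only the set of legal subclasses is narrowed.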

Now that we've talked about final inheritance constraints, I'd like to talk a little bit about access control.

It turns out that in this API, Pet and Dog are in separate files, pet.swift and dog.swift, but are in the same module, module A.

Additionally, there is another subclass of Pet called Cat in a different module, in the file cat.swift.

The question I'd like to ask is, can the compiler emit a direct call to noiseimpl?

By default, it cannot.

This is because by default, the compiler must assume that this API intended for noiseimpl to be overridden in subclasses like Cat and Dog.

But we know that this is not true.

We know that noiseimpl is a private implementation detail of pet.swift and that it shouldn't be visible outside of pet.swift.

We can enforce this by attaching the private keyword to noiseimpl.

Once we attach the private keyword to noiseimpl, noiseimpl is no longer visible outside of pet.swift.

This means that the compiler can immediately know that there cannot be any overrides of noiseimpl in Cat or Dog because, well, they are not in pet.swift, and since there is only one class in pet.swift that implements noiseimpl, namely Pet, the compiler can emit a direct call to noiseimpl in this case.
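A sketch of the private version, with assumed method bodies and `noiseImpl` standing in for the transcript's noiseimpl. One caveat for readers on newer toolchains: at the time of this talk, private meant visible within the file. In later Swift versions, private is scoped to the enclosing declaration and fileprivate provides file-level visibility. Either way, subclasses elsewhere cannot override the method.

```swift
class Pet {
    final var name: String
    init(name: String) { self.name = name }

    func noise() -> String { return noiseImpl() }

    // private: not visible to Dog or Cat, so no override can exist and the
    // call from noise() above can be emitted as a direct call.
    private func noiseImpl() -> String { return "generic pet noise" }
}

class Dog: Pet {
    override func noise() -> String { return "Woof" }
}

let petNoise = Pet(name: "Pet").noise()
let dogNoise = Dog(name: "Rex").noise()
```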

Now that we've spoken about private, I would like to talk about the interaction between Whole Module Optimization and access control.

We have been talking a lot about the class Pet, but what about Dog?

Remember that Dog is a subclass of Pet that has internal access instead of public access.

If we call noise on an instance of class Dog, without more information, the compiler must insert indirection because it cannot know if there is a subclass of Dog in a different file of module A.

But when we have Whole Module Optimization enabled, the compiler has module-wide visibility.

It can see all the files in the module together.

And so the compiler is able to see, well, no, there are no subclasses of Dog, so the compiler can call noise directly on instances of class Dog.

The key thing to notice here is that all I needed to do was to turn on Whole Module Optimization.

I didn't need to change my code at all.

By giving the compiler more information and allowing it to understand my class hierarchy, I was able to get this optimization for free without any work on my part.

Now I'd like to bring back that graph that Nadav introduced earlier.

Why is Swift so much faster than Objective-C on these object-oriented benchmarks?

The reason is that in Objective-C, the compiler cannot eliminate the dynamic dispatch through objc_msgSend.

It can't inline through it.

It can't perform any analysis.

The compiler must assume that there could be anything on the other side of an objc_msgSend.

But in Swift, the compiler has more information.

It's able to see certain things on the other side.

It's able to eliminate this dynamic dispatch in many cases.

And in those cases where it does, performance improves a lot, resulting in significantly faster code.

So please, use the final keyword and access control to communicate your API's intent.

This will help the compiler to understand your class hierarchy, which will enable additional optimizations.

However, keep in mind that existing clients may need to be updated in response to such changes.

And try out Whole Module Optimization in your release builds.

It will enable the compiler to make further optimizations, for instance, more aggressive specialization, and by allowing the compiler to better understand your API's class hierarchy, without any work on your part, you can benefit from increased elimination of dynamic dispatch.

Now I'd like to turn this presentation over to Joe, who will show you how you can use these techniques and Instruments to improve the performance of your application today.

[Applause]

JOE GRZYWACZ: Thank you, Michael.

My name is Joe Grzywacz.

I am an engineer on the Instruments Team, and today I want to take you through a demo application that's running a little slowly right now, so let's get started.

All right.

So here we have my Swift application that's running slowly, so what I want to do is go ahead and click and hold on the Run button and choose Profile.

That's going to build my application in release mode and then launch Instruments' template chooser so we can decide how we want to profile this.

Since it's running slowly, a good place to start is with the Time Profiler template.

From Instruments, just press Record, your application launches, and Instruments is recording data in the background about what it's doing.

So here we can see we are running at 60 frames per second before I've started anything, which is my target performance.

But as soon as I add these particles to the screen, they are moving around and avoiding each other just like I wanted, but we are running at only about 38 frames per second.

We lost about a third of our performance.

Now that we have reproduced the problem, we can quit our application and come back to Instruments.

Let me make this a little bit larger so we can see what's going on.

You can just drag this, drag that around.

View > Snap Track to Fit is handy to make your data fill your horizontal time.

Now what are we looking at?

Here in the track view, this is our CPU usage of our application.

We can see on the left before I did anything, CPU usage was low; after I added those particles, CPU usage became higher.

You can see what those values are by moving your mouse and hovering it inside this ruler view.

You can see prior we were around 10% or so, not doing much.

Later on we moved to around 100%.

So we saturated our CPU.

In order to increase our performance, we need to decrease how much work we're doing.

So what work were we doing?

That's where this detail pane down below comes in.

So here's all of our threads.

Go ahead and open this up a little bit.

You are probably familiar with this call stack from seeing it inside of Xcode in the debugger.

Start calls main, which calls NSApplicationMain, et cetera.

But what Instruments is also going to tell you is how much time you were spending inside of that function, including its children, right here in this first column Running Time.

We can see 11,220 milliseconds, or 99% of our time, was spent in NSApplicationMain or the things it called.

The second column, Self, is how much time Instruments sampled inside that function itself, so it excludes its children.

So what I want to do is see where that Self number gets larger, because that means that function is actually performing a lot of work.

You can continue opening these up one by one, hunting around, but that can take a little while.

Instead we recommend you come over here to the right side, this extended detail view, and Instruments will show you the single heaviest stack trace in your application.

That's where it sampled the most number of times.

You can see again here is our main thread, it took 11,229 milliseconds.

It began in Start.

Symbols in gray are system frameworks.

Symbols in black here, like Main, are your code.

And what I'd like to do is just look down this list and see if there's kind of a big jump.

That means something interesting happened around this time.

If I scan down this list, the number is slowly getting smaller, but there's no big jumps going on, until I get down here where I see a jump from about 9,000 to about 4,000.

So something happened there.

I am going to go ahead and click on my code, and Instruments has automatically expanded the call tree on the left side so you can see what you just clicked on.

Let me frame this up.

And what's going on here?

Well, if I back up just a little bit for a moment, here is my NSFireTimer call, which is what's driving my simulation, trying to get it to 60 frames per second.

Down here is my ParticleSim.AppDelegate.update routine, that's my Swift routine driving my simulation.

But in between is this weird @objc thing sitting here.

I want to point out that's just a thunk.

Basically, it's a compiler-inserted function that gets us from the Objective-C world here in NSFireTimer down to the Swift world down here inside of my code.

That's all it is.

Otherwise, we can ignore it.

Now, we can see my update routine is taking 89% of the time, so continuing to optimize this function is a good idea.

So everything else above it is not really interesting to me.

I am going to go ahead and hide it by focusing in on just this update routine by clicking this arrow here on the right.

Everything else around this has been hidden.

Running time has been renormalized to 100%, just to help you do a little less mental math.

If we look in on what's going on in this function, Update Phase Avoid calls Find Nearest Neighbor, that calls down into something really interesting here.

We see swift_release is taking 40% of our time, and swift_retain is taking another 35% of our time.

So between just these two functions, about three-quarters of our update routine is just managing reference counts.

Far from ideal.

So what's going on here?

Well, if I double-click on my Find Nearest Neighbor routine that calls those retains and releases, Instruments will show you the source code.

However, Swift is an automatically reference counted language, so you are not going to see the releases and retains here directly.

But you can, if you go over to the disassembly view, click on that button there, Instruments will show you what the compiler actually generated.

And you can hunt around in here and see there's a bunch of calls here.

There's 23% of the time on this release.

There's some more retains and releases here.

There is another release down here.

They are all over the place.

So what can we do about that?

Let's return to our code here and go to my particle file.

Here is my class Particle, so it's an internal class by default.

And it adheres to some collidable protocol.

All right.

Down below is the Find Nearest Neighbor routine that was taking all of that time before.

Now, I know that when the update timer fires, that code is going to call Find Nearest Neighbor on every single particle on the screen, and then there's this inner for loop that's going to iterate over every single particle on the screen.

We have an N-squared algorithm here, so effectively, the stuff that happens inside this for loop is going to happen a really large number of times.

Whatever we do to optimize this thing should have a big payoff.

So what is going on?

We have our for loop itself where we access one of those particles.

So there's some retain/release overhead.

There are property getters being called here, this dot ID property.

And as Michael was talking about, since this is an internal class, there might be some other Swift file somewhere that overrides these property getters, so we are going to be performing a dynamic dispatch to these property getters, which has retain/release overhead as well.

Down here there is this distance squared function call.

Despite the fact that it lives literally a dozen source code lines away, once again, we are going to be doing a dynamic dispatch to this routine with all of that overhead as well as the retain/release overhead.
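
The demo's source is not reproduced in this transcript, so the following is only a minimal sketch of the kind of class being described. The member names (`id`, `x`, `y`, `distanceSquared(to:)`, `findNearestNeighbor(in:)`) and the shape of the `Collidable` protocol are assumptions made for illustration.

```swift
// Hypothetical reconstruction; names and signatures are assumptions.
protocol Collidable {
    var id: Int { get }
    var x: Double { get }
    var y: Double { get }
}

// Internal by default and not final: another file in the module could
// subclass Particle and override `id` or `distanceSquared(to:)`, so a
// compiler that sees only this file has to assume dynamic dispatch.
class Particle: Collidable {
    let id: Int
    var x: Double
    var y: Double

    init(id: Int, x: Double, y: Double) {
        self.id = id
        self.x = x
        self.y = y
    }

    func distanceSquared(to other: Particle) -> Double {
        let dx = x - other.x
        let dy = y - other.y
        return dx * dx + dy * dy
    }

    // Called once per particle on every update. The loop visits every
    // particle, so its body runs on the order of N * N times per update.
    func findNearestNeighbor(in particles: [Particle]) -> Particle? {
        var nearest: Particle?
        var bestDistance = Double.infinity
        for other in particles where other.id != id {
            let d = distanceSquared(to: other)
            if d < bestDistance {
                bestDistance = d
                nearest = other
            }
        }
        return nearest
    }
}
```

With 1,000 particles, for example, the loop body would run about a million times per update, which is why small per-call costs such as retain/release and dynamic dispatch can add up.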

So what can we do about this code?

Well, this code is complete.

I wrote this application, I am finished, my particle class is complete, and I have no need to subclass it.

So what I should do is communicate my intention to the compiler by marking this class as final.

So with that one little change, let's go ahead and profile the application again and see what happened.

This time, the compiler was able to compile that file knowing that there are no subclasses of that particle class, and that means it's able to perform additional optimizations.

It can call those functions directly, maybe even inline them, or any other number of optimizations that can reduce the overhead that we had before.
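
As a rough illustration of that one-keyword change, here is a trimmed-down, hypothetical `Particle`; its members are assumptions, and only the `final` modifier reflects what the talk describes.

```swift
// Hypothetical, trimmed-down Particle; only `final` is the point here.
// `final` promises the compiler that no subclass can exist, so calls to
// `id` and `distanceSquared(to:)` can be dispatched directly and are
// candidates for inlining.
final class Particle {
    let id: Int
    var x: Double
    var y: Double

    init(id: Int, x: Double, y: Double) {
        self.id = id
        self.x = x
        self.y = y
    }

    func distanceSquared(to other: Particle) -> Double {
        let dx = x - other.x
        let dy = y - other.y
        return dx * dx + dy * dy
    }
}

// Uncommenting the next line would be a compile-time error, because a
// final class cannot be inherited from:
// class HeavyParticle: Particle {}
```

If a class does need to stay open for subclassing, `final` can also be applied to individual properties and methods instead of the whole class.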

So if we record, this time when I add the particles, we can see they are moving around and running around at 60 frames per second at this time, so we got back 20 frames per second with just that one small change.

That's looking good.

However, as you may guess, I have a second phase here called collision where we swap the algorithm and now they are bouncing off one another, and again our frame rate dropped by about 25 percent down to 45 frames per second.

We reproduced the problem again, so let's return to Instruments and see what's happening.

We will do what we did before, make this a little bit larger, Snap Track to Fit, and now what do we see?

Over here on the left, this was our avoidance phase.

Things are running much better, around 30%, 40% or so, so that's why we are hitting our 60 frames per second.

But over here on the right, this is our collision phase.

And now this is capping out at 100% of our CPU, and that's why our frame rate is suffering again.

If we did what we did a moment ago right now, the call tree data down here in the detail pane would include data from this avoidance phase, which is running fine, as well as this collision phase, which is what I really want to be focusing on.

So that avoidance sample over here is going to water down our results.

Instead, I would like to set a time filter so I am only looking at my collision phase.

That's really simple to do.

Just click and drag in the timeline view, and now our detail pane has been updated to only consider the samples from our collision phase.

Now we can do what we did before, head over to our extended detail view.

Look down this list to see where there is a jump, and something interesting happens here: we went from about 8,000 milliseconds to 2,000 milliseconds.

So I am going to click on my collision detection class here.

Instruments once again automatically expands this call tree for us.

And if we just kind of look at what's going on here, 88% of my time is spent inside of this runtime step routine.

This is a good place to dig in.

I'll do what I did before and click on this Focus arrow here on the right.

Now we are looking at just our runtime step routine, and let's see what it's doing.

All right.

Well, 25% of its time is being spent inside of Swift.Array._getElement.

When you see this A inside of angle brackets, that means you are calling into the generic form of that function and all the overhead that entails.

You will see this again here inside of Swift array is valid subscript, there's that A inside of angle brackets.

It also happens when you have that A inside of square brackets.

So we are calling a generic property getter here.

So just between these three generic functions, about 50% of our time is being spent.

So what can we do about getting rid of that overhead?

All right, back over to Xcode.

Here is my collision detection file.

Here we can see that collidable protocol that my particle was adhering to.

Here is that generic class, class detection, type T that adheres to a collidable protocol.

What does it do? Well, it has this collidables array here that's of generic type T.

And here down below is our runtime step routine, and that's where we were spending all of our time.

So what does this function do?

Well, it iterates over all our collidables, accesses one of the collidables from that array, calls a bunch of property getters here.

Here's some more.

There is an inner for loop, where we do kind of the same thing again: we pull out a second collidable from that array.

Then all sorts of property getters down below.

We're doing a lot of generic operations here, and we'd really like to get rid of that.
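
To make the shape of that code concrete, here is a minimal sketch of a generic collision detector along the lines described. The names `Detection`, `collidables`, and `runTimeStep`, the `Ball` type, and the overlap test are all assumptions for illustration; the demo's actual source is not shown in the transcript.

```swift
// Hypothetical sketch; every name and the overlap rule are assumptions.
protocol Collidable {
    var x: Double { get }
    var y: Double { get }
    var radius: Double { get }
}

struct Ball: Collidable {
    var x: Double
    var y: Double
    var radius: Double
}

// Generic over any T that conforms to Collidable. If the compiler cannot
// see which concrete T the callers use, it has to emit the unspecialized
// form, where array accesses and property getters are generic calls.
class Detection<T: Collidable> {
    var collidables: [T]

    init(collidables: [T]) {
        self.collidables = collidables
    }

    // Returns the number of overlapping pairs found in this time step.
    func runTimeStep() -> Int {
        var collisions = 0
        for i in 0..<collidables.count {
            let a = collidables[i]          // generic array access
            for j in (i + 1)..<collidables.count {
                let b = collidables[j]      // a second collidable
                let dx = a.x - b.x          // generic property getters
                let dy = a.y - b.y
                let reach = a.radius + b.radius
                if dx * dx + dy * dy <= reach * reach {
                    collisions += 1
                }
            }
        }
        return collisions
    }
}
```

A caller elsewhere in the module might write `Detection(collidables: balls).runTimeStep()`. When the compiler can see both the definition and that concrete use of `Ball`, it has the information it needs to generate a specialized version of `runTimeStep`.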

How do we do that?

Well, this time you can see my collision detection class is here inside of this Swift file.

However, the places where I am using this class are inside this app delegate routine and this particle Swift file, so they are in other parts of this module, and we are going to have to turn to Whole Module Optimization.

Doing that's really easy, just click on your project.

Go over here to build settings.

Make sure you are looking at all of your build settings.

Then just do a search for optimization.

And here is that setting that Nadav showed you earlier.

You just want to switch your release build over to Whole Module Optimization.

And now when we profile, the compiler is going to look at all those files together and build a more optimized binary, but let's check and see what happened.

So we will launch time profiler for the third time here, start our recording, and 60 frames per second, we add our particles, this avoidance phase still running at 60 frames per second.

Good, I expected that not to change.

Always good to verify.

Then we move over to our collision phase.

Now that is running at 60 frames per second as well.

All it took was a couple minutes of analysis and a few small tweaks, and we made our application a lot faster.

[Applause]

All right.

So to summarize what we saw here today, we know that Swift is a flexible programming language that's safe and uses automatic reference counting to perform its memory management.

Now, those powerful features are what make it a delight to program in, but they can come with a cost.

What we want you to do is focus on your APIs and your code so that when you are writing them, you keep performance in mind.

And how do you know what costs you are paying for?

Profile your application inside of Instruments, and do it throughout the lifetime of your application development so that when you find a problem, you find it sooner and you can react to that more easily, especially if it involves changing some of your APIs.

There's documentation online, of course.

There are the Developer Forums, where you can ask questions about Swift, as well as Instruments, and get them answered.

And speaking of Instruments, there's a Profiling in Depth talk today in Mission at 3:30.

There is an entire session devoted to Time Profiler and getting into even more depth than we're able to get into today.

And as Michael talked about earlier, there is a Building Better Apps with Value Types in Swift session that will also build upon what you saw today.

So thank you very much.

[Applause]
