What’s New in LLVM 

Session 417 WWDC 2014

The Apple LLVM compiler continues to evolve, with support for 64-bit iOS products, powerful new optimizations, and other new features. Learn about some of the advanced technology that the compiler uses to increase the performance of your code, and get details on how to take advantage of the latest features in the compiler.

Good morning!


[ Applause ]

Glad to see a number of folks out now, bright and early, to talk about all the heart-pounding excitement in the world of compilers.

And I’m Jim Grosbach, and I’m really happy to be here today to share with you all of the new things that we have in LLVM.

When we talk about LLVM, what first comes to mind is the Apple LLVM Compiler itself.

This is what we all use to build our apps and that’s where we really first encounter LLVM, but it’s much more than that.

LLVM is used in a wide variety of products and tools that we all use every day, both as developers and as end users.

Over the years LLVM has grown to be a really key technology here at Apple for building tools, for performance, and for modernization, and that has been no exception this year as we have moved swiftly along with a wide variety of new improvements.

To start with, back in September we introduced the Apple A7 processor which has been just absolutely magnificent in what it’s allowed us to do, bringing truly desktop-class performance to your mobile devices and LLVM plays a key role in this.

And now we’re encouraging more of you to use this technology in your apps, so building for 64-bit in iOS is now the default.

As of Xcode 5.1 carrying on into Xcode 6, when you rebuild your app, if you’re using standard architectures, ARM64 will be included.

This does not impact your deployment story.

You can continue to deploy back to iOS 4.3.

We still build for ARMv7 for 32-bit.

All of the development workflows that you’re familiar with for the simulator, the debugger, profiling, all of these things continue to work transparently in a 64-bit environment, just as you’re used to.

Now, one thing to be aware of is that because ARM64 is an entirely new architecture, your entire application must be built 64 bit, not just a few libraries here, or a few files there, but the whole app.

So, if you’re relying on third-party libraries and those libraries have not yet adopted 64 bit, please work with your vendors and encourage them to update and support 64 bit development so that your app can then migrate as well and get the benefits.

Now, during migration there are a few things that we’d like to bring to your attention that might come up: a few advancements we’ve made, a few things that we’ve tightened up in the specification, and what the possible impact of that on your app is.

To start with, in 64-bit iOS all functions must have a prototype.

This has been good style since time immemorial and it’s been required for C++ since the start.

It’s been highly suggested in C; in any modern version of the language, not using a prototype is deprecated and has been for a very long time now.

So, we’ve taken advantage of this in ARM64 to generate more efficient calling convention code, in particular for variadic functions like printf, where the number of arguments to the function varies by call site.

So, when you have older code that you’re using that may not use prototypes, what is normally a warning has now been promoted to an error, so the compiler will highlight to you in your code exactly where this is happening so that you know which prototypes to go add to your header files to move on.
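
As a rough illustration of the prototype requirement, here is a minimal sketch of a variadic function that is declared before it is called. The name sum_ints is hypothetical and does not come from the session; the point is only that the prototype tells the compiler, at every call site, that the function is variadic.

```c
#include <stdarg.h>

/* Prototype: normally this would live in a header. It tells the compiler
   that sum_ints is variadic, so each call site can be compiled with the
   matching calling convention. (sum_ints is a made-up example name.) */
int sum_ints(int count, ...);

/* Adds up the `count` int arguments that follow `count`. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) is then checked against the prototype instead of relying on an implicit declaration.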

One place that this does sometimes come up in a little bit more of a subtle way is in code that mixes C and Objective-C and makes direct invocations of the Objective-C message send function.

To help find this, we have a new Xcode setting to enable strict checking of objc_msgSend.

This is a recommended setting, and when you first upgrade your project to Xcode 6 we’ll encourage you to adopt it.

And what’s tricky is that every invocation of objc_msgSend effectively has a different type.

It has the type of what the final receiving method is going to be.

For example here, a trivial piece of code that’s invoking method foo, with strict checking enabled, the compiler will now tell us that we need to tell it what the final type is.

This is straightforward to do.

It’s a little bit verbose, but very straightforward.

We simply add the type of the final receiving method.

I’ve done it here with a typedef.

This could also be done with a direct typecast, all on one line, if you prefer; the point is just to make sure that the compiler knows what the final receiving type of the method is so that it can generate the right code to get the final result correct.
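
The slide code is not part of this transcript, so what follows is only a plain-C analogy, not the Objective-C runtime itself. It assumes a generic entry point whose declared type says nothing about the real arguments, and shows the same remedy: name the final receiving type with a typedef and cast to it before calling. All names here (generic_fn, lookup_foo, foo_impl, call_foo) are hypothetical.

```c
/* A generic function pointer type: like an untyped message send, it
   carries no information about the real arguments or return value. */
typedef void (*generic_fn)(void);

/* The real receiver: takes an int and returns an int. */
static int foo_impl(int value) {
    return value * 2;
}

/* Hands the receiver back as an untyped function pointer. */
static generic_fn lookup_foo(void) {
    return (generic_fn)foo_impl;
}

/* Typedef naming the type of the final receiving function. */
typedef int (*foo_fn)(int);

/* Casts to the concrete type first, then calls through it, so the
   compiler generates the call sequence for (int) -> int. */
int call_foo(int value) {
    foo_fn typed = (foo_fn)lookup_foo();
    return typed(value);
}
```

Calling through the generic pointer without the cast would leave the compiler guessing at the argument layout, which is the situation the strict checking setting is meant to catch.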

Another place that we’ve tightened things up and taken advantage of our new ABI and ARM64 is the Objective-C Boolean type.

If any of you were at Stump the Experts last night, this topic actually came up as a question.

It was rather amusing like, “I have a slide on that!

That’ll be great!”

So, BOOL is basically now a bool type.

Previously, it was a signed char.

And sometimes code, ours as well as yours, would put values into the Boolean type that weren’t strictly Boolean.

Now, the compiler is going to be taking advantage of this tighter definition, so what can happen is that if your code does that, the results between 32-bit iOS and 64-bit iOS may differ.

So, if you start seeing some odd behaviors with Booleans, this is something to look out for.
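
To make the difference concrete, here is a small sketch in plain C. OldBOOL is a hypothetical stand-in for the signed char definition, and the C bool type stands in for the 64-bit definition; the function names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the older signed char definition. */
typedef signed char OldBOOL;

/* Stores a non-Boolean value and compares it against 1 ("YES").
   With a signed char, the value 2 stays 2, so the comparison fails. */
int old_style_is_yes(int flags) {
    OldBOOL b = (OldBOOL)flags;
    return b == 1;
}

/* With a real bool, any nonzero value converts to 1 on assignment,
   so the same comparison now succeeds. */
int new_style_is_yes(int flags) {
    bool b = flags;
    return b == 1;
}

/* Behaves the same under either definition: test against zero
   instead of comparing with a specific "true" value. */
int is_set(int flags) {
    return flags != 0;
}
```

With an input of 2, old_style_is_yes returns 0 while new_style_is_yes returns 1, which is the kind of 32-bit versus 64-bit difference described above.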

We also have pointers.

As we’re now on a 64-bit architecture, this is kind of the core of what this is all about: pointers and longs are now 64 bits.

So, old code would often do horrible things like casting integers to pointers, and back-and-forth.

And hopefully, we don’t write code that does that anymore, but we all have this legacy code that we have to live with, and now this can bite us if we’re not careful.

This is very similar to what we’ve all dealt with on the 32-bit to 64-bit Intel transition, if we went through that.

That’s still a problem; we haven’t magically just solved that in the compiler.

So for example here, we’re casting an integer which came from a pointer somewhere else.

We’re casting that to a void pointer.

But now the compiler can help a little bit.

It can at least inform us that the problem is coming up and tell us that, “Oh, we have a problem here that we need to go and look at and make sure that this is really what’s happening.”

Now, if we ignore this warning, the runtime in the kernel is going to be a little bit more forceful about this.

If we dereference that pointer we’re going to get a hard fault because the page zero is mapped to always give a fault so if we miss any of these through other warnings, we’ll still get an error.

Paying attention to the compiler is going to be a lot friendlier, because it will tell you the line number and the source file where the problem is.

The kernel’s just going to tell you you did something bad.

To address this we use the C language typedefs that are 64- and 32-bit clean.

We say we want a signed integer, or an unsigned integer, that is an appropriate type for saving a pointer value, for indexing into an array, or for comparing the difference between two pointers.

For example, we can modify our previous code to simply use the intptr_t type, which will be a 32-bit signed integer when we’re compiling for 32-bit iOS and a 64-bit signed integer for 64-bit iOS.
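
A minimal sketch of the idea, using the standard intptr_t and ptrdiff_t types from stdint.h and stddef.h. The function names are hypothetical; the slide code itself is not in the transcript.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-trips a pointer through an integer that is wide enough to hold
   it: 32 bits on a 32-bit target, 64 bits on a 64-bit target. */
void *round_trip(void *p) {
    intptr_t as_integer = (intptr_t)p;
    return (void *)as_integer;
}

/* The difference between two pointers into the same array has type
   ptrdiff_t, which also scales with the pointer size. */
ptrdiff_t elements_between(const int *first, const int *last) {
    return last - first;
}
```

Had the intermediate variable been a plain int, the upper half of a 64-bit pointer could be lost in the conversion.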

Slightly more subtly, this can come up in structure layouts.

When we use a long or a pointer, these now grow, which changes the size and sometimes the alignment and the offsets of other fields in our structures.

And, we have to be careful that this is done in a way that’s safe.

Now, most of the time this is going to work transparently, because these structures are used entirely within our application and everything gets the new definition and works fine.

But, if we’re doing something like representing an on-disk file format, or communicating across a network or with another process that is going to rely on the exact layout of a structure, that can go badly.

So again, on any of those data structures we want to use the C fixed-size types to make sure that we get what we want, whether we’re building for 64-bit iOS or for 32-bit iOS.
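
As a sketch of what that can look like, the hypothetical record header below uses only fixed-size types from stdint.h, ordered so that no padding is needed on common ABIs, and checks the layout at compile time. The struct and field names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header for an on-disk or on-the-wire record. Every field
   has an explicit width, so the layout does not change when long and
   pointers grow to 64 bits. */
struct record_header {
    uint64_t payload_offset;  /* bytes from the start of the file */
    uint32_t payload_length;  /* bytes */
    uint32_t flags;
};

/* Compile-time checks that the layout is what the format expects. */
_Static_assert(sizeof(struct record_header) == 16,
               "record_header must be exactly 16 bytes");
_Static_assert(offsetof(struct record_header, payload_length) == 8,
               "payload_length must start at byte 8");

/* Reports the size, mainly so it can be checked at runtime too. */
size_t record_header_size(void) {
    return sizeof(struct record_header);
}
```

Fixed-size types pin down the size and offsets; byte order is a separate concern that a real file or network format would still have to define.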

So in summary, building for 64-bit iOS is easy, it’s the default, and the compiler will help find and resolve any issues.

But, this isn’t the only thing that we’ve been up to.

We’ve also been making advances in Objective-C and the compiler can help here, too.

The language has continued to move forward.

Some of this really helps with the interoperability with Swift as well, as you may see in that talk.

I highly encourage you to check it out.

It’s happening at the same time as this one, so go and look on the video when that comes on the WWDC app.

And, whenever we write new code, we’ve been using all of these advancements in the language to get the modern best practices, more expressive code, but then we have all of this older legacy code that we’d like to adopt all of these features in, as well.

But, that’s a lot of code to go read through and manually find all of these things, so we have a tool that will help us identify the opportunities where we can use these new features.

And, I think the best way to talk about that is to show you with the demo.

Now, rather than use some contrived example code here, I thought we’d maybe look at something that we all are familiar with, at least as users: our WWDC app.

That code has been with us for a while.

We update it every year.

And, with the modernizer we wanted to use that to look at it and find out if there are perhaps some places in the codebase that we missed for opportunities to use new Objective-C features.

So, let’s look and see what a few of those things that we found are.

If we go under Edit to refactor, we can convert our project to modern Objective-C syntax.

We get a dialog box telling us what we’ve just selected, so make sure that we’ve got the right thing.

We can select which targets in our project to modernize.

In this case, we’re looking at the WWDC app, itself.

In the previous versions of Xcode the modernizer would go through and just look for Objective-C literals and subscripting.

But now, we have more options.

Now personally, I prefer not to do all of these at once.

That tends to be a little too much to swap back-and-forth, so I tend to want to select a few things.

I’m going to look for instancetype here, so that we can get our initialization methods more strongly typed.

I’m going to try and find if we missed any read/write properties where we can convert explicit getter/setter methods.

And, we’re going to look to use NS_ENUM for our enumeration values so the compiler can cooperate with the runtime to give better results.

Click Next, and the compiler will run over our code and it turns out we do, indeed, have a few more suggestions for what we can look at.

Now, do keep in mind that these are just suggestions, that we need to go through and look at the side-by-side diff here, where we have the new code on the left, the old code on the right, and we look through here.

This looks fine.

Everything looks good here.

We’re converting to enums.

Let’s look at our next one.

This looks a little bit different, because we still have this NSInteger over here that looks like it’d be straightforward to clean up, but I’d rather come back to this later.

I just want to deal with the things that we can do automatically right now.

So, I tell the modernizer to discard that change.

It wants to make sure that I’m doing that.

Yes, I am absolutely sure.

I can do the same here.

We could also tell it to ignore all of the changes in this file with this Check button here.

And now, it’s also found a place where we can use instancetype.

And, that all looks good so we tell it to save.

And, Xcode will now tell us that, “Oh!

We can update our project as well, and take snapshots.”

That sounds great.

Let’s let it do that because backups are good.

And now, our project is saved, it’s been rebuilt, and the Objective-C modernizer works to update our code and help us find places where we can take advantage of new features.

But, this isn’t the only place that we’ve made advances for Objective-C and for interoperability.

And, to tell you more about that I’d like to invite my friend and coworker, Bob Wilson.

Thank you, Bob.

[ Applause ]

Thank you, Jim.

So, modules are another way that LLVM can help modernize your code.

We introduced modules just last year, but in case you missed that, let’s start with some background.

So, before modules we had precompiled headers, which are often an effective way to speed up the compilation of your code, but they do have some limitations.

You can only have one precompiled header at a time, and more importantly, the whole approach of using a textual inclusion of a header file as a way of importing a framework is just fragile.

We have to deal with the issue where a header file gets included more than once in a single compilation.

We also have the problem of headers being fragile.

And, what I mean by that is that the meaning of the header can change depending on the environment where it’s imported, and let me show you that with an example.

So here, I’ve defined a macro, count, to the value of 100, and then I import the Foundation framework.

Now, inside the Foundation header there’s an include for the NSArray definition.

An NSArray has an ivar, the count.

So, the macro of count gets substituted as literal text in that place and we end up with completely broken code where instead of the ivar name, we have a value of 100.

This is what I mean by headers being fragile.

Modules solve this problem by replacing the model of textual inclusion with a semantic import.

And, there’s a lot more detail about modules in the Advances in Objective-C presentation from last year’s WWDC and I encourage you to watch that if you’re not familiar with modules.

Until now, modules have only been available for the system frameworks.

New in Xcode 6, you can now define modules for your own frameworks as well, for C and Objective-C.

Besides fixing the problems we just looked at, this also gives you a way of importing your own framework into your Swift code.

And as Jim mentioned, there’s another session on integrating Swift with Objective-C, and I encourage you to watch the video to learn more about that.

So if you want to do this, how?

It’s really very easy.

For most frameworks it’s possible to define a single umbrella header that imports all of the framework API.

And, this is what we recommend that you do as it is the easiest way to adopt a module.

Once you’ve done that, simply go to the Xcode Build settings for your framework and in the Packaging section set Defines Module to Yes, and that’s it.

It really is very easy.

Now, if you have a more complicated framework where that single umbrella header is not sufficient, you can use a custom module map.

And, there’s more information to describe how that works on the LLVM website.

After you’ve created a module you’ll want to use it.

How do you do that?

There’s an @import keyword followed by the module name that tells the compiler, “I want to import this module.”

If you haven’t had a chance to update your code and you still have a #import to include the umbrella header, the compiler’s smart enough to know that this is now a modular framework and it will go ahead and treat that as an implicit modular import anyway.

So just as a guideline though, we do recommend you use @import when you’re importing your framework into a separate target within your project just because it makes it clear in the source that you really intend for this to be a modular import.

One exception to that is within the implementation of your framework, itself.

It doesn’t make any sense to import a framework into itself and so, in that case, you really need to use #import to textually include the framework headers, just within the implementation of the framework.

And, besides those guidelines, we have a few other rules about modules that you should be aware of.

First, don’t expose any non-modular headers in your framework API.

It’s fine to import another module, like Cocoa, but if I have an import of something like Postgres.h, which presumably is not a module, you can put that down inside the implementation of your framework, but don’t expose it in the API.

One other issue is that modules can change the semantics of your code.

We saw earlier the problem of a fragile header where a macro definition inadvertently broke the code.

Sometimes you might want to do this on purpose, and I’m showing here an example where I’ve defined a macro, DEBUG, as a flag to enable additional debugging APIs in my framework.

By switching that framework to be a module, the DEBUG macro defined in my source code no longer has any effect, which is not what I wanted.

Now, that limitation only applies to macros that are defined in the source code.

So, if you really want to do something like this, one alternative is to define the macro on the command line or in the Xcode build settings.
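
A small sketch of the alternative, written as plain C. It assumes the DEBUG macro arrives from the build, for example through the compiler’s -D option or a preprocessor macro in the build settings, and never from a #define in the importing source file. FRAMEWORK_HAS_DEBUG_API and framework_debug_api_enabled are hypothetical names.

```c
/* DEBUG is expected to be supplied by the build (for example -DDEBUG=1
   on the compiler command line), not by a #define placed before an
   import in source code. */
#ifdef DEBUG
#define FRAMEWORK_HAS_DEBUG_API 1
#else
#define FRAMEWORK_HAS_DEBUG_API 0
#endif

/* Reports whether the extra debugging API was compiled in.
   (Hypothetical function, for illustration only.) */
int framework_debug_api_enabled(void) {
    return FRAMEWORK_HAS_DEBUG_API;
}
```

Because the flag comes from the build configuration, every file that sees the framework is compiled with the same setting.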

So, that is user-defined modules.

It’s really pretty straightforward in the common case, and it gives you fast compilation, clear semantics, and a way of interoperating with Swift code.

So far, we’ve been talking a lot about ways that LLVM helps you modernize your code and adopt modern Objective-C modules, but let’s turn now and look at performance, which is the other theme of this presentation.

Profile Guided Optimization, or PGO, is a new feature in Xcode 6 and it gives you a way of getting even more performance out of your code.

Let me give you an overall high-level understanding of what this is about.

One of the inherent challenges for the compiler is that it has no way of knowing what the input to your program is going to be.

The only input to the compiler is your source code.

So, the compiler has to assume that all inputs are equally likely.

There are some cases where it can guess that certain code paths will be more common than others.

For example, it can assume that going through a loop is going to happen more often than code outside of that loop.

But, those are just guesses and there are a lot of things that it simply can’t know.

If we provide a profile as an additional input to the compiler it can now try to optimize for the common case and do a better job of optimization.

And, what I mean by a profile, here, is simply a count of how many times each statement in your app executes in a typical run of your app.

You may be wondering, “How do I get a profile like that?”

Again, we can use the compiler here to generate a special instrumented app that, as it runs, is going to count how many times each statement executes.

And then, when your app finishes with this special instrumented version, it will write out that profile which we can then use for PGO.
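
To make the idea of a profile concrete, here is a hand-written stand-in for instrumentation. It is only a sketch: the real counters are inserted by the compiler and written out to a profile file, whereas this example keeps two counters in memory. All names are hypothetical.

```c
/* One counter per branch we care about. */
enum { BRANCH_RED, BRANCH_BLUE, BRANCH_COUNT };

/* The hand-rolled "profile": how often each branch was taken. */
static unsigned long branch_counts[BRANCH_COUNT];

/* Clears the counters before a new run. */
void reset_counts(void) {
    for (int i = 0; i < BRANCH_COUNT; i++) {
        branch_counts[i] = 0;
    }
}

/* Walks the workload; is_red[i] is nonzero for a red object.
   Each branch bumps its own counter as it executes. */
void run_workload(const int *is_red, int n) {
    for (int i = 0; i < n; i++) {
        if (is_red[i]) {
            branch_counts[BRANCH_RED]++;
        } else {
            branch_counts[BRANCH_BLUE]++;
        }
    }
}

/* Reads back the count for one branch. */
unsigned long count_for(int branch) {
    return branch_counts[branch];
}
```

After a representative run, the counts say which branch is common and which is rare, which is the information the optimizer lacks when it only sees source code.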

So, how does the compiler use that profile information?

There are an awful lot of ways.

So many optimizations can benefit from this, but I’m highlighting just three here that are particularly valuable.

One is the inliner.

If we know that a function is really hot, and by that I mean it’s run a lot, over and over, the inliner can be much more aggressive about inlining it.

When we’re generating the code we can try to lay out the common paths through your code so that they’re contiguous, which makes it easy for the processor to run them fast.

And the register allocator can also try to keep values in registers throughout those most common paths.

Let’s look at an example just to give you a better understanding of this.

This is some C++ code that’s going to iterate over a set of colored objects and for each one it’s going to update the position of the object.

So, at the top I’ve got a loop over the objects, and for each one I’m going to call my Update Position function.

And, Update Position is going to look at the object: if it is red, it moves in a very simple horizontal line, so the code is really simple.

But, if the object is blue, let’s assume that the movement is much more complicated, I’ve got a very large block of code here.

Now, the compiler has no way of knowing whether red objects or blue objects are more likely, so it just assumes they’re both equally likely.

But, with PGO I might be able to know that red objects are far more common.

And so, I’m highlighting in red here the hot code, which is the code to iterate over the set of objects and then to handle the red objects.

I’m going to color-code the cold code in blue, which is blue objects which are rare for some reason in this application.

And then, let’s look at how the compiler would handle this code.

Here’s kind of the default code layout that matches, roughly, the original source order.

We’ve got the hot loop outside, and then the Update Position function down below, with a little bit of hot code in it.

Inlining is one of the most important optimizations and we’d really like to inline that Update Position function.

But, the compiler can’t inline everything or the code would bloat beyond a point where it would be useful.

But in this case, the Update Position function is big because of all that cold code for handling the blue objects and so it wouldn’t normally be inlined.

But, because PGO tells us there’s some really hot code here, the inliner can be much more aggressive about that in this particular case.

So, we take the loop iterating over the objects and split that in half and move the Update Position code right inline.

So, this is much better now.

We’ve got a lot of the hot code right together, but we’ve still got a big chunk of this code for blue objects, the cold code, right in the middle of our loop.

And, PGO can help this, as well, by changing the code layout.

It knows that that code is cold and can move it down below, out of the way, and we end up with a nice tight loop that can run really fast.
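
The slide code is C++ and is not included in the transcript, so the following is a hand-written C approximation of the layout PGO arrives at automatically. The types, function names, and movement formulas are made up, and the noinline and cold attributes are Clang/GCC extensions used here only to imitate by hand what the profile lets the compiler decide on its own.

```c
typedef enum { COLOR_RED, COLOR_BLUE } color_t;

typedef struct {
    color_t color;
    double x;
    double y;
} object_t;

/* Rare path: kept out of line so it does not sit in the middle of the
   hot loop. The body is a short stand-in for the large, complicated
   movement code. */
__attribute__((noinline, cold))
static void update_blue(object_t *obj) {
    obj->x += 0.5;
    obj->y += obj->x * 0.25;
}

/* Hot path: small enough to inline into the loop once the cold code
   has been moved out of the way. */
static inline void update_position(object_t *obj) {
    if (obj->color == COLOR_RED) {
        obj->x += 1.0;          /* simple horizontal movement */
    } else {
        update_blue(obj);
    }
}

/* The loop over all objects: after inlining, only the red case and a
   call to the cold function remain inside it. */
void update_all(object_t *objs, int n) {
    for (int i = 0; i < n; i++) {
        update_position(&objs[i]);
    }
}
```

With a profile, none of these annotations are needed; the compiler can make the same inlining and layout choices from the measured counts.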

And, it also typically enables other optimizations on that hot code.

So obviously, this is a simplified example, but hopefully gives you a feel of the power of PGO and just how much it can help the optimizer.

So, you may want to use it.

When does it make sense?

The compiler does a really good job optimizing by default.

With PGO, if you do just a little bit of extra work to gather the profile you can do even better.

So obviously, if you’re happy with the performance you’re already getting, you’re probably not motivated to do that even that little bit of extra work.

But, if you need more performance, by all means, give it a try.

And, let me show you some examples of just how much it can help.

This is a graph showing the speedup with PGO compared to just a normal optimized build.

And, I’m looking at four different applications here: the Apple LLVM compiler itself (applying PGO to the compiler itself), the SQLite database, the Perl interpreter, and gzip file compression.

And, PGO gives us speedups ranging from about 4% all the way up to 18%.

So, not all apps will benefit this much.

It really varies, depending on the app, but clearly there’s a lot of potential here.

So, if you want to try it, how do you go about that?

PGO is really easy to use.

The first step is to collect a profile.

I’m going to come back and talk about that in just a minute.

Once you’ve done that, simply go in the Xcode Build settings for your project and find the Use Optimization Profile setting, and set it to Yes, typically just for the release configuration.

And that’s it!

You’ve enabled PGO.

Once you’ve done that, as you continue developing your app you may change it as you fix bugs, you add new features, the code becomes gradually out of sync with the profile you’ve collected earlier.

And, when that happens, the compiler will simply fail to use that profile information.

It won’t break anything, you just gradually lose the optimization benefit.

And when that happens, it will give you a warning.

So, if you see warnings like this, saying that your profile may be out of date, and you see more and more of them, it’s a good indication that it’s time to go back and update your profile.

So, let’s turn now and look at, how do you generate the profile?

Xcode 6 has a new command, Generate Optimization Profile.

When you run this command, Xcode will build the special instrumented version of your app and then run it, and you can then interact with the running app to generate the profile.

When it finishes running, it will write out the profile and add it to your project.

As you’re running your app, keep in mind it’s important to exercise all of the code that’s important for your performance.

If I have a game with three different levels and I only play the first level of my game, the compiler’s going to assume that that’s the only thing that really matters and not work as hard on the other levels.

Now, you may be wondering, “If I’ve written a really hard game, it may take a while to play the whole thing to completion.”

That could be a problem, right?

So, Xcode has another option, which is to use your performance tests as inputs to drive the profiling.

Performance tests are a new feature in Xcode 6.

If you’d like to learn more about them, there’s a session right here tomorrow morning on testing in Xcode 6.

And, if you care about performance you want to set up these performance tests anyway, to catch regressions in your code, just to keep track of how you’re doing.

And once you’ve gone to that trouble to set them up, in most cases they’re pretty good inputs for driving this profile.

Again though, keep in mind it’s important that your tests cover the code in a way that reflects the typical usage of your app.

Going back to my three-level game, if I write lots of tests for the first level and only a few for the second and third level, again, the compiler’s going to end up optimizing more heavily for that first level.

Another benefit of using tests is it gives you a great way of evaluating, how much does PGO help me?

You can just run your tests.

Now, let me show you that with a demo now.

So, with the release of the Swift language, we thought it would be fun to make a demo app that would celebrate that.

And so, rather than the Swift language, we thought of the swift birds and we made an application that uses the Boids Artificial Life Simulation to simulate a flock of swifts.

And, I can create a whole bunch of them here and let them fly around.

And, the way this Boids application works is that each bird, or Boid, compares its position to all of the other ones on the screen and it calculates the distance between them to find the flock of the birds nearest to it.

And then, each Boid has competing urges.

On the one hand, it wants to move closer to the center of the flock.

At the same time, it doesn’t want to get too close.

And so, if it gets too close to another one it will move apart.
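
The demo’s source is not shown in the transcript, so here is only a rough sketch of the neighbor search it describes: each boid is compared against every other one, and the nearby ones are averaged to find the center of its flock. The type, the function name, and the radius parameter are assumptions.

```c
typedef struct {
    double x;
    double y;
} boid_t;

/* For the boid at index `self`, averages the positions of all other
   boids within `radius` and stores the result in `center`. Returns the
   number of neighbors found; `center` is left untouched if there are
   none. Squared distances are compared to avoid a square root. */
int flock_center(const boid_t *boids, int n, int self,
                 double radius, boid_t *center) {
    double sum_x = 0.0;
    double sum_y = 0.0;
    int neighbors = 0;

    for (int i = 0; i < n; i++) {
        if (i == self) {
            continue;
        }
        double dx = boids[i].x - boids[self].x;
        double dy = boids[i].y - boids[self].y;
        if (dx * dx + dy * dy <= radius * radius) {
            sum_x += boids[i].x;
            sum_y += boids[i].y;
            neighbors++;
        }
    }

    if (neighbors > 0) {
        center->x = sum_x / neighbors;
        center->y = sum_y / neighbors;
    }
    return neighbors;
}
```

Running this for every boid means the work grows with the square of the flock size, which is why adding more boids puts pressure on performance.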

And the performance of that, as we add more and more of these Boids, could become a problem.

So, we set up a performance test to track that, and this is a really simple performance test.

We set up a scene with 200 Boids and measured the time it takes to update their positions 100 times, and that’s our performance test.

So, let’s run that.

Because I care about performance, I’m going to edit my current scheme to make sure that my test step is going to use the Release build configuration so that we get optimized results.

And, I’ll go to the Product Test menu and run my performance test here.

All right.

And now, because I haven’t run the test before I don’t have a baseline, so let’s go ahead and set the baseline based on that first run.

And now, let’s try adding PGO.

Under the Product menu, Perform Action, down at the bottom here is this new command I told you about, Generate Optimization Profile.

I get two choices; I can either run the application or I can use my performance test.

And, I’d like to show you how it works with the performance test.

I just click Build and Run, and Xcode, very helpfully, warns me that I haven’t yet enabled PGO in the Build settings and it offers to do that.

So, let’s go ahead and let it enable that.

It’s now building a special instrumented version of our app and running it using the performance test.

And when those tests finish... ah, I got a warning here, an error.

Let me just explain: what’s happened here is that because we’ve run the app with a lot of instrumentation code, it runs more slowly.

But, this is just being used to generate the profile so that’s not a problem.

I’m going to go back to the Project Navigator a minute and show you that Xcode has added this new Optimization Profiles folder.

And inside of that, if you can see it, there’s my profile data.

So, that’s great!

PGO is enabled, we have a profile.

Let’s rerun those performance tests.

We’ll go back to run Product Test, and see how much does it help?

And the tests are running now.

And, wow, we got a 21% improvement just like that.

We didn’t have to change the code or do anything else.

[ Applause ]

So, that is PGO.

It’s a great new feature to help you get even more performance, when you care about getting every last drop out of your code.

Continuing on this theme of performance, I’d like to turn the stage over to Nadav Rotem, my colleague, to talk about advances in vectorization.

Thank you, Bob.


[ Applause ]

So, last year with Xcode 5 we introduced a new optimization called loop vectorization.

And, I would like to remind you what loop vectorization is.

So, modern processors have vector instructions.

These instructions can process multiple scalars at once.

And loop vectorization is the compiler optimization that accelerates loops using these vector instructions.

And let’s see how it’s done.

If you can see the code on the screen here, you’ll see that it’s a simple program that accumulates all the numbers in the array into one variable, into sum.

And, the natural way of executing this code is to load one number at a time and add it into the variable sum.

And then, load another number and add it into sum.

But, there’s a better way of executing this code.

What the Loop Vectorizer does for you automatically is introduce a new temporary variable, temp4.

Now, this is a vector register, a vector temporary variable.

And, this allows us to load four numbers at a time and add four numbers at a time, and we do it for the entire array.

So, this is obviously much faster because we’re processing four numbers at once instead of processing one number at a time.

And, when we finish scanning the array we need to take the four numbers from that temporary register and add them together, but that extra step doesn’t matter much because usually an array is pretty big.

So, this is how loop vectorization accelerates loops and makes your code run faster so that you don’t have to change your code.
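
The transformation can be sketched by hand in plain C. This is an illustration of the idea, not the code the compiler actually emits: temp4 is modeled as a four-element array instead of a real vector register, and the function names are hypothetical.

```c
/* The loop as written: one element is loaded and added per iteration. */
int sum_scalar(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Hand-written model of the vectorized loop: four partial sums are kept
   in temp4 and four elements are consumed per iteration. */
int sum_four_lanes(const int *a, int n) {
    int temp4[4] = {0, 0, 0, 0};
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        temp4[0] += a[i];
        temp4[1] += a[i + 1];
        temp4[2] += a[i + 2];
        temp4[3] += a[i + 3];
    }

    /* Final step: add the four lanes together. */
    int sum = temp4[0] + temp4[1] + temp4[2] + temp4[3];

    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Both functions return the same result for integer data; the second simply shows where the four-at-a-time work and the final reduction come from.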

So, in Xcode 6 we’ve improved loop vectorization in a number of ways. First of all, we’ve improved the analysis of complicated loops.

This means that LLVM will be able to analyze more complicated loops and vectorize more loops in your code, which is great.

We’ve also integrated the Loop Vectorizer with PGO, that Bob just mentioned.

So, this means that when PGO is available the Loop Vectorizer will be able to make better decisions when vectorizing your code.

We’ve also improved the X86 and ARM64 in coding support.

Now this means two things.

First of all, the Loop Vectorizer has a better understanding of the processor so it can better predict when it is profitable to vectorize your code.

And the second thing that it means is that when it vectorizes your code it’ll generate better, more optimized code sequences, so that your code will run faster.

And, the last feature that I want to talk to you about is specialization of loop variables.

So, most variables in your code are only known at runtime.

These variables can be arguments or computed expressions, and the compiler doesn’t know the values of these variables at compile time, only at runtime.

And many times, the Vectorizer cannot vectorize your code unless the value of these variables is known to be constant.

So, let’s take a look at the example that I showed you earlier.

So, this is a simple loop and I modified it a little bit and I introduced the Step variable.

So now, instead of consecutively scanning all of the elements in the array, we jump and skip some elements, advancing in steps of the variable Step.

Now, we can’t vectorize this code because these elements are not consecutive in memory.

We can’t use these vector registers to load a few elements and then add them together.

It won’t work unless Step is equal to one.

Well, in many cases Step is equal to one.

So, what do we do?

Well, we’ve introduced a new optimization that’s called Specialization.

What we do is we create multiple versions of the loop.

In one version of the loop we assume that step is equal to one, and then we vectorize the code and make the code run faster.

But, in another version of the loop we don’t assume anything and the code runs as-is, as scalar code.

And then, we add code for selecting at runtime which version of the loop to run.

If Step happens to be one, then we go and execute the vectorized version.

But, if Step is not equal to one then we execute the regular version.

And this new compiler feature allows the Loop Vectorizer to vectorize a lot more loops, and it’s great.
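
A minimal sketch of what the specialized code could look like if written by hand in C. The function names are hypothetical, and the four-lane array only models a vector register; the actual code the compiler generates will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: visits a[0], a[step], a[2*step], ... below index n. */
int sum_strided(const int *a, size_t n, size_t step) {
    assert(step > 0);
    int sum = 0;
    for (size_t i = 0; i < n; i += step)
        sum += a[i];
    return sum;
}

/* Model of the specialized code: a runtime check selects which
   version of the loop runs. */
int sum_strided_versioned(const int *a, size_t n, size_t step) {
    assert(step > 0);
    if (step == 1) {
        /* Specialized version: elements are consecutive in memory,
           so four of them can be processed per iteration. */
        int temp4[4] = {0, 0, 0, 0};
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            for (int lane = 0; lane < 4; ++lane)
                temp4[lane] += a[i + lane];
        }
        int sum = temp4[0] + temp4[1] + temp4[2] + temp4[3];
        for (; i < n; ++i)   /* leftover elements */
            sum += a[i];
        return sum;
    }
    /* General version: no assumption about step, runs as scalar code. */
    return sum_strided(a, n, step);
}
```

The check on step runs once, before the loop, so its cost is small compared with the loop itself.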

Okay. So, this was loop vectorization.

But, in Xcode 6 we’ve also added a new kind of vectorization.

This new vectorizer is not a loop vectorizer.

It’s called SLP Vectorizer, which stands for Superword Level Parallelism, and it extracts parallelism beyond loops.

What this SLP Vectorizer does is that it looks for multiple scalars in your code and it glues them together into vector instructions.

Let’s see how it’s done.

So, on the screen you see a very simple struct.

This struct has two members, x and y.

They’re consecutive in memory.

And, we have a simple function that converts units from feet to centimeters.

Now, this is a very simple conversion.

All we have to do is load the x member, multiply it by a constant, and save it back.

And, we do the same thing for y.

And of course, the natural way of executing this code is to do it consecutively; load variable x, multiply it, save it back.

Load variable y, multiply it, and save it back.

But again, there’s a better way of doing it, and this is what the SLP Vectorizer does.

We can load x and y together because they’re consecutive in memory, multiply them together again, and save them back to memory.

And, this is SLP vectorization. SLP vectorization is very beneficial for some kinds of applications, mainly numeric applications, and we see great speedups.

It may not speed up all programs, but it definitely speeds up a lot of numerically complex applications.
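
To make the struct example concrete, here is a small C sketch. The struct name, the function names, and the use of double are assumptions, since the slide code is not part of this transcript. The constant is the standard 30.48 centimeters per foot. The second function only models the two-lane vector form that the SLP Vectorizer aims for.

```c
#include <assert.h>
#include <stddef.h>

/* Two members that sit next to each other in memory. */
struct Point {
    double x;
    double y;
};

#define CM_PER_FOOT 30.48

/* Straight-line scalar code: no loop anywhere. */
void feet_to_cm(struct Point *p) {
    assert(p != NULL);
    p->x *= CM_PER_FOOT;   /* load x, multiply, save back */
    p->y *= CM_PER_FOOT;   /* load y, multiply, save back */
}

/* Model of the SLP result: x and y are treated as the two lanes of
   one vector, so there is one load, one multiply, and one store. */
void feet_to_cm_slp_model(struct Point *p) {
    assert(p != NULL);
    double lanes[2] = { p->x, p->y };      /* one two-element load */
    for (int k = 0; k < 2; ++k)
        lanes[k] *= CM_PER_FOOT;           /* one vector multiply */
    p->x = lanes[0];                       /* one two-element store */
    p->y = lanes[1];
}
```

The first function is the kind of code you would write yourself; the second is only there to show the shape of the result.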

So to summarize, we’ve improved loop vectorization in Xcode 6 and we’ve introduced a new kind of vectorization called SLP vectorization.

Now in Xcode 5, when we introduced the Loop Vectorizer we did not enable it by default and you had to go into one of the settings and select Loop Vectorization and then Loop Vectorization worked.

Well, in Xcode 6 you don’t have to do anything because both the new SLP Vectorizer and the improved Loop Vectorizer are enabled by default when you build your application in a release mode.

This means that you don’t need to do anything.

Just compile your application in release mode and the improved LLVM will make your code run faster.

Okay. So, we talked about a number of performance features in LLVM.

We talked about PGO, we talked about vectorization, but both of these features are features of a static C and C++ compiler.

But, LLVM is an essential technology here at Apple that’s used by many projects.

And, one of the projects that I want to talk to you about today is accelerating JavaScript code.

Well, WebKit is another important technology.

It’s the heart of the Safari Web Browser.

And, WebKit needs to execute JavaScript code because JavaScript is everywhere in every web page.

And, WebKit has an interpreter, so when you load your Facebook page, or any other page, WebKit starts executing your code with the interpreter.

But, WebKit also has two JIT compilers to accelerate your code.

When WebKit sees that you execute the same JavaScript function over and over again, it says, “Huh. Let’s take a little bit of time to compile it really quickly so that it will run a little bit faster than the interpreter.”

So, this is the fast JIT.

And, when WebKit sees that you execute a function many times, then it says, “All right, let’s also take the time and optimize this function real quick, so that it will run a little bit faster.”

So, we have the interpreter, we have the fast JIT, and we have the optimizing JIT, and there are tradeoffs between compile time and the quality of the code.

And this works really great, except that JavaScript is evolving.

People have started writing large, compute-intensive applications in JavaScript.

People then compile C++ programs into JavaScript and run them in the browser.

You can even compile Quake 3 and run it in your browser today, which some people like [laughter].

Yeah, it’s great.

But, it’s a new use case and we need a new compiler to support this use case, and this is where LLVM comes into the picture.

So, we’re adding LLVM as a fourth tier compiler to WebKit.

Functions that run many, many, many times are now compiled with LLVM.

And, LLVM is tuned for making the most out of your code, for really trying hard to optimize your code and to generate excellent code quality.

And again, there’s a tradeoff between compile time and the quality of the code, so WebKit really waits for you to execute that function many, many times, as you do in compute-intensive applications that you run in the browser.
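
The four tiers can be pictured as a counter-driven promotion scheme. The C sketch below is only an illustration: the type names, the function name, and especially the threshold values are invented here and are not WebKit's actual numbers or implementation.

```c
#include <assert.h>
#include <stddef.h>

enum Tier {
    TIER_INTERPRETER,
    TIER_FAST_JIT,
    TIER_OPTIMIZING_JIT,
    TIER_LLVM
};

/* Hypothetical thresholds, chosen only to show the ordering. */
enum {
    FAST_JIT_AFTER = 10,
    OPTIMIZING_JIT_AFTER = 1000,
    LLVM_AFTER = 100000
};

struct FunctionProfile {
    unsigned long executions;
    enum Tier tier;
};

/* Counts one more execution and promotes the function when it has
   run often enough to justify a slower, better compiler. */
enum Tier record_execution(struct FunctionProfile *f) {
    assert(f != NULL);
    f->executions++;
    if (f->executions >= LLVM_AFTER)
        f->tier = TIER_LLVM;
    else if (f->executions >= OPTIMIZING_JIT_AFTER)
        f->tier = TIER_OPTIMIZING_JIT;
    else if (f->executions >= FAST_JIT_AFTER)
        f->tier = TIER_FAST_JIT;
    else
        f->tier = TIER_INTERPRETER;
    return f->tier;
}
```

The higher a tier's compile cost, the more executions are required before a function reaches it, which is the tradeoff between compile time and code quality.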

But, compiling JavaScript with LLVM is very different from compiling C or Objective-C because JavaScript, while it’s a great language, is a dynamic language, and if you look at the code on the screen you’ll see that there are no types.

There’s this n argument here, but what is n?

Is it an integer?

Is it a double?

Is it a class?

It can be a lot of different things.

So, how do we compile it?

Well luckily, WebKit executed this function many, many, many, many times before with the interpreter, so it knows that in the last 1000 times n was an integer.

So now, we can compile this code assuming that n is an integer, except that someone may decide to pass an n that’s not an integer.

Someone may decide to pass a double or a class and then everything will break and we can’t allow that.

So, what do we do?

We use a technique that’s very similar to what we did with the vectorizer.

We add checks.

We make assumptions and we add checks.

We assume that n is an integer.

We assume that n does not overflow.

And then, we verify our assumptions at runtime.

Okay, that’s great.

But, what is the fallback?

What do we do?

When our assumptions fail we have to go back to the interpreter because only the interpreter can handle all these cases, all these extreme cases.

But, moving back to the interpreter is not simple because we already started executing the JITed code and the function made changes.

We can’t just start executing it from the beginning.

So, we developed a technology that’s called On-Stack Replacement, which is a technique that is used to migrate the state of the program from the JITed code in LLVM back to WebKit.

And, LLVM needs to track all of the variables in your program, and some of them may be in registers, some of them may be on the stack, and now we’re able to migrate them from LLVM to WebKit and continue the execution in WebKit.

Now, this doesn’t happen all the time, it’s a very extreme case.

But when it happens, we have to handle these cases.
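
The assumptions, the checks, and the fallback can be modeled in a few lines of C. Everything here is hypothetical: the tagged Value type, the function names, and the loop being compiled (a sum of 0 up to n) are stand-ins for illustration and are not WebKit's or LLVM's actual mechanism. The point to notice is that when a check fails mid-loop, the current index and running sum are handed to the general path, which continues from there instead of starting over.

```c
#include <assert.h>
#include <stdint.h>

/* A dynamically typed value: the tag says what n really is. */
enum Tag { TAG_INT32, TAG_DOUBLE };
struct Value {
    enum Tag tag;
    union { int32_t i; double d; } as;
};

struct Value make_int(int32_t i) {
    struct Value v; v.tag = TAG_INT32; v.as.i = i; return v;
}
struct Value make_double(double d) {
    struct Value v; v.tag = TAG_DOUBLE; v.as.d = d; return v;
}
int32_t value_as_int(struct Value v) {
    assert(v.tag == TAG_INT32);
    return v.as.i;
}

/* State that must survive the move back to the general path. */
struct LoopState { double i; double s; };

/* General path (stands in for the interpreter): handles any numeric n
   and resumes from the state it is given. */
double sum_to_generic(double n, struct LoopState st) {
    while (st.i < n) {
        st.s += st.i;
        st.i += 1.0;
    }
    return st.s;
}

/* Optimized path: assumes n is an integer and that the sum does not
   overflow, and verifies both assumptions at runtime. */
double sum_to_speculative(struct Value n) {
    if (n.tag != TAG_INT32) {              /* check: n is an integer */
        struct LoopState start = {0.0, 0.0};
        return sum_to_generic(n.as.d, start);
    }
    int32_t limit = value_as_int(n);
    int32_t s = 0;
    for (int32_t i = 0; i < limit; ++i) {
        if (s > INT32_MAX - i) {           /* check: s + i does not overflow */
            /* Exit: migrate the live variables and continue elsewhere. */
            struct LoopState live = {(double)i, (double)s};
            return sum_to_generic((double)limit, live);
        }
        s += i;
    }
    return (double)s;
}
```

For small integer inputs only the fast path runs. A double argument fails the first check immediately, and a large integer such as 100000 fails the overflow check partway through the loop, so the general path finishes the work from the migrated state.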

Okay, now compiling code with LLVM is very beneficial, especially for compute intensive applications and especially for these C++ applications compiled into JavaScript, run in the browser.

And, we’re really excited about this technology.

It’s great.

So now, we use LLVM.

So to summarize, we use LLVM as a fourth tier compiler in Safari, both for X86 and ARM64 on iOS and OS X, and we get excellent performance speedups.

To summarize this talk, today we talked about modernizing Objective-C code and we also talked about a number of performance features.

If you have any more questions, you can contact our Developer Tools Evangelist, Dave DeLong or you can go to the Apple website or to the LLVM website.

There are a few related sessions, and I encourage you to attend these sessions or to watch them online.

Thank you, very much, and have a good week.


[ Applause ]
