What's New in LLVM

Session 409 WWDC 2018

The LLVM suite of compiler tools in Xcode 10 have new language features, improved diagnostics, and more powerful optimizations. Find out about improvements to ARC for Objective-C, keep up with the newest additions to C++, get an overview of new and improved diagnostics and static analyzer checks, and learn about how LLVM compiler technology is delivering faster build times and better runtime performance for your apps.

[ Music ]

[ Applause ]

Good morning, welcome to What's New in LLVM.

I'm Jim Grosbach, your friendly neighborhood pointy-haired boss.

I'm here to tell you a little bit of background about the LLVM project before we dive into the deep technical details of all the exciting new things that we have for you today.

To start off, LLVM is more than just the compiler. It is the backend for the Clang compiler, for the C family of languages that we all use every day, but it also powers the Static Analyzer, the sanitizers, the LLDB debugger, and is the optimization and code generation framework underneath the GPU shader compilers for all of Apple's mobile platforms.

In addition to this, it also powers one additional little project that you may have heard of from time to time called Swift.

And like Swift, LLVM is an open source project.

We all operate under the watchful eye of our LLVM wyvern here, he's normally a very friendly fellow, though I do have to caution you he gets a little bit cranky if you call him a dragon so don't do that.

As an open source project LLVM is a partnership, we work with industry partners, academics, researchers and hobbyists from all over the world and in different parts of the industry and many more all over the place.

This is really fantastic, we work together to build the greatest tools that we possibly can to move technology forward.

And if you ever have a compiler itch that you would like to scratch, we would like to invite you to participate with us and go to the LLVM website at llvm.org, or you can come talk to us later today in the LLVM labs. Many of our compiler engineers from Apple will be there, and I'm sure they will be more than happy to talk your ear off about anything and everything compiler related you've ever wanted to know.

So for today we have a great set of things that we want to share with you.

We have updates on automatic reference counting that make it even easier for the compiler to help you with your memory management.

We have new diagnostics in Xcode 10 and new checks in the Static Analyzer to help catch bugs in your project sooner at build time to improve the quality of your code.

We have compiler features that improve security, both of Apple's platforms and of your apps.

And new features to allow you to take advantage of all of the really great new things on the hardware architectures to get the performance that we all want out of our platforms and architectures.

So with that I would like to invite my colleague Alex up to talk about ARC.

Alex.

[ Applause ]

Thank you, Jim.

Automatic reference counting has greatly simplified Objective-C programming since we introduced it a couple of years ago.

A couple of restrictions made it harder to migrate from the old manual retain release mode over to ARC.

I'm happy to say that we've now lifted one such restriction.

Xcode 10 has support for ARC object pointer fields in C structures.

[ Applause ]

Let's take a look at an example. Let's say we'd like to write a food ordering app and we'd like to create a data structure which represents a menu item.

In Xcode 9 and earlier it would have been impossible for us to actually use a C structure with ARC object pointer fields, so we would have had to use an Objective-C class here.

Xcode 10 now allows us to actually create a C structure that has ARC object pointer fields.

Let's keep going and keep writing our food ordering app.

Let's create a function that orders free food for us.

In the function let's create a variable item of type menu item with a price of zero.

Then let's pass this item into another function that actually orders the food for us.

When the item is created the compiler has to synthesize code which retains the ARC object pointer fields in the item.

The code comments on the slide demonstrate the code that the compiler synthesizes.

This code ensures the name and the price of the item are not released prematurely before the item is actually used.

Now at the end of the function item goes out of scope and is deallocated from the stack so the compiler has to synthesize code which releases the ARC object pointer fields in the item.

This ensures that the name and the price are not leaked when the item is released.

Previously it was possible to use Objective-C object pointer fields when using manual retain/release mode, but you had to write the retains and releases yourself.

With ARC the compiler hides all of this complexity for you and synthesizes code that retains and releases the fields.

So the compiler is really your friend here and it does the correct job of managing memory for variables on the stack and also for fields in other structures, and also instance variables inside Objective-C classes.

But there is one place where we have to put in a little bit of extra work to support structures with ARC object pointer fields, and that place is the heap.

Let's go back to our structure, let's say you would like to allocate an array of menu items on the heap.

Now if this was an Objective-C interface we could have used an NSArray here, but it's not so let's use malloc and free.

Now this code actually has two issues.

The first issue is that the memory is not zero initialized when it's allocated, which means that the object pointers will be invalid, which will cause undesired behavior in your program at runtime.

The second issue is that the ARC object pointer fields are not cleared before the memory is deallocated which will cause runtime memory leaks in your program.

Now to fix the first issue you can replace the call to malloc with a call to calloc.

This will ensure that your memory is zero initialized, which will remove all of those nasty unexpected runtime issues.

To fix the second issue you can write a loop, before deallocating your memory, that clears out all of the ARC object pointer fields in your items.

This will ensure that the name and the price in the items are not leaked when the items are freed.

Now this is an exciting new feature, and if any of you were put off from migrating over to ARC because of a lack of features like that, I hope that support for ARC object pointer fields in Xcode 10 will help you reconsider your choice.

Now let's take a look at Objective-C pointers and structures in general and see where and how these structures can be used in different language modes in Xcode 10.

So in Xcode 10 you can use structures that have Objective-C object pointer fields across different language modes.

For example, you can use the same structure in C, Objective-C, or even Objective-C++.

And it will work correctly even when you're compiling your code in ARC or in the manual retain release mode.

In Xcode 10 we actually unified the Objective-C++ ABI between calls to functions that took in or returned structures that had ARC object pointer fields in Objective-C++.

This was done through an ABI change in Xcode 10. The change affects functions in Objective-C++ that return or take in, by value, a structure that has ARC object pointer fields and no special member functions like constructors or destructors.

Now if you are not sure what this means for you or whether your code is affected by this ABI change, please take a look at Xcode's release notes, where we describe in more detail the effects and the impact of this ABI change.

Now there is one caveat when it comes to the ARC object pointer fields and C structures, they're not supported in Swift.

So if you try to use a structure that has ARC object pointer fields from Swift you will just get a compilation error because the structure will not be found.

In addition to new features like support for ARC object pointer fields Xcode 10 comes with a lot of new compiler diagnostics.

We actually have over a hundred new warnings in Xcode 10 and today I'd like to talk about two of them.

The first warning might be of interest to those of you who have mixed Swift and Objective-C code.

So as you know Swift code can be imported into Objective-C and Xcode allows you to do that by generating a header file that describes the Swift interface using Objective-C declarations.

And you can import this header file into your own Objective-C code to get access to the underlying Swift declarations.

Now let's get more specific and let's talk about how Swift's closure parameters are imported into Objective-C.

So right now on the screen you see an example of a Swift protocol called Executor.

This protocol defines a function member called performOperation which takes in a closure parameter called handler.

Now in Swift closure parameters are non-escaping by default, which means that they should not be retained or called after the function returns.

Now it can be easy for the programmer to forget that this contract exists when conforming to the Executor protocol in Objective-C.

For example, as you see right now on the slide, we have a DispatchExecutor interface in Objective-C that conforms to the Executor protocol, so it provides the performOperation method, which takes in the handler block parameter that corresponds to Swift's handler closure parameter.

But just by looking at the Objective-C code we have no way of knowing whether the handler parameter can escape or not.

Xcode 10 now provides a warning that helps us to remember that this parameter is actually non-escaping.

To fix this warning you can annotate your block parameter with the NS_NOESCAPE annotation.

You should also annotate the parameter in the implementation of the method with the NS_NOESCAPE annotation.

Now the NS_NOESCAPE annotation is simply a reminder for you, the programmer, to ensure that you don't store or call the handler block after the performOperation method returns.

So it's there for you to help you remember that there is this contract that exists between your Swift and Objective-C code.

Now the second warning might be of interest to those of you who work with more low-level code and who care about the way that structures are laid out in memory.

Let's take a look at one structure.

So in C structures have to follow strict layout and alignment rules.

In this particular structure that you see right now on the slide, the compiler has to insert 2 bytes of padding between the second and the third field of the structure.

Sometimes you might want to relax these rules and the compiler provides a pragma pack directive that you can use to control the layout and the alignment of your structures.

Now in this example we use the #pragma pack(push, 1) directive to remove this padding and to ensure that our structure is tightly packed.

This can be useful when serializing your structures or when transferring your structures over the network.

Now pragma pack is typically used with a push and a pop directive, but it can be easy for the programmer to forget to insert the pop into the code.

Xcode 10 will now warn about code that doesn't have a corresponding pragma pack pop directive and point you to the location of the push.

So to fix this warning you should take a look at the location of your push directive and insert the pop directive at the corresponding location in your code.

So in our case we can insert the pop directly after the packed structure.

Once we do that the new layout rules will apply only to the packed structure so they won't affect any other structures in our program.
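The slide itself is not part of this transcript, so the following is only a sketch of the kind of code being described. The structure names and fields are made up for illustration, and the sizes in the comments assume a typical ABI where a 32-bit integer is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads so that `length` is 4-byte aligned. */
struct Header {
    uint8_t  kind;    /* offset 0 */
    uint8_t  flags;   /* offset 1, followed by 2 bytes of padding */
    uint32_t length;  /* offset 4 */
};

#pragma pack(push, 1)
/* Packed layout: no padding, fields are laid out back to back. */
struct PackedHeader {
    uint8_t  kind;    /* offset 0 */
    uint8_t  flags;   /* offset 1 */
    uint32_t length;  /* offset 2 */
};
#pragma pack(pop)     /* restores the default rules for everything below */

/* Declared after the pop, so it gets the default (padded) layout again. */
struct Trailer {
    uint8_t  kind;
    uint32_t checksum;
};
```

Without the pop, every structure declared later in the translation unit, including those from headers included afterwards, would silently be packed as well, which is the mistake the new warning is meant to catch.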

These two new warnings that I mentioned are enabled by default in Xcode 10 and they are there to help you write more correct and more robust code.

And to talk more about more correct and more robust code I'd like to invite George up on stage, who will talk about the new static analyzer improvements in Xcode 10.

George.

[ Applause ]

Thanks Alex, so I would like to tell you about some of the improvements we have done for Xcode 10 for the Clang Static Analyzer.

So the Clang Static Analyzer is a great tool for finding edge-case, hard-to-reproduce bugs in your program.

And not only does the Static Analyzer find the bug for you, it also displays a visualization in Xcode of the path which [inaudible] the bug.

So here nil is added to NSMutableArray which can cause a crash later on.

And Static Analyzer shows you the path for this crash so you can see how the application can be fixed.

And I would like to tell you about three of the new improvements we have done.

Firstly, we have a new check for detecting a Grand Central Dispatch anti-pattern, which can cause poor performance and hangs in your application.

Secondly, we have a new check for detecting a misuse of autoreleasing variables inside autorelease pools which can cause crashes with [inaudible].

And finally, we have improved performance and visualizations for the Clang Static Analyzer.

So let's start with a new check for detecting Grand Central Dispatch anti-pattern.

So many APIs on our platforms are asynchronous, but sometimes developers would like to use them in a synchronous way for one reason or another.

Maybe because their code is already running on the background queue or maybe because the function cannot proceed at all until the required value is available.

And the tempting solution there is to use a semaphore to ensure synchronization.

So that's what's happening in this example. Here there is an NSXPC object, self.connection, and we use its property remoteObjectProxy to get the current task name asynchronously from a different process.

And then we wait on a semaphore which is signaled inside the callback.

And that helps to ensure that by the time the function returns the task name is available.

So this approach works but has known performance implications.

So the main problem here is that when you wait using a semaphore on some asynchronous process, you might be waiting on a queue with a much lower priority than yours, causing priority inversion, which [inaudible] performance and causes hangs.

And moreover, using a semaphore in such a way also spawns useless threads, which further degrades the performance.

And to help you address this issue, the Static Analyzer now warns about such cases, helping you see where the issue occurs.

Now let's see how the issue can be fixed.

In the best-case scenario there is a synchronous API available which can be used instead.

So for an NSXPC connection there is an [inaudible] API, synchronousRemoteObjectProxy, which when used instead eliminates the need for the semaphore and runs much faster.

Alternatively, if no such synchronous API is available, you could restructure your application to use continuations instead and just call the required function inside the callback.

So this check is not enabled by default, but we encourage you to enable it in build settings in order to make sure no such problems occur in your application and it runs as fast as possible.

Now let's talk about the second check for detecting the autoreleasing variables outliving the lifetime of the autorelease pool.

So the autoreleasing qualifier specifies that the value has to be released once the control exits the autorelease pool.

So here we have an example where we create an error variable inside the autorelease pool and once the control is outside of the autorelease pool the variable is released and subsequently destroyed.

And autorelease pools are a useful feature of Objective-C to help contain the peak memory footprint of your applications and to ensure that temporaries are destroyed where necessary.

However, it can cause unexpected crashes and they're even more unexpected because you don't even need to write the word autoreleasing in your application to have those crashes.

So for instance, there is a validation function here and it takes an NSError out parameter.

And out parameters are actually autoreleasing in Objective-C under ARC by default.

So when we write to this out parameter inside the autorelease pool and then the function exits the error value is actually released.

And then if the caller tries to read the value of this error variable they might crash with use-after-free.

That pattern is already hard to detect, but it actually gets even worse when you don't even control the part of the application which has the autorelease pool.

So here is a similar function which [inaudible] an out parameter error and then calls enumerateObjectsUsingBlock, which is a popular Foundation API that calls a block on every element of a collection.

However, enumerateObjectsUsingBlock actually calls [inaudible] given block inside an autorelease pool.

So a similar problem occurs here that when we create an error value inside the block and write it to the out parameter it will actually get released by the time the control reaches out of enumerateObjectsUsingBlock.

And then when the caller tries to read it they also can crash with the use-after-free.

And previously we have introduced the compiler warning which warns when an implicitly autoreleasing out parameter is captured in the block.

And the compiler warning suggested to make such parameters explicitly autoreleasing.

But we have noticed that such issues kept occurring, so in Xcode 10 we introduced a more powerful Clang Static Analyzer warning which knows which APIs call the provided block inside an autorelease pool and warns about such cases.

So now let's see how this issue can be fixed.

And the simplest fix here is just to introduce a strong local variable, and then when you're inside the block, write the value into the strong variable instead.

And then only copy to the out parameter once the control is outside of the block and you know it's not inside the autorelease pool and it's safe to write into the autoreleasing variable.

And finally, we also have improved performance and visualizations of the Clang Static Analyzer.

So in Xcode 10 we have improved the analyzer to explore your program in a more efficient way so now it finds up to 15% more bugs during the same analysis time.

And not only does it find more bugs, the bug reports it now generates tend to be smaller and more understandable.

And what I mean by that is that sometimes in Xcode 9 you would get reports which have a lot of steps and a lot of arrows and which would be somewhat hard to comprehend.

And in many of those examples the new version of Xcode gives you a much smaller error path which is much easier to follow, so you can see the issue much faster.

So in order to use the Static Analyzer on your projects you can use Product, Analyze, or you can even enable Analyze During Build to make sure no analyzer issue goes unnoticed.

So I encourage you to use the Static Analyzer, it's a great tool to find your bugs before users do.

And now my colleague Ahmed will talk about low-level improvements.

Thank you George.

[ Applause ]

So as Alex and George told you, we have lots of warnings and Static Analyzer checks in the compiler, but you also have the sanitizers, and all of these tools help you find lots of bugs, including security bugs.

So I'm sure you all have lots of tests and use all these great tools to find all the bugs in these tests.

But for some of the most egregious security bugs we want to make sure that they don't happen in release builds if somehow they snuck past all the testing.

So for those we have mitigations in the code generator that are always emitted even in release builds.

So I'm Ahmed, I work on the code generator and today I'm going to tell you about a new mitigation in Xcode 10.

So to see how that works we need to understand how the stack works.

So here I have a simple C function called dlog and I use it to print a string that I'm passed into a debug log.

So in this case it's called with a string hello.

And the way this works is we need to allocate some memory to keep track of this call.

So we allocate that into a region called the stack.

So the stack grows down towards the null pointer or address zero.

So when we do our dlog hello call this allocates what's called the stack frame and the stack frame contains things like the return address so that we know to go back to main.

But it also contains other things like parameters and local variables.

So for instance if I have a log file [inaudible] local variable that lives in the stack frame.

So now if I try to make another function call to this dlog file function that in turn will allocate its own stack frame.

And when it's done it's going to deallocate the stack frame and return back to the caller.

So now let's look at this stack frame in more detail.

So let's say I change my function to have a local buffer, so it's a 4-byte character array.

And I'm trying to prepare my debug string by first doing a strcpy of the string that I'm passed into that buffer.

So this does the obvious copy by [inaudible], so it does H-E-L-L.

But then there's a problem: at this point we have already written 4 bytes, so we have exhausted all 4 bytes available in our buffer.

So if we keep going, which is what strcpy will do, then we're going to overwrite the return address and this is a big security problem.

So if an attacker controls the string that I'm copying, which is not that hard, then they can control the return address.

If they can control the return address then they control basically what the program does next, so it's a big security problem.

So if you had a test that caught this and you ran the address sanitizer, then you would have had an easy way to fix this.

And really what I should have done here is strncpy that knows about the size or even better use a higher-level API like NSString or [inaudible] string.
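As a rough illustration of the size-aware copy mentioned here, the following sketch wraps strncpy in a helper. The helper name copy_bounded and its return convention are assumptions made for this example, not something shown in the session. Note that strncpy alone does not terminate the destination when the source is too long, so the sketch adds the terminator explicitly.

```c
#include <string.h>

/* Copies at most dst_size - 1 bytes of src into dst and always
   NUL-terminates the result. Returns 1 if src fit completely,
   0 if it had to be truncated (or if dst_size is 0). */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    strncpy(dst, src, dst_size - 1);  /* never writes past dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';         /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size;
}
```

With a 4-byte buffer, copying "hello" through this helper stores "hel" plus the terminator and reports truncation, instead of running past the end of the buffer into the rest of the stack frame.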

But still sometimes these bugs can survive into release builds and we avoid these by using what's called the Stack Protector.

So the Stack Protector changes the layout of the stack frame to add a new field the canary so that when we do our write we have a little bit of code right before the return of the function that checks whether the canary is still valid.

So if we keep writing in strcpy we're going to overwrite the canary first, and then we're going to check the canary before returning and that's going to abort.

So we turned a potentially exploitable security vulnerability into a reliable crash and that's not good for an attacker.

So this is what's called the Stack Protector.

It detects certain kinds of stack buffer overflows, which is the attack that we just saw.

And it's already enabled by default in many versions of Xcode.
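The canary check is emitted by the compiler, so there is nothing to write by hand, but the idea can be simulated in ordinary C. The sketch below is a toy model only: the frame is a plain byte array, the canary bytes are a fixed made-up pattern (a real stack protector uses a value the program cannot predict), and the function reports a failed check by returning 0 where the real mitigation would abort.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 4, CANARY_SIZE = 8, FRAME_SIZE = BUF_SIZE + CANARY_SIZE };

/* Hypothetical guard pattern, fixed here only to keep the sketch simple. */
static const unsigned char CANARY[CANARY_SIZE] = {
    0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0xC0, 0xFF, 0xEE
};

/* Simulated frame: bytes [0, 4) are the local buffer, bytes [4, 12) hold
   the canary. Copies len bytes of input without checking BUF_SIZE, then
   performs the check that would run just before the function returns.
   The caller must pass a len no larger than the size of input.
   Returns 1 if the canary is intact, 0 if the copy overwrote it. */
int simulate_call(const char *input, size_t len) {
    unsigned char frame[FRAME_SIZE];
    memcpy(frame + BUF_SIZE, CANARY, CANARY_SIZE);  /* set up at function entry */
    if (len > FRAME_SIZE) {
        len = FRAME_SIZE;  /* keep the simulation itself within bounds */
    }
    memcpy(frame, input, len);                      /* the unchecked copy */
    return memcmp(frame + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

A 4-byte input leaves the canary untouched, while the 6 bytes of "hello" (including its terminator) spill into the first canary bytes, so the check fails before any simulated return address could be reached.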

So next I'm going to talk about a trickier case where we introduced a new mitigation.

So let's say I took my function, again my dlog function and I changed the buffer so that now it's a variable length array.

And the length comes from a parameter called len.

So let's say len in a specific call is something big like 15,000, so now the stack frame has to be at least 15,000 bytes long.

But memory is not all immediately available, so memory is split into pages and the stack grows only when necessary.

So for instance, when we try to access byte 10,000 of the buffer, that's in the next page of the stack that's not yet available, so it's going to cause a page fault in the CPU that talks to the operating system. The operating system sees that we have the right to grow the stack, it grows it, and we can continue writing.

So this all happens under the hood.

But say an attacker controls the length and it makes it huge, big enough that it spans many pages.

So now there's a new problem, the memory is not infinite so if we keep allocating in this stack eventually we'll hit another region of memory that's already allocated and usually that's the heap.

And when we do that then we're going to clash with the heap, with whatever is already used in there, so that's usually things like malloc and new.

So if we try to see what would happen with our strcpy example then we will try to write the bytes one by one.

So we do H-E-L, etcetera.

And from the standpoint of the CPU, the code that's generated and the operating system this is all fine because we're just writing into a page that's already available and allocated.

But it really isn't because this is part of the heap, this is not part of our local stack allocated array.

So when we do our writes we're actually overwriting some completely unrelated piece of information, like, I don't know, a Boolean that checks whether we should check a password.

So this is another important security flaw.

So this is something that we mitigated with a new feature, and the feature works by emitting some new code at the entry of the function that checks whether it's okay to have the stack frame.

So it asks the operating system about the maximum size of the stack, and if you try to make an allocation that's bigger than that then it actually aborts.

And again, this turns a potentially exploitable security bug into a reliable crash and that's no good for an attacker.

So this is Stack Checking, it detects something that you might have heard of called Stack Clash and it's enabled by default in Xcode 10.
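Stack Checking is generated by the compiler, so the following is not how the feature is implemented. It is only a hand-written sketch of the same idea for code that sizes a stack allocation from untrusted input: ask the operating system for the stack limit and refuse lengths that cannot fit. The function name, the 8 MiB cap, and the one-quarter budget are all assumptions chosen for this example, and getrlimit reports the limit for the main thread's stack only.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Returns 1 if a stack allocation of `len` bytes looks safe, 0 otherwise. */
int stack_allocation_is_safe(size_t len) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return 0;  /* cannot tell, so refuse */
    }

    /* Assumed policy: never trust more than 8 MiB, which also covers
       the case where the soft limit is reported as unlimited. */
    const rlim_t assumed_cap = 8u * 1024u * 1024u;
    rlim_t limit = lim.rlim_cur;
    if (limit == RLIM_INFINITY || limit > assumed_cap) {
        limit = assumed_cap;
    }

    /* Arbitrary margin: leave three quarters of the stack for the
       frames of the rest of the call chain. */
    return len <= limit / 4;
}
```

A caller would check the length with this function before declaring a variable length array of that size, and fall back to a heap allocation or an error when the check fails.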

So next I want to talk about a new set of features we added in Xcode 10, and that's support for new instruction set extensions.

So as you all know we have lots of great Apple devices and one of the great things about Xcode is that with just a few build settings you can target your code for each of these devices.

And so under the hood in macOS, iOS, watchOS, etcetera we tweak every OS so that it uses everything that's available on a specific piece of hardware.

So it guarantees maximum performance no matter where we run.

And so if you have an app with extremely high-performance requirements, that's something that you might want to do as well.

So we have three features to talk about that are available in the iMac Pro and the iPhone 8, 8 Plus, and X.

And let's start with the iMac Pro.

So the iMac Pro has the Intel Xeon CPU which has a set of new features called AVX-512.

So AVX-512 is a set of new instructions with vector registers.

And these provide benefits over X86-64, so in X86-64 we can only assume that we have 128-bit vectors available, so that's guaranteed on any Mac ever that's Intel powered.

Now it happens that any new Mac today has more than that, but the iMac Pro is the first that has 512-bit registers.

And with the Auto-Vectorizer that's enabled in the Xcode Clang this is great because it means that we can have many more elements in the vector.

So this can greatly improve throughput.

But there are other benefits with AVX-512, so for instance we not only have bigger vectors we also have more of them.

So on X86-64 we only have 16 now we have 32, so this is a lot of data to process.

And even if for some reason the auto-vectorizer is not able to make use of these vectors, we still have more scalar registers, even for code that just does float or double.

There are lots of performance benefits in AVX-512.

So let's look at how we can exploit it in my compute [inaudible] expensive function.

So the first thing I'm going to do is to keep around my existing function because that's going to be the fallback that I have that runs on all Macs.

Next, I can try to specialize my function.

So one way to do that is using the target attribute.

And that tells the compiler that it's okay to assume that this function has AVX-512, so it only runs on an iMac Pro.
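A minimal sketch of this fallback-plus-specialization pattern might look like the following. The function names are made up for illustration. The runtime check uses the compiler builtin __builtin_cpu_supports, which is an assumption of this sketch for portability; the session itself recommends sysctlbyname for checking availability on Apple platforms. On non-x86 targets only the generic version is compiled.

```c
#include <stddef.h>

/* Fallback: built with the default target, so it runs on every machine. */
static void scale_generic(float *data, size_t n, float k) {
    for (size_t i = 0; i < n; ++i) {
        data[i] *= k;
    }
}

#if defined(__x86_64__)
/* Same source, but the compiler may assume AVX-512 inside this function,
   so the auto-vectorizer is free to use 512-bit registers. */
__attribute__((target("avx512f")))
static void scale_avx512(float *data, size_t n, float k) {
    for (size_t i = 0; i < n; ++i) {
        data[i] *= k;
    }
}
#endif

/* Picks the specialized version only when the running CPU supports it. */
void scale(float *data, size_t n, float k) {
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx512f")) {
        scale_avx512(data, n, k);
        return;
    }
#endif
    scale_generic(data, n, k);
}
```

Callers only ever use scale, so the specialized function is never reached on hardware that lacks the instructions.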

So if you use simd.h, for instance the simd_float4 128-bit vector type, then you might get better performance in the AVX-512 version using the same code.

And if you use the even larger vector types, for instance simd_float16, then you get much better performance in the AVX-512 version, where the 512-bit vector is actually native.

And if you go all the way down to x86 intrinsics, then you can start using the new AVX-512 variants, as well as the __m512 types.

So if you want to specialize larger units of code, not just individual functions but files, targets, or libraries, then you can use the new AVX-512 value of the additional vector extensions build setting.

So when you do that there are some things to keep in mind and if you're familiar with AVX-1 and AVX-2 these are very similar issues.

So you can only pass large vectors, so 256 bits and up from and to AVX-512 functions.

So the ABI is different between the generic and the specialized variants, so you cannot pass them between those.

Additionally, these vectors are large and they're large enough that their natural alignment is too big for what's guaranteed by things like malloc.

So you have to take that into account when allocating these anywhere other than the stack.
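One way to take the alignment into account on the heap is to request it explicitly. The sketch below uses the C11 function aligned_alloc with a 64-byte alignment, matching the size of one 512-bit vector. The helper name is made up for this example, and overflow of the size computation is not handled.

```c
#include <stdint.h>
#include <stdlib.h>

enum { VECTOR_ALIGNMENT = 64 };  /* bytes in one 512-bit vector */

/* Allocates room for `count` floats starting on a 64-byte boundary.
   Returns NULL on failure. Release the memory with free(). */
float *alloc_vector_floats(size_t count) {
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded =
        (bytes + VECTOR_ALIGNMENT - 1) / VECTOR_ALIGNMENT * VECTOR_ALIGNMENT;
    if (rounded == 0) {
        rounded = VECTOR_ALIGNMENT;
    }
    return aligned_alloc(VECTOR_ALIGNMENT, rounded);
}
```

Memory obtained this way can be loaded into 512-bit vectors with aligned accesses, which plain malloc does not guarantee.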

And so in general all of these are things that we already deal with in lots of places in the operating system.

So for instance, if you can at all, use Accelerate.framework; it's much easier to do so because we have already specialized all the functions for every single microarchitecture.

So this is AVX-512.

Now we also have new features on the iPhone 8, 8 Plus and X.

So the first feature is ARM v8.1 Atomics, and that's thanks to one of the great things about the iPhone X: the A11 Bionic chip.

So the A11 Bionic chip has one great new feature compared to the A10, which is its support for six cores running all at the same time, and that's a first in iOS.

And since you have more cores, you probably have more threads running at the same time, and with more threads you might need more synchronization to make these threads cooperate.

And that's implemented using atomics.

So the A11 chip also introduces a new family of atomic instructions that are better optimized for the new extra cores.

So let's look at how that works.

So the way atomics work is through a small sequence of code.

So suppose I have a thread and it's trying to access main memory, so it has an atomic shared variable in there and it's just trying to increment it.

So under the hood the code generator will emit a small sequence of code that first takes exclusive access of a cache line, and that's a small region of memory that completely contains this atomic variable.

Now that we have exclusive access we can load from the variable, then we can do our increment on the temporary loaded value and store the result back.

And we know that this is safe because we have exclusive access, so no other thread could have changed the value while we're computing our temporary results.

But now suppose another thread does access either the same variable or another variable in the same cache line.

So both are going to try to have exclusive access over this variable and that is not possible, that's what it means to be exclusive.

So both of them are going to fail their exclusive access and they're going to have to try again until one of them succeeds.

And this is not ideal for performance.

So in ARM v8.1, which is the architecture in the A11 CPU, we have new instructions that do this all in a single step, and in some cases that can greatly improve performance.

So again, this is something that you can specialize code for using the per function specialization or for entire targets.

And this is something that's only really useful when you have your own C11 or C++ 11 atomics.
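For code that does use its own C11 atomics, the difference can be pictured with the sketch below. The first function spells out a retry loop in source, which mirrors the shape of the exclusive-access sequence described above. The second expresses the increment as a single atomic operation. Both function names are made up for this example. Which instructions are actually emitted depends on the target the code is compiled for; the same atomic_fetch_add call can be lowered to a retry sequence or to a single instruction.

```c
#include <stdatomic.h>

/* Retry-loop form: read the value, compute the new one, and try to
   publish it. If another thread changed the counter in between, the
   exchange fails, `seen` is refreshed, and the loop tries again. */
int increment_with_retry(atomic_int *counter) {
    int seen = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &seen, seen + 1)) {
        /* `seen` now holds the current value; retry with it. */
    }
    return seen + 1;
}

/* Single-operation form: the whole read-modify-write is one atomic
   call, which the compiler can map to one instruction where available. */
int increment_single_step(atomic_int *counter) {
    return atomic_fetch_add(counter, 1) + 1;
}
```

Both functions return the incremented value and are safe to call from multiple threads; the second leaves the choice of instruction sequence entirely to the compiler.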

So in general, it's much easier to use the higher-level libraries like GCD or pthreads or os_unfair_lock, etcetera.

So these are already tweaked for ARM v8.1, but they also cooperate with the operating system to have even better performance.

So another feature in the A11 CPU is 16-bit floating points.

So you are all familiar with the two standard floating point types, so we have double which is 64 bits and float which is 32 bits.

So on A11 we also have the 16-bit float16, this has much less range and precision so it's not as useful for as many cases.

But in some cases like machine learning or when you're trying to talk to GPU via Metal this is great because it's smaller and it's faster to compute.

And that's even more true if you put them in vectors where you can put more of them in the same ARM vector.

So this is also something that you can specialize code for and in general something to keep in mind with all of these features is that they're not available everywhere.

So when you want to use them you have to always make sure that they're actually dynamically available on the device you're running and you can do that using sysctlbyname.

And so in general we already do all this in system frameworks, so it's much easier to just rely on those.
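A dynamic availability check with sysctlbyname could be sketched as follows. The helper name is made up, and the key names one would pass, for example "hw.optional.avx512f", "hw.optional.armv8_1_atomics", or "hw.optional.neon_fp16", are assumptions that should be confirmed against the keys the target system actually reports. On platforms without sysctlbyname the sketch simply reports the feature as unavailable.

```c
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the named sysctl key exists and holds a non-zero value,
   0 otherwise (including when the key is unknown). */
int cpu_has_feature(const char *key) {
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof value;
    if (sysctlbyname(key, &value, &size, NULL, 0) != 0) {
        return 0;  /* unknown key or read error: treat as unsupported */
    }
    return value != 0;
#else
    (void)key;
    return 0;      /* no sysctlbyname here: report the feature as unavailable */
#endif
}
```

The result would typically be read once at startup and used to choose between the specialized and the generic code path.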

So these are three new instruction set extensions, we have on the iMac Pro AVX-512 and on iPhone X, 8, and 8 Plus we have Atomics and 16-bit floating points.

So that's just part of all the new features in Xcode.

So from ARC object pointers in C Structs to the improved static analyzer there are lots of great things in Xcode 10.

And there are also some things that we didn't even talk about, like, for instance, over a hundred new warnings and support for C++17 standard library functions.

So if you want to learn more, we will have the video and the slides available on the website soon.

And if you're here at the conference come join us at the lab this afternoon.

Thank you.

[ Applause ]
