What's New in Metal, Part 1

Session 603 WWDC 2015

Metal provides extremely efficient access to the GPU through its streamlined API and high-performance architecture. Check out what's new in Metal and dive into support for Metal on OS X. Understand enhancements to the Metal memory model and learn how to prepare assets for delivery in Metal-based apps and games.

[Applause]

RAV DHIRAJ: Good afternoon.

Welcome to WWDC 2015.

[Applause]

RAV DHIRAJ: And welcome to the first of two What's New in Metal sessions.

So we have a lot of Metal content to cover this week across three sessions.

In fact, we've added so much new stuff to the API, that we've decided to break the What's New in Metal session into two parts.

So today, I'll be focusing on providing a high-level review of the Metal ecosystem over the last 12 months.

Developers like you have already created some tremendous applications using Metal.

I'll then talk about some of the new features that we're introducing this year, and we'll end with a specific example of how Metal integrates well with the rest of the system by describing a technology called app thinning.

In the second What's New in Metal session, Dan Omachi and Anna Tikhonova will provide details on two great new support libraries that we're introducing this year: MetalKit, which provides convenience APIs that allow you to create a great Metal application, and Metal Performance Shaders, our highly optimized library of shaders that you can call directly from your application.

And finally, in the last session, Phil Bennett will dive into great techniques for extracting the best possible performance out of your Metal applications.

We'll be introducing our new Metal System Trace tool in that session, so be sure to check it out.

We introduced Metal at WWDC last year for iOS 8.

Our goal was a ground-up reimplementation of our graphics and compute APIs to give you the best possible performance on the GPUs in our platform.

So we achieved this by getting much of the software between you and the GPU out of your way.

To better illustrate this, let's look back at an example that we showed last year at WWDC that describes the work done by the GPU and the CPU per frame.

In this example the top bar represents the time spent by the CPU and the bottom bar represents the GPU time.

So as you can see, we're currently CPU bound and the GPU is idle for part of the frame.

So with Metal we're able to dramatically reduce the API overhead on the CPU and effectively make the GPU the bottleneck in the frame.

So the great thing is that this allows you to take advantage of this additional CPU idle time to make your game better.

You can add more physics or AI for example, or you can issue more draw calls to increase the complexity of your scene.

But we didn't just stop there.

Metal also allows you to move expensive operations like shader compilation and state validation from draw time which happens many thousands of times per frame, to load time which happens very infrequently, and even better in some cases to build time, when your users don't see any impact at all.

Additionally, with iOS 8 we not only exposed compute for the first time on our iOS devices, but we also provided you with a cohesive interoperability story between the graphics and compute APIs, allowing you to efficiently interleave render and compute operations on Metal capable devices.

And finally, with Metal, your application is able to make efficient use of multithreading without the API getting in your way, allowing you to encode for multiple threads.

And the results have been stunning.

So last year we showed you Epic's Zen Garden demo where they used Metal to achieve ten times the number of draw calls in the scene.

We also showed you EA's Plants Versus Zombies technology demo where they used Metal to bring their console rendering engine to the iOS platform.

Now this set a high bar for the development community.

And over the last year we've witnessed the release of some truly astounding titles that have taken great advantage of the Metal API.

Titles like the MOBA Vainglory by Super Evil Mega Corp that used Metal to achieve 60 frames per second in their game.

In Disney Infinity: Toy Box 2.0, Metal enabled them to bring their console graphics and gameplay experience to iOS.

And in Gameloft's Asphalt 8, they were able to improve their gameplay by using Metal to render three times the number of opponents in the game.

But it's not just about games.

With the new version of Pixelmator for the iPhone they're using Metal to accelerate image processing in their powerful new distort tools.

In fact, the response has been overwhelming, with a number of key content and game developers now adopting Metal on OS X.

And much of this content has been enabled by our commitment to bring the leading game console engines to the iOS platform.

This includes Unity, Epic's Unreal Engine 4, and EA's Frostbite mobile engine.

Last year we also showed you how Metal fits in the big picture of how your application accesses the GPU.

On the one side we have our high-level 2D and 3D scene graph APIs that give you incredible functionality and convenience.

And on the other side with Metal, we provided a direct access path to the GPU.

So this gives you an amazing range to do what's right for your application, and if you choose to use one of the higher level APIs, the great thing is that we can make improvements under the covers and you can benefit from them without changing a single line of code.

Well this year, we're happy to announce that we've done just that, and we're bringing the power and efficiency of Metal to our system-wide technologies.

We really believe that this is going to improve the user experience on our platforms.

This is also a great year for Metal capable devices.

The iPhone 5s and the iPad Air were the headliners at WWDC last year, and with the introduction of the iPhone 6, the 6+, and the iPad Air 2, we now have an incredible install base of Metal capable devices.

But of course we didn't just stop there.

We're happy to announce that we're bringing Metal to the OS X platform.

We have broad support for Metal across all our shipping configurations.

In fact, Metal is supported on all Macs introduced since 2012.

This of course means that we have support for all three GPU vendors: Intel, AMD, and NVIDIA.

And the other big news is that we're bringing all the tools that you're familiar with using on iOS to the Mac platform as well.

This includes the Frame Debugger, the Shader Profiler, and all our API analysis tools.

This is huge.

We understand the challenges of debugging complex graphics and compute applications, and think that these will be invaluable in your development efforts on OS X.

And of course, all of this is available in the seed build of OS X El Capitan that you can download today.

So Metal on OS X is the same API you're familiar with using on iOS with a few key additions.

With new APIs to support device selection, discrete memory, and new texture formats, Metal makes it incredibly easy for you to bring your iOS applications to OS X.

And here are a few examples of developers who've done exactly that.

So you heard in the keynote that we've been working with Epic to bring their iOS Metal development code to Unreal Engine on the Mac.

Well, Epic used Metal and their deferred renderer to create this amazing stylized look in Fortnite.

Additionally, the folks at Unity brought up their engine and demonstrated their Viking Village demo in only a few weeks.

It's really great to see this content on OS X on Metal.

And we've been working with a number of additional Mac developers to enable them to access the power of the GPU through Metal.

So you also heard in the keynote about digital content creation applications.

Developers like Adobe are using Metal to access the GPU to accelerate image processing.

The guys at The Foundry have also been using Metal to accelerate their 3D modeling application MODO.

And here today to talk about their experience adopting Metal on OS X is Jack Greasley from The Foundry.

[Applause]

Welcome Jack.

JACK GREASLEY: Thank you Rav.

Hi. I'm Jack Greasley.

I'm head of new technology at The Foundry.

And at The Foundry we create tools for digital artists.

Our software is used around the world in games, movies, TV, and film, including some photoreal Peruvian bears and mutant monster hunters.

But it's not just about the virtual.

Some of our design customers, like Adidas, actually make things, and if you ask a designer, they'll tell you that any product is the result of thousands of little experiments.

We understand this process and we create our tools specifically to support it.

MODO is our premier 3D modeling, animation, and rendering system.

It's used for games, films, product design, and lots of different things.

Our users create stunning imagery and animations for things both real and imaginary.

In our latest version, MODO 901, we revamped our GPU renderer; the aim was to provide designers with a fluid, interactive experience at the highest possible quality.

The benefit of this is that if your viewport is realtime you can make tens of decisions in the time it would take to do a single software render.

We had already done some early work with Metal in iOS, but a couple of months ago we got a great opportunity to start working with Metal on OS X.

So we put together a small team and set them a challenge: we gave them four weeks to see how much of the new MODO viewport they could bring over and get running on Metal.

And we almost immediately got some stunning results.

Although it's only a small triangle, it actually represents a huge milestone for us.

Once we did that, we were able to very, very quickly make progress.

And our plan of attack was to really work from the bottom up, and to start bringing the functionality from our new viewport over onto Metal.

So here on day one we started with the environment.

We added a few more triangles.

A little bit of shading started to make this look a little bit more like a real car.

Putting in the soft shadows and specular highlights really added a little bit of bling, and everybody loves shiny.

And so here we are, four weeks later, and you remember that single triangle?

We got some incredible results.

Putting this all back into Metal gave us a fully functional viewport running inside of MODO on Metal in just four weeks.

One of the great things for us is that this gives us a standardized renderer across iOS and OS X; we've created a WYSIWYG workflow between the two platforms.

So, what did we learn?

The first thing we learned is that working with Metal is fun.

I've spent 20 years working with OpenGL, and I can tell you, having a nice, lean, easy-to-use API is like a breath of fresh air.

Secondly, the debugging and optimization tools in Metal are absolutely fantastic.

As I have said, if you have done debugging on GPUs, you know why this is important.

Metal can also be really fast.

In some of our tests, we got a three-times speed-up, and that's using exactly the same data on the same GPU.

Going forward we have some big plans for our new viewport, and we're actually looking to integrate it into all of our tools across The Foundry, so hopefully you'll be seeing Metal cropping up in interesting places very, very soon.

So I'm going to hand you back to Rav, and thank you very much.

[Applause]

RAV DHIRAJ: Thank you Jack.

That was fantastic.

Okay, so I'd like to now talk about the new features that we're introducing in iOS 9 and OS X El Capitan.

And there are a lot of them.

This is just a selection of the features we've added this year.

Now I don't have time to talk about every single one of these, so I'm going to focus on a subset, including GPU feature sets, our new memory model, texture barriers, and our expanded texturing support.

Of course, as I mentioned before, you can learn more about MetalKit, Metal Performance Shaders, and our new Metal System Trace tool in the sessions later this week.

So let's dive right into them.

I'd like to start with GPU families and our Metal feature sets.

Metal defines collections of features that are specific to generations of GPU hardware.

Metal calls these GPU families.

So a GPU feature set is defined by the platform (iOS or OS X), the family name, which is specific to a hardware generation, and a version that allows us to augment the feature set over time.

It's really trivial to query the feature set: simply call supportsFeatureSet on your Metal device to determine if that GPU family is supported.
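As a quick sketch in Swift (the MTLFeatureSet case names below follow the current SDK spelling, so treat them as an assumption):

```swift
import Metal

// Grab the default Metal device for this system.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}

#if os(iOS)
// For example, gate ASTC-compressed assets on the second iOS GPU family.
let supportsASTC = device.supportsFeatureSet(.iOS_GPUFamily2_v1)
#else
// On OS X, GPUFamily1 v1 is the base feature set for El Capitan.
let isBaseline = device.supportsFeatureSet(.macOS_GPUFamily1_v1)
#endif
```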

So here is our iOS feature set matrix.

Now you'll notice that we have support for two major GPU families and versioning to differentiate between our iOS 8 and our iOS 9 features.

On OS X the GPUFamily1 v1 feature set represents the features that we're going to be shipping in OS X El Capitan.

This defines the base for a Metal capable device on the desktop platform.

Now I would like to talk about two new shader constant update APIs that we're adding.

First a little bit of background.

So for every draw that you encode into command buffer there's some constant data you need to send to the shader.

Now it would be incredibly inefficient for you to have a separate constant buffer per draw, so generally most Metal applications allocate a single constant buffer per frame.

They then append the constant data into the buffer as they encode their draws.

So what does the code look like?

Well, first we have some setup for the constant buffer and the data.

Then, just like in that diagram, within your draw loop, you append the new constant data into your constant buffer.

Now it's worth noting that the setVertexBuffer call here is actually doing two things.

It's setting the constant buffer, and it is updating the offset in it.

Now, of these two operations, it's that call to set the constant buffer that's the most expensive.

So Metal now has APIs that allow you to separate these two operations and move that expensive call to set the constant buffer, or the vertex buffer, outside of your draw loop.

If you have thousands of draw calls per frame, this can be a significant savings.

But if you only have a small amount of constant data, it might be more efficient for Metal to manage the constant buffer for you.

So Metal now has the setVertexBytes API; you can use this to append new constants at every draw call.

Actually there is one more thing I want to say about that.

So that API is great if you only have a small number of constants, as I said, on the order of tens of bytes.

If you have larger constant sets, you really want to use one of the other APIs.

There's a good chance that it'll be way more performant.
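Putting the last few slides together, here's a hedged Swift sketch of both strategies; the 256-byte offset alignment is an assumption about the platform's buffer-offset requirement, and the buffer index is illustrative:

```swift
import Metal

// Round `size` up to the next multiple of `alignment` (a power of two),
// so each draw's constants start at a legal buffer offset. 256 bytes is
// an assumption about the platform's alignment requirement.
func alignedSize(_ size: Int, alignment: Int = 256) -> Int {
    return (size + alignment - 1) & ~(alignment - 1)
}

func encodeDraws(encoder: MTLRenderCommandEncoder,
                 constantBuffer: MTLBuffer,
                 drawCount: Int,
                 constantsSize: Int) {
    let stride = alignedSize(constantsSize)

    // The expensive call: bind the constant buffer once, outside the loop.
    encoder.setVertexBuffer(constantBuffer, offset: 0, index: 1)

    for i in 0..<drawCount {
        // The cheap call: only the offset changes per draw.
        encoder.setVertexBufferOffset(i * stride, index: 1)
        // ... encode the draw here ...
    }
}

// Alternatively, for tiny constant sets (tens of bytes), let Metal
// manage the storage and just copy the bytes at each draw:
//     encoder.setVertexBytes(&tint, length: MemoryLayout<SIMD4<Float>>.size, index: 1)
```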

All right let me talk about the new memory model now.

So, our goal with the new memory model was to support both unified and discrete memory systems without you having to make many changes.

Metal now supports discrete memory: high-speed memory that the GPUs in some desktops have access to.

So the way we've achieved this is by introducing new storage modes that allow you to specify where the resource will reside in memory.

So the modes are shared, private, and managed.

So I'll talk about each of these in turn over the next few slides.

Let's start by talking about the shared storage mode.

So this is the mode that's in the existing implementation of iOS 8.

So in a unified memory system, the memory that you use to store a buffer or a texture, is shared between the CPU and the GPU.

There's only one copy of the memory and the memory is coherent at command buffer boundaries.

So this means that you just have to be done with the GPU before accessing the memory with the CPU.

This makes it very easy to use.

But now new in iOS 9 and in OS X El Capitan we're introducing the private storage mode.

So private memory can only be accessed by the GPU through render, compute, or blit operations.

The advantage of private memory is performance.

Metal can store the data in a way that's more optimal for the GPU to access, by using frame buffer compression for example.

Private storage mode also works really well with discrete memory systems, and you can put your resources into the memory that the GPU has the fastest access to.

And now new in OS X, only, we're introducing the managed storage mode.

With managed memory the resource has storage in both the discrete memory and the system memory, and Metal manages the coherency between those two copies.

Now this gives you the convenience and flexibility of the shared storage mode, and in most cases, the performance of the private storage mode.

And if you have a desktop system with a unified memory system, you don't have to worry about managed having any extra overhead.

There's only one copy of the resource that Metal maintains.

So there are a couple of other considerations with managed resources if you're going to modify the data with the CPU or the GPU.

So first, if you're modifying the data with the CPU, you have to let Metal know by calling the buffer's didModifyRange or the texture's replaceRegion APIs.

Likewise, if the GPU modifies the data and you want to read it back, you'll need to call the synchronizeResource API.

It's also worth noting that you need to wait for the operation to be complete before you actually read the data with the CPU.

So let's look back at that shader constant update example I showed earlier.

So this example is currently using shared memory.

So in a discrete memory system, you ideally want the constants to be in discrete memory.

Now you could possibly do this with a private buffer, but you'd have to manage the transfer to that buffer.

It's actually a lot simpler to use managed buffers; they make it really easy.

There are only two things you need to do.

First, you have to specify the managed storage mode when you create the constant buffer, and then, you have to call didModifyRange to tell Metal that you've now updated the constants with the CPU.

And that's it.

The rest of the code remains exactly the same.
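A minimal Swift sketch of those two steps (the buffer length and the written range are illustrative):

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}

// 1. Specify the managed storage mode when creating the constant buffer.
guard let constantBuffer = device.makeBuffer(length: 16 * 1024,
                                             options: .storageModeManaged) else {
    fatalError("Failed to allocate the constant buffer")
}

// Update this frame's constants through the CPU-side copy...
constantBuffer.contents().storeBytes(of: Float(1.0), toByteOffset: 0, as: Float.self)

// 2. ...then tell Metal which bytes changed so it can keep the
// GPU-side copy coherent before the next command buffer runs.
constantBuffer.didModifyRange(0..<MemoryLayout<Float>.size)
```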

It's worth noting that buffers are shared by default on all platforms.

On iOS, textures are shared by default as well, but on OS X we chose to make the default mode for textures managed, because it allows you to write portable code without sacrificing performance.

But there are some cases where you don't want to use a managed texture.

This is one of them.

When you have a frame buffer or a renderable texture, then you want to use the private storage mode to get the best performance.

This is particularly important if the GPU is the only one accessing the data.

And that's our new memory model in Metal.

I'd like to talk about two new features in Metal that are specific to OS X that I think you'll really like.

The first is layered rendering.

So the intent of this API is for you to be able to render to a specific layer of a texture for every triangle that you draw.

So this could be the slice of an array texture, the plane of a 3D texture, or the face of a cube texture.

So on a per triangle basis, you can specify which layer to render into by simply specifying the array index in your vertex shader.

The game Fortnite used this very technique to render into the faces of a cube map for some of their environmental lighting.

So we think you'll find this equally useful.
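In the Metal shading language, the layer is selected with the render-target array index attribute on the vertex output. This is a hedged sketch, with struct and function names purely illustrative (here, drawing one instance per cube face):

```metal
#include <metal_stdlib>
using namespace metal;

struct CubeFaceOut {
    float4 position [[position]];
    // Selects which layer (cube face, array slice, or 3D plane)
    // this triangle is rendered into.
    uint   layer    [[render_target_array_index]];
};

vertex CubeFaceOut cube_face_vertex(uint vid  [[vertex_id]],
                                    uint face [[instance_id]],
                                    constant float4 *positions [[buffer(0)]])
{
    CubeFaceOut out;
    out.position = positions[vid];
    out.layer    = face;   // one instance drawn per cube face
    return out;
}
```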

The second feature that's specific to OS X is texture barriers.

So by default GPUs tend to overlap the execution of their draw calls, and you can't reliably use the output of one draw call in a subsequent one, without some form of explicit synchronization.

Metal now has an API that allows you to insert a barrier between these draw calls.

So this is critical for implementing efficient programmable blending on OS X.

The API is really easy to use.

Simply insert the barrier between the draw operations that you want to synchronize.
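A sketch of the pattern in Swift (the draw calls themselves are elided):

```swift
import Metal

func drawDependentPasses(encoder: MTLRenderCommandEncoder) {
    // First draw: writes into the currently bound render target.
    // encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)

    // Make the writes above visible before any subsequent draw samples them.
    encoder.textureBarrier()

    // Second draw: reads the output of the first.
    // encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
}
```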

And last, but certainly not least, I want to talk about our expanded texturing support in Metal this year.

By default, the maximum size limits for all textures on iOS have been increased to 8K.

We've also added cube array support on OS X, and bumped up the quality of anti-aliasing across the board in all our platforms.

We've also significantly fleshed out the pixel formats that you can write to, or read from, a compute shader.

Also new is a texture usage property.

So this allows you to tag textures to tell Metal how you plan on using them, and Metal will try to optimize for that usage.

So for example, if you have a renderable texture, you want to set the renderTarget and shaderRead flags.

And this will tell Metal that you plan on both rendering to that texture, and then sampling from it.

By default, the usage is unknown and Metal won't make any assumptions about how that texture is used, allowing Metal to use it anywhere in the system.
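As a sketch, tagging a renderable texture might look like this in Swift (the dimensions and pixel format are illustrative):

```swift
import Metal

// Describe a 2D texture that we plan to render into and then sample from.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm,
    width: 1024,
    height: 1024,
    mipmapped: false)

// Tag the intended usage so Metal can pick an optimal layout.
descriptor.usage = [.renderTarget, .shaderRead]

// A device would then create the texture from this descriptor:
//     let texture = device.makeTexture(descriptor: descriptor)
```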

Unlike iOS GPUs, desktop GPUs prefer to have a single combined depth-stencil texture, so we've added two new combined depth-stencil formats.

The 32-8 format is supported on all our hardware, both iOS and OS X.

The 24-8 format, however, is only supported on some.

So if it meets your precision requirements, you'll have to check whether it's available.
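A hedged Swift sketch of that availability check (falling back to the 32-8 format is an assumption about your precision needs):

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}

// The 32-8 format is supported everywhere; fall back to it when the
// packed 24-8 format isn't available on this GPU.
let depthStencilFormat: MTLPixelFormat =
    device.isDepth24Stencil8PixelFormatSupported
        ? .depth24Unorm_stencil8
        : .depth32Float_stencil8
```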

So let's talk about texture compression.

So the type of compression format you use depends on the device you're targeting and the type of data you're encoding.

On iOS we support a number of these formats including PVRTC, ETC2, and EAC, and new for GPUFamily2, we're also supporting ASTC.

So ASTC provides very high-quality compression, better than PVRTC and ETC2 at the equivalent size.

It also allows you to encode a number of different kinds of content: photographic content, height maps, normal maps, and many more.

It also provides very fine-grained control between size and quality, offering between 1 and 8 bits per pixel.

And at the low end, this is half the storage required for PVRTC.

And finally, as I previously noted, this is only available on GPUFamily2 capable devices, so you'll have to look out for that.

Finally, in OS X we're introducing all the native texture compression formats that the desktop GPUs support.

Now, these BCn formats should all be familiar to you.

If you've worked on a desktop platform or game console before, you likely have assets that are already in this format.

And that's our expanded texturing support and the features that I'm going to cover today.

So I'd like to change topics and talk about a new technology called app thinning.

You might have heard about it in the last talk.

So this is not specifically a Metal feature, but it does rely on the GPU families I discussed earlier in the session.

First, to set context, the typical game development and deployment flow for a developer on our platform looks something like this.

You generally have an art pipeline that generates some assets.

The assets are built via Xcode or your custom tools pipeline into a binary, and then that binary is uploaded to the App Store, and that very same binary is deployed to the devices of all your users.

So this works great.

But as soon as you start having assets that are specific to the capability of a device, you start running into some issues.

For example, if you have assets that are specific to Metal devices, and assets specific to legacy devices, you currently have to download both versions to all your users' devices.

So obviously this is not ideal.

App thinning lets you solve this problem by allowing you to tag assets by capability, and then only the asset that's needed for the device is actually downloaded to the device.

So how do we do this?

Well, app thinning allows you to define capability across two axes: GPU family version and device memory size.

Now this creates a matrix that you can then use to target specific devices.

So let's look at a typical normal map example.

Ideally you want to store the normal maps compressed, and EAC is a great format for that.

But since some legacy devices don't support compressed textures or EAC in particular, you probably want to have an uncompressed version of the asset as well.

So app thinning allows you to tag these assets and only download the compressed one to the Metal capable device, and the uncompressed version to the legacy device.

But app thinning is actually a lot more capable than this.

So let's extend our example.

And add support for more devices.

So in this particular case we're going to create five assets.

We'll start with a high resolution ASTC version for our most capable 2GB device.

And then we'll include a slightly lower resolution for the 1GB version of that device.

And since some Metal capable devices don't support ASTC, we'll include the EAC version as well.

And then, for your legacy devices, we have the uncompressed version of the asset.

And we can extend this example even further by including a lower resolution version of that uncompressed asset for our least capable device, the 512MB configuration.

So you probably don't want to create five assets of everything, but the point I'm trying to make here is that you have a tremendous amount of flexibility to target specific devices and create the best experience possible for your users.

So Xcode integrates a great new UI that allows you to tag assets in this way.

The first thing you need to do, is define the capabilities of the devices that you'd like to target.

That creates that little matrix, and then all you have to do is drop in the assets that match the relevant intersection of GPUFamily and device memory size that you're trying to target.

It's that simple.

But of course, we realize that not all developers have tools pipelines that exist in Xcode, so we have you covered there as well.

With app thinning, we're also supporting a JSON file format that lets you specify your asset catalogs.

So just like in Xcode, you have to specify the GPUFamily version and the device size for each asset you want to include in your catalog.
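As an illustrative sketch (the key names, feature-set identifiers, and filenames below are assumptions; check the asset catalog format reference for the exact schema), a tagged data-asset catalog entry might look like:

```json
{
  "info": { "version": 1, "author": "xcode" },
  "data": [
    {
      "filename": "normal-map-astc.ktx",
      "graphics-feature-set": "metal2v1",
      "memory": "2GB"
    },
    {
      "filename": "normal-map-eac.ktx",
      "graphics-feature-set": "metal1v1"
    },
    {
      "filename": "normal-map-uncompressed.png"
    }
  ]
}
```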

So once you have your asset catalogs defined, how do you retrieve the data at runtime?

The answer is the NSDataAsset class; it provides the resource matched to the capabilities of the device that you're running on.

Using the NSDataAsset is really easy.

Simply allocate an NSDataAsset object using the name that you assigned in your asset catalog, and then use its data.
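A minimal Swift sketch (the asset name "NormalMap" is illustrative):

```swift
import UIKit

// Look up the asset by the name assigned in the asset catalog.
// App thinning ensures the variant matching this device's GPU family
// and memory size is the one that was downloaded.
if let asset = NSDataAsset(name: "NormalMap") {
    let textureBytes = asset.data
    // ... create a Metal texture from textureBytes ...
    _ = textureBytes
}
```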

So let's tie this all together using the diagram that I originally showed and the normal map example.

So in this case, your artist will create a bunch of normal maps, some compressed, some uncompressed to target the devices you'd like.

This gets built via Xcode or your custom tools pipeline into a big, massive binary with lots of assets in it, which gets uploaded to the App Store, and then the great thing is that only the normal map required by your user is downloaded to their device.

And that's app thinning.

We think that this is going to change the way you create and deploy content on Metal capable devices.

So that was a whirlwind tour of the Metal ecosystem over the last 12 months.

We've seen developers like you create amazing content using Metal.

We have brought Metal to OS X.

We've also brought all our great Metal GPU tools to OS X.

We've introduced some powerful new APIs that we think you're going to love.

Finally, we also talked about how Metal integrates well with the rest of the system, by talking about app thinning.

All in all, it's been a great year, and we're really looking forward to seeing what you're going to be able to build with Metal over the next one.

So please visit our online documentation; you may also like to go to our support forums, and if those don't answer your questions, you're of course free to reach out to Allan Schaffer, our Game Technologies Evangelist.

We have two more sessions this week, What's New in Metal, Part 2 on Thursday morning, and Metal Performance Optimization Techniques, which is on Friday.

Please be sure to visit those.

Thank you very much.

[Applause]
