Adopting Metal, Part 2

Session 603 WWDC 2016

Building on the fundamentals, dive into the specifics of constructing games and graphics apps with Metal. Learn about scene management and understand how to manage and update Metal resources. Understand the rendering loop, command encoding, and multi-thread synchronization.

[ Music ]

[ Applause ]

Hi, everyone, and welcome to WWDC.

I hope you're all having a good time so far and you've seen some nice sessions.

We've got a great week for you guys.

It's going to be really fun.

I'm Matt Collins.

This is my colleague Jared Marsau and we're here to talk Adopting Metal, Part 2.

This is Session 603.

So if you're in the wrong place, you get to see some graphics.

So let's recap.

We have two Adopting Metal Sessions.

Hopefully you were here for Warren's presentation a little bit ago, where we talked about the fundamental concepts: Basic drawing, lighting, texturing, good stuff like that.

And in this presentation we're going to take it to the next level.

We're going to draw many objects.

We're going to talk about managing dynamic data, large amounts of dynamic data, GPU-CPU synchronization, and we'll cap it off with some multithreaded encoding.

Tomorrow we've got some great presentations.

We'll talk about what's new in Metal.

We'll have the first session, tessellation, resource heaps, memoryless frame buffers, and some stuff about our improved tools to really help you guys get the best out of your apps.

Part 2, we'll talk about function specialization and function resource read-writes, wide color and texture assets, and additions to Metal performance shaders.

And if you really want to dig in heavy, we'll have an awesome talk about advanced shader optimization, shader performance fundamentals, tuning shader code, more detail about how the hardware works.

It'll be great.

So if you're really interested in tuning your shaders to make them the best they can be, check that out tomorrow.

So this is Part 2 of Adopting Metal and we're going to build on what we learned in Part 1.

We figured out how to get up and running.

So let's take a look at the concepts that you need to get the most out of Metal in a real-world situation.

We've got a demo that will draw a ton of stuff in a simple scene and we'll use that demo for context during today's session as we discuss and learn a couple lessons from it.

We'll talk about the ideal organization flow of your data, how to manage large chunks of dynamic data, the importance of synchronization between the CPU and the GPU, and, like I said before, some multithreaded encoding.

So hopefully you're familiar with the fundamentals of Metal because we won't be going over them again.

So we expect that you understand how to create a Metal queue, a Metal command buffer, how to encode commands, and we'll build on that to go forward.

So let's start with the demo itself and see what we're aiming towards.

So right now we've got 10,000 cubes and they're all spinning around, floating in space.

It's an interesting scene.

Metal allows us to issue a ton of draw calls with very low overhead.

So here we have 10,000 cubes and 10,000 draw calls.

You can see on the bottom there's a little shadow.

We're using a shadow map, a plane on the bottom, some nice anti-aliased lines to give you some depth cues, and of course all of our cubes.

So what goes into rendering a scene like this?

As you can see, we've got a lot of objects and each of these objects has its own associated piece of unique data.

We need the position, rotation, and color.

And this has to update every frame because we're animating them.

So this is a bunch of data that we're constantly changing, constantly have to reinform the GPU what we're drawing.

We can also draw a few more objects, maybe a little more.

You can spin it around a little bit and see that we're actually floating in space.

So we have a draw call per cube and a bunch of data per cube and we have to think about the best way to organize this data, how to manage it, and how to communicate it to the GPU.

So let's dive right in.

Thanks, Jared.

Managing Dynamic Data: This is a huge chunk of data that's changing every frame.

And as you can imagine in a modern app like a game, you also have a bunch of data that every frame needs to be updated.

So our draw basically looks like this.

We want to go through all the objects we're interested in drawing and update them.

Then we want to encode draw calls for every object and then we have to submit all these GPU commands.

We have a lot of objects.

We started at 10,000 and we were cranking it up to 100, 200,000.

Each of these objects has its own set of data and we have to figure out the best way to update this.

Now in the past, you might've done something like this.

You push updated data to the GPU, maybe uniforms or something, you bind a shader, some buffers, some textures, and you draw.

And you push some more data up.

You bind shader, buffers, textures.

You draw your next object.

In our scene we repeat this 10,000, 20,000 times, but we really want to get away from this sort of paradigm and try something new.

What if we could just load all our data upfront and have every command that we issue reference the data that was already there?

The GPU is a massively powerful processor and it does not like to wait.

So if all our data is already in place, we can just point the GPU to it and it will go happily crunch away and do all our rendering for us.

And each draw call we make then references the appropriate data that's already there.

In our sample, it's very straightforward.

We have one draw that references one chunk of data.

So the first draw call references the first chunk of data, the second, the second chunk, and so on.

But it doesn't have to be that way and we can actually reuse data.

We have some data, like at the front here, frame data, that we can reference from all our draw calls or we could have a draw call that references two pieces of data in different places.

If you're familiar with instancing, it's a very similar idea.

All your data will be in place before you start rendering.

So how do we do this in Metal?

In our application, we create one single Metal buffer and this is our constant buffer.

It holds all the data that we need to render our frame.

We want to create this upfront, outside of the rendering loop, and reuse it every time we draw.

We don't duplicate any data.

Again, any draw call can reference any piece of data, so there's no need for duplication.

Each draw call will reference an offset into the buffer.

It'll do a little bit of tracking to know which draw represents which offset.

And then you'll just draw with everything and everything will be in place.

Let's take a look at the code for this.

Here's the code from the app.

You can think of us as having two sets of data.

Like I mentioned before, there's a set of frame data that will update here and there's a set of data that will change per object.

This is the unique rotation position, et cetera.

So we need to put both sets of data in place.

Now what do I mean by per-frame data?

Well this is data that is consistent across every draw call we make.

For example, in our sample we have a ViewProjection matrix.

It's a 4 by 4 matrix, very straightforward, if you're familiar with graphics.

It represents the camera transform and the projection.

This is not going to change throughout our frame, so we only need one copy of it.

And we'd like to reuse data as much as we can so we can create one copy and put it into our buffer.

Let's start filling this out.

So here, we have our constant buffer, which is just a Metal buffer we've created.

And with the Contents function, we have a pointer to it.

Our app has a helper function, which is GetFrameData, and this returns that main pass structure I just showed you that has the view transform in it, the ViewProjection transform.

Excuse me.

And then we simply just copy this into the start of our buffer and then we're in place.

So our buffer will look like this.

We'll have a MainPass with the appropriate data for our frame and we'll put it at the start of our giant constant buffer.

So now we have all this empty space afterwards.

And like we saw, we need to do 10,000, 20,000 draw calls, so we need to start filling this out with a ton of information.

So then we have a set of per-object data and this is the unique data we need to draw a single object.

In our case, we have a single LocalToWorld transform, which is the concatenation of the position and the rotation and we have the color.

So this is the set of data we need per draw call.

So we'll walk through every object we want to render.

We'll keep track of the offset into the buffer.

We have our updateData utility function, which will do our little update for our rotation, and then we'll update the offset.

This will pack our data tightly and we'll fill it out as we go through.

Let's take a closer look at what updateData looks like.

It's quite simple.

Now, animation is kind of out of the scope of this talk, so I have a little helper function here that's updateAnimation with a deltaTime.

This could be whatever you want in your own application, and indeed it should be, depending on what sort of animation you need.

But in my case it returns an objectData object which has the LocalToWorld transform and the color.

And just as I did before, I copy it into my constant buffer.

So here's what that looks like.

I've got my frame data in place.

I have my other data, another piece, and another piece.

So all our data is in place and we're ready for rendering.
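The layout described above can be sketched in plain Swift. This is only an illustration under stated assumptions: `Matrix4`, `FrameData`, `ObjectData`, and `fillConstantBuffer` are hypothetical names, not the sample's actual types, and the raw pointer stands in for what a Metal buffer's `contents()` would return.

```swift
import Foundation

// Hypothetical stand-ins for the sample's structures. A 4x4 matrix is modeled
// as four SIMD4<Float> columns so the sketch needs no platform-specific imports.
struct Matrix4 {
    var c0, c1, c2, c3: SIMD4<Float>
    static let identity = Matrix4(c0: SIMD4<Float>(1, 0, 0, 0),
                                  c1: SIMD4<Float>(0, 1, 0, 0),
                                  c2: SIMD4<Float>(0, 0, 1, 0),
                                  c3: SIMD4<Float>(0, 0, 0, 1))
}

// Per-frame data: one copy, shared by every draw call.
struct FrameData {
    var viewProjection: Matrix4
}

// Per-object data: one copy per draw call.
struct ObjectData {
    var localToWorld: Matrix4
    var color: SIMD4<Float>
}

/// Writes the per-frame data at offset 0, then packs one ObjectData per object
/// directly behind it. Returns the byte offset of each object's data.
func fillConstantBuffer(_ contents: UnsafeMutableRawPointer,
                        frame: FrameData,
                        objects: [ObjectData]) -> [Int] {
    contents.storeBytes(of: frame, as: FrameData.self)

    var offset = MemoryLayout<FrameData>.stride
    var offsets: [Int] = []
    for object in objects {
        contents.storeBytes(of: object, toByteOffset: offset, as: ObjectData.self)
        offsets.append(offset)
        offset += MemoryLayout<ObjectData>.stride
    }
    return offsets
}
```

The caller is assumed to have allocated at least `MemoryLayout<FrameData>.stride + objects.count * MemoryLayout<ObjectData>.stride` bytes, aligned to 16. The returned offsets are the "little bit of tracking" that lets each draw call find its own chunk later.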

But are we missing anything?

Turns out that we are and I want to bring your attention to this.

We have one constant buffer.

I mentioned I created one Metal buffer and I was reusing it.

Now there's a problem with this.

The CPU and the GPU are actually two unique parallel processors.

They can read and write the same memory at the same time.

So what happens when you have something reading from a piece of memory while something else is writing to it?

Resource contention.

So it looks a little like this.

The CPU prepares a frame and writes it to a buffer.

The GPU starts working on this and reads from the buffer.

The CPU doesn't know anything about this, so it decides I'm going to prepare the next frame and it starts overwriting the same data.

And now our results are undefined.

We don't actually know what we're reading from or writing to, or what the data state will be.

So it's important to realize in Metal, this is not handled for you implicitly.

The CPU and GPU can write the same data at the same time however they'd like.

You must synchronize access yourself.

It's just like writing CPU code that's multithreaded.

You have to ensure you're not stomping yourself.

And that brings us to CPU-GPU synchronization.

Let's start simple.

The easiest way to do this would just be to wait after you've submitted commands to the GPU.

Your CPU draw function does all of its work, submits the commands, and then just sits there until it's ensured the GPU is done working.

That way we know we won't ever overwrite it because the GPU will be idle by the time we try to generate our next frame.

This won't be fast but it's safe.

So we need some sort of mechanism for the GPU to let us know, hey, I'm done with this, go do your thing.

Metal provides this in the form of callbacks.

We call them handlers and there are two of them that are interesting. The first is addScheduledHandler, which executes when a command buffer has been scheduled to run on the GPU.

And for us, an even more interesting one is the completion handler and this is called when the GPU has finished executing a command buffer.

The command buffer is completely retired and we're ensured at this point it's safe to modify whatever resources that we were using there.

So this is perfect.

We just need some way to signal ourselves that, hey, we're done, we can go forward.

Now how many of you are familiar with the concept of a semaphore?

Anyone? Pretty good.

Quick background on semaphores.

They are a synchronization primitive and they're used to control access to a limited resource and that fits us perfectly here.

We have one constant buffer and that's a limited resource, so we'll have a semaphore and we'll create it with a value of 1.

The count on a semaphore represents how many resources we're trying to protect.

So we'll create our semaphore.

And again, this is something that should be created outside of your render loop.

And the first thing we do once we start to draw is we wait on the semaphore.

Now with Apple's semaphores, we call it waiting.

Some people call this taking.

Some people call it downing.

It doesn't really matter.

The idea is that you wait on it and our timeout we set to distant future, which effectively means we'll wait forever.

Our thread will go to sleep if there's nothing available and wait for something to do.

When we're done, in our completion handler we will signal the semaphore.

That'll tell us that it's safe to modify the resources again.

We're completely done with it and we can go forward.
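A minimal sketch of this wait-then-signal pattern, runnable without a GPU: a serial dispatch queue plays the role of the GPU, and the closure submitted to it plays the role of the command buffer plus its completion handler. `SingleBufferRenderer` and `simulatedGPU` are hypothetical names introduced for illustration only.

```swift
import Dispatch

// @unchecked Sendable: the semaphore is what makes cross-thread access safe here.
final class SingleBufferRenderer: @unchecked Sendable {
    // One constant buffer to protect, so the semaphore starts at 1.
    private let semaphore = DispatchSemaphore(value: 1)
    // Stands in for the GPU: runs submitted work in order on another thread.
    private let simulatedGPU = DispatchQueue(label: "simulated-gpu")
    // Stands in for the single constant buffer.
    private var constantBuffer = -1
    // What the simulated GPU actually read, frame by frame.
    private var framesRead: [Int] = []

    func draw(frame: Int) {
        // Sleep until the previous frame's "completion handler" has fired.
        _ = semaphore.wait(timeout: .distantFuture)

        // Safe to overwrite the buffer now.
        constantBuffer = frame

        simulatedGPU.async {
            // The "GPU" reads the buffer...
            self.framesRead.append(self.constantBuffer)
            // ...and the "completion handler" signals that it is done with it.
            self.semaphore.signal()
        }
    }

    /// Waits for all submitted work and returns what the simulated GPU read.
    func finish() -> [Int] {
        return simulatedGPU.sync { self.framesRead }
    }
}
```

Because the CPU side blocks until the previous frame has been consumed, every frame is read exactly as it was written, at the cost of the two sides never running at the same time.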

So this is sort of a naive approach to synchronization but it looks a little like this.

Frame 0 we'll write into the buffer.

And on the GPU, we'll read from the buffer.

The CPU will wait.

When the GPU is done processing Frame 0, it will send the completion handler and frame 1 will work and create another frame on the CPU.

And that will process on the GPU and so on.

So this works but, as you can see here, we have all these waits and both the CPU and GPU are actually idle half the time.

It doesn't seem like a good use of our computing resources.

What we'd like to do is overlap the CPU and the GPU work.

That way we can actually leverage the parallelism that's inherent in this system, but we still need to somehow avoid stomping our data.

So we'd like our ideal workload to look like this.

Frame 0 would be prepared on the CPU, pushed to the GPU.

While the GPU is processing it, the CPU then gets to work creating frame 1 and so on, and again.

So one thing to keep in mind here is that the CPU is actually getting a little ahead of the GPU.

If you notice where frame 2 is on the CPU, frame 0 is the only thing that's done on the GPU.

So we're a little bit ahead and I want you to keep that in mind for a little later.

But first let's talk about our solution in the demo and what we do here.

We'd like to overlap our CPU and GPU but we know we can't do it with one constant buffer without waiting a lot.

So our solution is to create a pool of buffers.

So when we create a frame, we write into one buffer and then our CPU proceeds to create the next frame while writing into another buffer.

While it's doing this, the GPU is free to read from the buffer that was produced before.

Now we don't have an infinite number of buffers because we don't have infinite memory.

So our pool has to have a limit.

On our application, we've chosen three.

This is something that you need to decide for yourself.

We can't tell you what to do because there are a lot of things that go into the latency consideration, how much memory you want to use.

So we recommend you experiment with your app to find what fits for you.

For this example, we've chosen three.

So here, you can see we've exhausted our pool.

We have three frames that have been prepared but only one is finished on the GPU.

So we need to wait a little bit.

But by now, frame 0 is done, so we can reuse the buffer from the pool and so on.

So let's look at this in code.

Here's synchronizing access to constant buffers.

We've already got a semaphore and they're great for controlling access to limited resources.

In this case our limit is three but it can be whatever you'd like.

So here we create our semaphore with our count.

And instead of creating one constant buffer, we now create an array of them.

And lastly, we need an index and we'll use this index to represent the currently available constant buffer for us to use.

We can walk through the array and wrap around and the semaphore will control our access and protect us.

So in our draw function, we'll immediately wait on the semaphore, and if there's nothing available, we'll go to sleep.

Once we've taken the semaphore and proceeded, we know it's safe for us to grab the current constant buffer.

In our index, current constant buffer is tracking which one's available.

Then we fill out our frame as normal, encode all our commands, do all our updates, and add the completion handler, which will signal the semaphore, saying, hey, we're done with this frame.

You can go forward.

And the last thing we need to do is update the index.

We'll add one.

We'll use modulo to wrap around.

And don't worry, we don't have to worry about overwriting ourselves because the semaphore will protect us.
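The pooled version can be sketched the same way, again with a serial dispatch queue standing in for the GPU. `ConstantBuffer`, `PooledRenderer`, and `simulatedGPU` are hypothetical names, and the pool size of three simply mirrors the choice described above.

```swift
import Dispatch

// Stands in for one Metal buffer in the pool.
final class ConstantBuffer: @unchecked Sendable {
    var frameNumber = -1
}

final class PooledRenderer: @unchecked Sendable {
    static let maxFramesInFlight = 3

    // The semaphore count matches the number of buffers in the pool.
    private let semaphore = DispatchSemaphore(value: PooledRenderer.maxFramesInFlight)
    private let constantBuffers = (0..<PooledRenderer.maxFramesInFlight).map { _ in ConstantBuffer() }
    // Index of the buffer the CPU will write next.
    private var currentConstantBuffer = 0

    // Stands in for the GPU: consumes frames in submission order.
    private let simulatedGPU = DispatchQueue(label: "simulated-gpu")
    private var framesRead: [Int] = []

    func draw(frame: Int) {
        // Sleeps only when all three buffers are still in flight.
        _ = semaphore.wait(timeout: .distantFuture)

        // Grab this frame's buffer once, then write into it.
        let buffer = constantBuffers[currentConstantBuffer]
        buffer.frameNumber = frame

        simulatedGPU.async {
            // The "GPU" reads the buffer that was captured for this frame.
            self.framesRead.append(buffer.frameNumber)
            // The "completion handler" returns the buffer to the pool.
            self.semaphore.signal()
        }

        // Advance and wrap around; the semaphore guards against overwriting.
        currentConstantBuffer = (currentConstantBuffer + 1) % PooledRenderer.maxFramesInFlight
    }

    /// Waits for all submitted work and returns what the simulated GPU read.
    func finish() -> [Int] {
        return simulatedGPU.sync { self.framesRead }
    }
}
```

The CPU can now run up to three frames ahead. No per-buffer bookkeeping is needed: work completes in submission order, so by the time the semaphore lets the CPU wrap back around to a slot, the frame that last used that slot has already signaled.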

So constant buffers in the demo.

The demo has an array of three buffers and I've seen some applications track buffers by marking them as, oh, this is being read from in frame number 7, this is written to in frame number 5.

But with this model you don't actually have to do that.

The semaphore takes care of all the synchronization for you.

And if you can take the semaphore, you're guaranteed that the last frame that was using that was done, otherwise you'd still be asleep.

So now all our data is in place and it's protected.

And we'd like to start issuing a bunch of draw calls to get some stuff on the screen.

So here's the basic rendering loop for our demo.

We have two passes: One pass that draws a shadow map and one pass that reads the shadow map, and we've decided to split these into two separate command buffers.

There's a good reason for this.

It lets us have two encoding functions that are independent and unique.

They don't depend on each other.

You encode the shadow pass.

You pass it the shadow command buffer and the constant buffer that you've already filled out and it encodes all the commands to render the shadow map.

And then you have a separate encoding function that encodes the main pass.

You pass it the mainCommandBuffer and the other data you need and it encodes all those other commands.

When the encoding is all done, you call commit on your two command buffers, push them off, and then you've got your frame.

So what goes into actually encoding drawing one of our cubes?

We need a bunch of data and not just the rotation data.

We need some geometric data for the cubes, which is quite simple, you know, think about a cube is what, eight vertices, maybe an index buffer.

And in our sample, we don't really have complex materials or anything, just some very simple Lambert shading.

So we could reuse that pipeline state object across all of our cubes.

We mentioned the per-frame data earlier.

We need one copy of that.

So we'll update it.

Stick it in place.

And then of course we need the per-object data, that LocalToWorld and the color information that we're animating.

So when we issue our draw calls, we want to make sure we reference the correct data.

So our encoder will produce commands, put them into our command buffer, draw call 0 will reference both the frame data and the object that we're interested in.

Draw call 1, similarly, will reference the frame data and the object 1 data and so on.

This way everything's in place.

We issue our calls and the GPU will start crunching away.

Now we have a ton of draw calls to issue.

You know, in our demo, it was minimal, 10,000, and we want to issue these as efficiently as possible.

So we'd like to avoid doing redundant work.

We don't want to reset everything every draw.

Anything that's shared, geometry, pipeline states, we'd like to set that once and leave that in place.

So avoid redundant state updates and avoid redundant argument table updates.

It's also worth keeping in mind that the vertex and fragment stage argument tables are completely separate.

You can bind a buffer to the vertex stage and not to the fragment stage or vice-versa.

But if you have to bind everything to both stages, this can potentially double the calls you make to setVertexBuffer and setFragmentBuffer.

This is one reason we didn't use setVertexBytes in our example.

You can imagine we have 50,000 objects and we had to make a copy of all that data twice, once for the vertex stage and once for the fragment stage.

That would quickly get really big.

But if we kept it all in one buffer and just referenced it, we wouldn't have to worry about that.

And the last guideline I want to point out is using a new pair of functions, setVertexBufferOffset and setFragmentBufferOffset.

This merely changes the pointer into one of your buffers.

So you can see here when you call these, they actually don't take a reference to a Metal buffer.

They take an offset and an index.

This is because you must have already set the buffer to that specific point and this just changes the pointer within it and that's perfect for what we want.

We have one constant buffer and we're just walking through it.

So we can set it once in the beginning and then every time we draw, we call setVertexBufferOffset and just point the next draw call to the current spot in our buffer.

It looks a little something like this.

We bind this constant buffer and then we call setVertexBufferOffset with this offset.

Then we call it again striding it forward and again striding it forward.

We're not changing the buffer that we've set to this index.

We're just changing the offset within that buffer.

With these guidelines in mind, our encoding is actually pretty simple.

We have a bunch of data we can set up front.

The per-frame constants is pretty obvious because we know we're not going to change it.

So we'll set that.

We'll set the constant buffer once because we know it has to be in place for us to use the setVertexBufferOffset function.

We'll set the geometry buffer and the pipeline state because we know they're shared across all of our cubes.

Then finally we can start looping through all the objects we want to draw.

We'll set the offset into the constant buffer for our current draw.

And then we'll actually issue the draw.

And here's the code from the encode main pass function in the sample.

We'll start off by setting the vertex buffer that is our geometry and the render pipeline state, which is our litShadowedPipeline.

We'll set the constant buffer so we can use setVertexBufferOffset later.

In this case we're setting it to both the vertex and the fragment stages.

And then we'll set the per-frame data.

Now you'll notice here that I've set the constant buffer to two separate indices with different offsets.

And Metal allows you to do this as much as you want.

You could set the same constant buffer to every index at a different offset if you'd like, completely up to you.

And then we dive right into our loop.

We need to track the offset because we know that we're not starting right at the beginning of our constant buffer.

There's some frame data in there.

So the offset will be pushed back past the frame data.

Then we'll call setVertexBufferOffset and setFragmentBufferOffset to point this draw to the correct data that we want to draw with.

We'll issue the draw call and then we'll set the offset again just striding one object data struct at a time.
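The encoding loop just described might look roughly like the following. It is a sketch, not the sample's code: the `BufferIndex` slots, the parameter names, and the `objectOffsets` helper are assumptions, and the draw uses non-indexed triangles for brevity. The Metal part is wrapped in `canImport(Metal)` so the offset arithmetic can still be checked on platforms without Metal.

```swift
// Byte offset of each object's data: it starts just past the per-frame data
// and advances by one object stride per draw.
func objectOffsets(frameDataSize: Int, objectStride: Int, objectCount: Int) -> [Int] {
    return (0..<objectCount).map { frameDataSize + $0 * objectStride }
}

#if canImport(Metal)
import Metal

// Hypothetical argument-table slots; they must match the shader's buffer indices.
enum BufferIndex {
    static let geometry = 0
    static let frameData = 1
    static let objectData = 2
}

func encodeMainPass(encoder: MTLRenderCommandEncoder,
                    pipeline: MTLRenderPipelineState,
                    geometry: MTLBuffer,
                    vertexCount: Int,
                    constantBuffer: MTLBuffer,
                    frameDataSize: Int,
                    objectStride: Int,
                    objectCount: Int) {
    // Shared state: set once, outside the loop.
    encoder.setRenderPipelineState(pipeline)
    encoder.setVertexBuffer(geometry, offset: 0, index: BufferIndex.geometry)

    // Per-frame data sits at the start of the constant buffer.
    encoder.setVertexBuffer(constantBuffer, offset: 0, index: BufferIndex.frameData)
    encoder.setFragmentBuffer(constantBuffer, offset: 0, index: BufferIndex.frameData)

    // Bind the same buffer to a second slot so only its offset changes per draw.
    encoder.setVertexBuffer(constantBuffer, offset: frameDataSize, index: BufferIndex.objectData)
    encoder.setFragmentBuffer(constantBuffer, offset: frameDataSize, index: BufferIndex.objectData)

    for offset in objectOffsets(frameDataSize: frameDataSize,
                                objectStride: objectStride,
                                objectCount: objectCount) {
        // Move the pointer within the already-bound buffer, then draw.
        encoder.setVertexBufferOffset(offset, index: BufferIndex.objectData)
        encoder.setFragmentBufferOffset(offset, index: BufferIndex.objectData)
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)
    }
}
#endif
```

One caveat the sketch ignores: Metal imposes a minimum alignment on buffer offsets that varies by platform and address space, so a tightly packed stride may need padding on some hardware. The feature set tables for the target device are the place to check.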

So our draws are in place.

This is still very linear.

And I promised you some multithreading and Warren mentioned that, hey, you can actually encode a bunch of stuff in parallel in Metal.

So how would you do this?

An ideal frame might look like this.

Our render thread is chugging along and it realizes, hey, I need to render a shadow map and I need to render a main pass.

It'd be great if I could encode this in parallel.

I've got multiple CPUs.

So what if I dispatch this work out, encoded some stuff, then I rejoin back to the render thread and the render thread pushed this over to the GPU to do a bunch of work.

This would look great.

How many of you have used GCD?

This is a great fit for Grand Central Dispatch.

If you're not familiar, Grand Central Dispatch is Apple's multiprocessing API.

This is an API that lets you create queues and these queues manage computing resources on your machine.

There are two types of queues you can create.

There's a serial queue.

When you dispatch work through a serial queue, you're guaranteed that all that work will happen in order.

But what's more interesting for us is the concurrent queue.

When you dispatch work to the concurrent queue, GCD will look at your system and figure out the best way to schedule this for you.

And that's perfect.

We have two jobs we need to do in parallel.

So if we created this one queue and just pushed the work to it, it would do that for us.

This is another object you want to create once and reuse.

So here's some code to create a concurrent dispatch queue.

You should always put a label on your queues.

I've used the very creative label queue here but you might want to call it something else.

So we made some modifications to the code.

We still create the command buffers at the start.

But since we were smart enough to use two command buffers and separate our encoding functions into two unique things, there isn't much else for us to do other than dispatch the work.

So dispatchQueue.async is the main call you use to dispatch work to a queue in GCD.

This is an asynchronous call.

It'll push the work on and your thread will keep going.

So here we dispatch the shadow pass and then we dispatch the main pass.

We'll want to commit this work somehow so we call dispatch barrier sync and this makes sure that all the work is done by the time we get to this point.

And then finally we've rejoined and we can commit our work.

Now the ordering is important here.

The shadow map has to be done by the time we reference it.

So we have to commit the shadow command buffer first and then the main command buffer later.
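The dispatch, barrier, and ordered commit can be sketched with GCD alone. Here `PassRecorder` is a hypothetical stand-in for a command buffer, `encodePass` stands in for the two encoding functions, and returning the commands in shadow-then-main order stands in for committing in that order.

```swift
import Dispatch

// Stands in for a command buffer: it just records what was encoded into it.
final class PassRecorder: @unchecked Sendable {
    let name: String
    var commands: [String] = []
    init(name: String) { self.name = name }
}

// Stands in for an encoding function; it touches only the recorder it is given.
func encodePass(into recorder: PassRecorder, drawCount: Int) {
    for draw in 0..<drawCount {
        recorder.commands.append("\(recorder.name) draw \(draw)")
    }
}

// Created once and reused for every frame.
let encodeQueue = DispatchQueue(label: "queue", attributes: .concurrent)

func encodeFrameInParallel(drawCount: Int) -> [String] {
    let shadowPass = PassRecorder(name: "shadow")
    let mainPass = PassRecorder(name: "main")

    // Both jobs are pushed onto the concurrent queue; this thread keeps going.
    encodeQueue.async { encodePass(into: shadowPass, drawCount: drawCount) }
    encodeQueue.async { encodePass(into: mainPass, drawCount: drawCount) }

    // The barrier runs only after both jobs above have finished.
    encodeQueue.sync(flags: .barrier) {}

    // "Commit" in dependency order: shadow first, then main.
    return shadowPass.commands + mainPass.commands
}
```

The two jobs never share a recorder, which is the same property that splitting the frame into two command buffers provides: each encoding function owns its own output, so no locking is needed while encoding.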

There's something else I want to bring up here.

How many of you are familiar with the concept of a closure?

Great. How many of you have ever had an issue where a closure captures self and you thought you were referencing something else?

You can be honest.

It's happened to all of us.

I just wanted to call this out.

Closures capture self.

So if you're referencing a member variable or an iVar within them and you're not explicitly saying self.iVar, it's still actually going to reference that variable.

So if you want to make sure you're going to reference the correct data, it's a good idea to capture it outside and I'll show you what I mean in a second.

These two things don't do the same thing.

So in the first one where I encode the shadow pass, you can see the constant buffer I'm grabbing is dependent on self.constantBufferSlot.

I don't actually know what that will be at the time it executes.

This is really asynchronous programming.

So by the time my dispatch is actually running, this could've changed behind my back.

It may be right but it may not be.

I can't guarantee it.

So keep that in mind and don't do it that way.

Instead, we'd like to capture a reference to the constant buffer we're interested in.

So here we just say let constant buffer and grab it out of the array.

But then when we issue our dispatch, we reference the specific one that we've already grabbed.

That makes sure we know exactly what data we're reading from.

So this is some multithreading fun.

The actual code in the sample looks like this.

We capture the constant buffer.

And when we use it, we make sure we're using the correct one, the one that we've captured already, to know that we're using this frame's constant buffer.
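The difference between the two forms can be shown without any threading at all. In this sketch, `SlotTracker` and its string "buffers" are hypothetical; the point is only when the slot gets looked up.

```swift
final class SlotTracker {
    var constantBufferSlot = 0
    let constantBuffers = ["buffer0", "buffer1", "buffer2"]

    // Risky form: the closure holds self and reads the slot when it runs.
    func makeLateLookup() -> () -> String {
        return { self.constantBuffers[self.constantBufferSlot] }
    }

    // Safer form: the buffer is resolved now and the closure captures that value.
    func makeCapturedLookup() -> () -> String {
        let constantBuffer = constantBuffers[constantBufferSlot]
        return { constantBuffer }
    }
}

let tracker = SlotTracker()
let lateLookup = tracker.makeLateLookup()
let capturedLookup = tracker.makeCapturedLookup()

// The render loop moves on to the next slot before the closures execute.
tracker.constantBufferSlot = 1

let lateResult = lateLookup()          // "buffer1": follows the changed slot
let capturedResult = capturedLookup()  // "buffer0": still this frame's buffer
```

With a real asynchronous dispatch, whether the late lookup sees the old or the new slot depends on timing, which is exactly why it cannot be relied on.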

Now I had mentioned the ordering earlier and how this was important.

When you create a command buffer and you commit it, the ordering that this executes on your GPU is implied by the order you commit it in.

So if I commit the shadow command buffer first and the main command buffer second, I'm guaranteed that the shadow one will happen first on the GPU followed by the main command buffer.

Sometimes we refer to this as implicit command buffer ordering.

But you can be a little more explicit about it.

Metal provides an enqueue function that enforces command buffer ordering.

If you have a set of command buffers, you can enqueue them and you're guaranteed that they will execute in that order regardless of how you commit them or when you commit them.

This is something really cool because it allows you to commit command buffers from multiple threads, in any order, and you don't have to worry about it.

The runtime will ensure you're executing in the correct order.

So let's see how to apply this to our code.

A couple new additions here.

Now when we create our command buffers, we immediately enqueue them in the order.

Again, the order matters, so we still have to enqueue shadowCommandBuffer first and then mainCommandBuffer second.

But now when we dispatch, we can actually commit from within our other thread.

Again, the runtime is going to ensure the ordering.

So we don't actually have to worry about it.

This actually lets us remove that barrier we had before because we have no need to rejoin and commit the command buffers.

They're already committed for us.

But I seem to have skipped over all that synchronization stuff I talked about a second ago and we still need it because we're still going to be overwriting ourselves if we don't have it.

So can we apply these same synchronization lessons to this sort of multithreaded world?

It turns out we can and it's actually quite straightforward.

We bring back our friendly semaphore and our array of constant buffers.

And again, don't forget to grab the correct one that you want.

At the start, we'll wait on the semaphore and sleep if nothing's available.

We've enforced our ordering with enqueue and we push it through.

Now we know that mainCommandBuffer is the final command buffer in our frame.

And we know that we want to signal that our frame is done.

So we should add our completion handler to the mainCommandBuffer and you could do this from within the dispatch.

So the mainCommandBuffer is the final command buffer.

We add the completion handler to it, to signal our semaphore, and we commit it from within the dispatch, just like we did before.

Now you may notice here that I'm referencing self.semaphore and a second ago I just told you to watch out for that.

So what's going on?

Well it turns out a semaphore is a synchronization primitive and we do actually want to be looking at the same one as all of our other threads.

So we want the value of the semaphore at the time the thread is executing.

So in this case, we actually want self.semaphore, something to be aware of.

And here's the recipe for our rendering.

At the start of our render function, we wait on the semaphore.

We select the current constant buffer.

We write the data into our constant buffer that represents all of our objects.

We encode the commands into command buffers.

We can do this single-threaded, multithreaded, however you'd like.

We add a completion handler onto our final command buffer and we use it to signal the semaphore to let us know when we're done and we commit our command buffers.

And the GPU takes all this and starts chugging away at our frame.
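Put together, the recipe might look something like this in Swift. It is a sketch under assumptions rather than the sample's code: `FrameRenderer`, the queue label, and the three closure parameters are hypothetical, and the actual encoding is left to the caller. The block is wrapped in `canImport(Metal)` because it only makes sense where Metal is available.

```swift
#if canImport(Metal)
import Metal
import Dispatch

final class FrameRenderer: @unchecked Sendable {
    static let maxFramesInFlight = 3

    let commandQueue: MTLCommandQueue
    let constantBuffers: [MTLBuffer]
    let semaphore = DispatchSemaphore(value: FrameRenderer.maxFramesInFlight)
    let encodeQueue = DispatchQueue(label: "encode", attributes: .concurrent)
    var constantBufferSlot = 0

    init?(device: MTLDevice, constantBufferLength: Int) {
        guard let queue = device.makeCommandQueue() else { return nil }
        var buffers: [MTLBuffer] = []
        for _ in 0..<FrameRenderer.maxFramesInFlight {
            guard let buffer = device.makeBuffer(length: constantBufferLength,
                                                 options: .storageModeShared) else { return nil }
            buffers.append(buffer)
        }
        commandQueue = queue
        constantBuffers = buffers
    }

    func drawFrame(writeConstants: (UnsafeMutableRawPointer) -> Void,
                   encodeShadowPass: @escaping (MTLCommandBuffer, MTLBuffer) -> Void,
                   encodeMainPass: @escaping (MTLCommandBuffer, MTLBuffer) -> Void) {
        // 1. Wait until a constant buffer is free.
        _ = semaphore.wait(timeout: .distantFuture)

        // 2. Select this frame's buffer and capture it in a local.
        let constantBuffer = constantBuffers[constantBufferSlot]

        // 3. Write all per-frame and per-object data.
        writeConstants(constantBuffer.contents())

        guard let shadowCommandBuffer = commandQueue.makeCommandBuffer(),
              let mainCommandBuffer = commandQueue.makeCommandBuffer() else {
            semaphore.signal()  // nothing was submitted, so hand the slot back
            return
        }

        // 4. Fix the execution order up front: shadow first, then main.
        shadowCommandBuffer.enqueue()
        mainCommandBuffer.enqueue()

        // 5. Encode and commit each pass from the concurrent queue.
        encodeQueue.async {
            encodeShadowPass(shadowCommandBuffer, constantBuffer)
            shadowCommandBuffer.commit()
        }
        encodeQueue.async {
            encodeMainPass(mainCommandBuffer, constantBuffer)
            // 6. The final command buffer signals the semaphore when the GPU is done.
            mainCommandBuffer.addCompletedHandler { _ in self.semaphore.signal() }
            mainCommandBuffer.commit()
        }

        constantBufferSlot = (constantBufferSlot + 1) % FrameRenderer.maxFramesInFlight
    }
}
#endif
```

Note the two different capture styles in the closures: the constant buffer is captured as a local because it must stay this frame's buffer, while the semaphore is reached through `self` because every thread is meant to share the same one.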

So let's look at the demo again and see what this got us.

So here you can see in the top left, this is single-threaded encode mode and you can see how many draws we're issuing, 10,000.

And the top right, you can see the time it takes us to encode a frame.

So here we've got 5 milliseconds and we can crank the number of draws up and see that it starts costing more and more as we draw things.

Now this is single-threaded mode.

And when you think about it, we're drawing a shadow map, which means we have to issue 40,000 draws in the shadow map, and then we're drawing the main pass, which means we have to issue another 40,000 draws to reference that.

But again, we can do this in parallel, so we've added a parallel mode to this demo.

And you can see how it's faster to go through.

Now take a look at everything that's going on.

You can fly around a little bit.

So here we have 40,000 cubes, unique, independent.

They're all being updated.

We're using GCD to encode a bunch of stuff in parallel.

We have two command buffers: One to generate the shadow map on the ground and one to render all of the cubes in color.

The lighting is quite simple, Lambert shading, which is basically what Warren talked about earlier, the N dot L lighting.

And that's our demo.

This will be available as sample code for you guys to take a look at.

Hopefully you can rip it apart, take some of the ideas and the thoughts in it and apply them to your own code.

So what did we talk about today?

When you walked in here, hopefully you came to Warren's session earlier and maybe you knew a little bit about graphics or had done some programming before, but we took you through everything in Metal.

The conceptual overview of Metal, the reasoning around it is to use an API that is close to the hardware and close to the driver.

We learned about the Metal device, which is the root object in Metal that everything comes from.

We talked a bit about loading data into Metal and the different resource types and how you use them, the Metal shading language, which is the C++ variant you use to write programs on the GPU.

We talked about building pipeline states, prevalidated objects that contain your two functions, vertex and fragment or a compute function, and a bunch of other baked-in, prevalidated state to save you time at runtime.

Then we went into issuing GPU commands, creating a Metal queue, creating command buffers off that queue, and creating encoders to fill the command buffer in, and then issuing that work and sending it over to the GPU.

We walked you through animation and texturing and using setVertexBytes to send small bits of data to do your animation in.

Then when the small bits of data weren't enough, we talked about managing large chunks of dynamic data and using one big constant buffer and referencing it in multiple places to get some data reuse out of the system.

We talked about CPU-GPU synchronization, the importance of making sure your CPU and your GPU aren't overwriting each other and are playing nicely.

And then lastly, we talked a little bit about multithreaded encoding, how you can use GCD with Metal to encode multiple command buffers on your queues at the same time.

And that's adopting Metal.

Hopefully you enjoyed the talk and you can apply some of these to your apps and make your apps even better than they already are.

If you'd like some more information, you can check out this website, developer.apple.com/wwdc/603.

We have a few more sessions tomorrow that I recommend you go check out.

At 11:00 o'clock, we have What's New in Metal, Part 1 and then a little later at 1:40, we have What's New in Metal, Part 2.

That'll tell us everything that's new in the world of Metal, awesome stuff you can add to your applications to make them better.

And then for you hardcore shader heads out there, we have Advanced Metal Shader Optimization at 3:00.

So if you want to know how to get the best out of your shaders, I recommend you go check out that talk.

It's really great.

Thanks for coming to hear us talk.

Welcome to WWDC.

Have a good rest of the week.

Thanks again.
