OpenGL ES Overview for iPhone OS

Session 415 WWDC 2010

OpenGL ES provides access to the stunning graphics power of iPhone, iPad, and iPod touch. See how your application can create incredible visuals while maintaining high frame rates using the programmable pipeline enabled by OpenGL ES 2.0. Learn more about the innovations provided by iOS 4, and see how OpenGL ES can deliver interactive graphics in your games and other mobile 3D applications.

My name is Gokhan Avkarogullari. I'm with the iPhone GPU software group at Apple.

My colleague Richard Schreyer and I will be talking about OpenGL ES and Apple's iPhone implementation of it: what's new, and all the new things we added in iOS 4.

Last year at WWDC we introduced OpenGL ES 2.0 and the iPhone 3GS.

Since then, we have introduced the third-generation iPhone, the third-generation iPod touch, and iPad, and very soon we will release iPhone 4 and iOS 4.

With all this new hardware and software, there are a lot of new features added to the system.

Today we're going to give you an overview of the new extensions that we added.

We're going to talk about the Retina display, its higher resolution, and what that means from the perspective of OpenGL, which is a pixel-based API.

We're going to talk about the performance impact of high-resolution displays on OpenGL-based applications.

And finally, we're going to talk about multitasking, and the changes to your OpenGL code that give users the best experience.

We're going to talk about new extensions first.

Multisample Framebuffer is an extension we implemented to help reduce aliasing in rendering operations.

So let's look at what aliasing is.

Here is an example screenshot from the Touch Fighter application.

If you look at the edges of the wings of the fighter plane, you can see a staircase pattern, a jaggedness, along all of them.

In the zoomed-in close-up, the staircase pattern is clearly visible.

This is actually significantly worse in the live application, because the pattern changes from frame to frame and the edges look like busy, crawling lines.

Multisampling helps us deal with this problem by smoothing out the edges: it renders to a higher-resolution buffer and then generates the final result from that higher-resolution buffer.

So how can we do it?

This is your regular way of creating framebuffer objects without multisampling.

You would have a framebuffer object that you would use to display the results of your rendering operation, and it would have, normally, a color attachment, a depth attachment, sometimes a stencil attachment.

If you want to do multisampling in your application, you will need two of these framebuffer objects.

One displays the results of your rendering operations, and the other is the one you render into at higher resolution.

On the right is the one that you will use for displaying the results of your rendering operation.

It only has a single attachment in this case, a color attachment, and it's not filled in yet.

On the left is the multisample framebuffer object where the rendering operations initially take place.

It has a depth attachment because depth testing takes place on this framebuffer object. That's why the display framebuffer object doesn't need one.

You can see that the buffers in the multisample framebuffer object are filled in.

They hold the results of your rendering commands.

Note also that the buffers on the left and on the right are different sizes.

We're assuming 4x multisampling in this example, so each buffer attached to the multisample framebuffer object is four times the size of the buffer attached to the display framebuffer object.

At the end of your rendering, you do a resolve. For each pixel, you average the color values of its samples to produce a single color value, and you write that value to the framebuffer object you use to display your images on the screen.

So let's look at how you can do it in the code.

Here's single-sample framebuffer creation. For the color buffer, you generate a renderbuffer, bind it, get backing storage for it from the CAEAGLLayer, and finally attach it to the color attachment of your framebuffer object.

Similar operations take place for the depth buffer.

You generate it, bind it, get storage for it through the glRenderbufferStorage call, and finally attach it to the depth attachment of your framebuffer object.

Now, as I told you before, you have to create two framebuffer objects for the multisampling operation.

So let's look at how the attachments of the multisample framebuffer object are created.

The difference is mainly in how you allocate storage for your buffers.

In the multisample case, the color buffer storage comes from the glRenderbufferStorageMultisampleAPPLE call.

The same call is used for the depth renderbuffer.

The difference between this call and the previous one is that you tell OpenGL how many samples there will be in your buffer.

In this example, we use four samples.

The buffers that are created take into account that you plan to use four samples per pixel, and their storage is allocated based on that information.

Let's look at how you would normally do a render in a single sample case.

You would bind your framebuffer object, set your viewport, issue your draw calls, and finally bind the color renderbuffer that you want to display and present it on the screen.

With multisampling, you have two framebuffer objects. You do the rendering operations on one and the display operations on the other.

To render into the multisample framebuffer object, you first bind it, set your viewport, and issue your regular draw operations, just as you did before.

But then you need to get the data from the multisample framebuffer object into the single-sample framebuffer object.

For that, you bind the multisample framebuffer object as the read target of the resolve operation (GL_READ_FRAMEBUFFER_APPLE) and the display framebuffer object as its draw target (GL_DRAW_FRAMEBUFFER_APPLE). Finally, you call glResolveMultisampleFramebufferAPPLE to resolve the contents of the multisample framebuffer object down to screen size into the display framebuffer object.

Then, just as before, you bind the color renderbuffer and present it on the screen.

So as you can see, the changes to your application needed to enable multisampling are very simple.

You need to change your initialization code to generate a multisample framebuffer object, and you need to change your rendering just a little so that you can do a final resolve operation at the end of the rendering loop.

You might be thinking, "What kind of performance implications will that have on my application?"

The performance implications depend on which GPU the product has.

All iPhone OS devices starting with iPhone 3GS have a PowerVR SGX GPU.

The PowerVR SGX has native hardware support for multisampling. It understands the difference between a sample and a pixel, and therefore it runs your shaders to generate a color value per pixel, not per sample.

But it keeps a depth value for each sample, and it also does depth testing per sample, not per pixel.

So in the four-sample case, it would do four depth tests and generate four depth values, but only a single color value.

Depending on whether the depth tests pass or fail, all four samples may end up with the same depth and color values, or they may have different depth and color values. That depends on whether the pixel is on the edge of a polygon or inside it.

After all these values are computed for each sample and your rendering is done, the hardware averages the color values, and that average is your final color value after the resolve operation.

PowerVR MBX Lite is the GPU we use in all our products before iPhone 3GS.

Unfortunately, it doesn't have native hardware support for multisampling.

Therefore, on that GPU we implemented multisampling as supersampling.

That means we use high-resolution buffers. Since the MBX Lite doesn't know the difference between a sample and a pixel, it does all of its processing for each pixel of these high-resolution buffers.

So it generates a color value for each sample, generates a depth value and does depth testing for each sample, and finally all these values are resolved into a single value written into your display framebuffer object.

That means multisampling has a larger performance impact on PowerVR MBX Lite GPUs than on PowerVR SGX GPUs.

So you might be thinking to yourself, "I could already have done that with render-to-texture.

I could have created an FBO with larger color attachments as textures, then sampled from them and averaged, and had multisample behavior in my application already.

So what is the difference?"

The difference is that you're telling us your intention is to do multisampling.

So we can avoid the readback and averaging pass at the end of your rendering that render-to-texture multisampling requires.

When we generate the high-resolution values, we know you're going to use them to create a single-sampled version, so we write out both the high-resolution and the low-resolution versions.

Next I'm going to talk about another extension that helps multisampling performance: Discard.

With Discard, the difference between using multisampling and render-to-texture becomes even larger.

But first, here's a small demo of the quality impact multisampling has on your application.

In this case, as you can see, there is a plane in front, and the edges of its wings are really busy: you can see the patterns changing from frame to frame.

It's even more visible on the planes farthest away.

When I turn on multisampling, it looks significantly better than before.

Eliminating aliasing entirely would take infinite resolution, but this is significantly better than what we had before.

And if I go back and forth, it will be more visible to you, the change from non-multisampled case to the multisampled case.

As I said, multisampling is very easy to get into your application.

And you should try, experiment, see what kind of quality difference it makes and what kind of performance impact it has on your application.

Next, let's talk about a second extension, the Discard Framebuffer extension.

Discard Framebuffer improves fill-rate performance with multisampling.

It even helps fill-rate performance in the non-multisampled case.

Usually once a frame is rendered, the depth and stencil values that are associated with that frame are not needed for rendering of the next frame.

In a multisampled case, not only the depth and stencil but the color attachment of the multisampled framebuffer object is not needed to render the next frame.

So by using this extension, you can tell OpenGL that you don't need it, and we can discard those values without writing out to those buffers.

That means a significant amount of memory bandwidth is saved rather than spent on writing out those values.

If your application is fill-rate bound, that is, limited by memory bandwidth, this extension can significantly improve its performance.

Here's how it conceptually works.

Here's our framebuffer object, it's a single sampled example.

Our framebuffer has color and depth attachment.

They are not filled in yet.

The GPU has finished rendering and has generated the color and depth values.

Without the Discard extension, the next step would be for the GPU to write those values out to the color and depth attachments.

But in most cases we don't need the depth values to render the next frame.

So if you use Discard, you can avoid writing out the depth values, and they never leave the GPU.

That way, the memory bandwidth that would have been spent writing out those values is available to your application for reading more data for your rendering operations.

Let's look at the code example.

This is how we normally render in a single sample case when you're not using Discard.

You would have your framebuffer bound, set your viewport, issue rendering commands, and finally present your color buffer on the screen.

With Discard, all you need to do is list the attachments you want to discard and pass that list to the glDiscardFramebufferEXT call.

Whenever possible, OpenGL will then avoid writing out to those buffers.

You can imagine that in the multisample case, the buffers you created are four times larger than in the single-sample case. Writing out the color and depth values of the multisample buffers therefore involves significantly more memory traffic.

So Discard makes a significant difference in terms of fillrate performance of your application, especially in the multisampled case.

But even in the non-multisampled case, you still avoid writing to the depth and stencil buffers, which helps your application's fill rate.

That was Discard Framebuffer.

We have another extension that can help the performance of your application, and that's Vertex Array Objects.

If you attended this morning's OpenGL session, GL objects in general were discussed in detail, and Vertex Array Objects were one of them.

Vertex Array Objects encapsulate all the state for your vertex arrays in a single object.

That includes the offsets of your position, normal, and texture coordinate arrays, the size of each element in those arrays, their strides, which arrays are enabled, and whether there's an index buffer.

It's all encapsulated in a single object.

So when you switch between Vertex Array Objects, binding one makes all of that information immediately available to GL.

So how does this help you?

First of all, it provides a great convenience.

Once you load the assets for an object, you can encapsulate its entire vertex array state in a single object at load time.

And if you never update the object dynamically after that, you never issue these commands again, except for binding the Vertex Array Object and drawing from it.

It also allows us to do optimizations.

Since this state is validated only once, at load time, we don't have to revalidate it.

Unless you update it, we know nothing has changed, so nothing has to be revalidated.

So it saves CPU time on state validation.

On top of that, the Vertex Array Object gives us very good information about the vertex layout, the layout of your data for your vertices for the arrays.

So we can make use of that to find out how to reorder them if necessary to get better performance out of the drawing operations.

Let's look at the code example.

This is how you would render using Vertex Buffer Objects alone.

You would bind your VBO and set your pointers for your positions, normals, and texture coordinates. You would enable the non-default arrays, such as the normal array and the texture coordinate array, and then draw by calling glDrawArrays or glDrawElements.

And finally, you have to set the state back so it doesn't affect the next rendering operation.

When you use VBOs alone, you have to specify all of this at every draw call.

With VAOs, you only need to specify which VAO you're going to use, because all that information is already encapsulated in it.

Since GL already knows about it, you don't have to re-specify it.

You bind the VAO and then draw your arrays based on it.

This is possible because you've done the work to define the state only once, at one-time setup time.

You basically bind your VAO, you specify where the pointers and strides and all these things are, and you specify which states are enabled, and they are basically written out, captured in one place at one time, and can be re-used over and over again for the subsequent drawing calls.

That was Vertex Array Object.

As you can see, it's very easy to take advantage of, and it gives you a performance boost. It should also make your code less error-prone and more readable, because there are fewer lines of code.

There are six more extensions I'd like to talk about.

I'll give information on each one and how you can use them in your applications.

The first one I'd like to talk about is the APPLE_texture_max_level extension.

This extension allows the application to specify the maximum (coarsest) mipmap level that may be selected for texturing operation.

So it helps us control filtering across atlas boundaries.

Texture atlases contain textures for multiple different objects. As mipmap levels get smaller, filtering starts to cross the boundaries between them and generates visual artifacts. This extension was implemented to solve that problem.

You use this extension by calling glTexParameteri with GL_TEXTURE_MAX_LEVEL_APPLE and specifying the highest mipmap level you want used for texturing.

Let's visualize this.

This is a texture atlas from the Quest game.

You can see that there are textures in one, single texture object, textures for walls, for stairs, for statues, and all of them are here in one single texture atlas.

This picture shows the mipmap levels from 256 by 256 down to 1 by 1, visualized in terms of pixels.

But let's look at the mipmap levels in terms of texture coordinates.

If you use 0 to 1 coordinates for the entire texture atlas, the 0 to 1 range on these mipmap levels looks like this.

At the very top level, it will be exactly correct and perfect.

It will have all the necessary pixels in it.

This is a 256 by 256.

But at the lowest level, 1 by 1 pixel, the 0 to 1 coordinate range spans only one pixel, so it has only one color.

So imagine the stairs, the walls, and the statues. Close up, they use the first level, the 256 by 256 one. Farther away, they use this one or a level close to it, and they all look the same color.

The texture_max_level extension avoids this problem by letting you specify the mipmap levels you care about and use only those.

In this example, you can set the max level to 3, and the texturing hardware will use only levels 0 through 3 of this texture object.

Let's look at another extension that modifies texturing behavior: the shader texture level-of-detail extension.

It gives you explicit control over the level of detail for your texturing operations.

If you'd like an object to look sharper, you can specify a mipmap level that is finer than the one the hardware would choose, that is, a lower-numbered level with more detail.

Or if you want better fill-rate performance at the expense of fuzzier detail on your objects, you can choose a coarser mipmap level through this extension.

All you need to do is enable the extension in your shader and control the mipmap level through the texture LOD lookup functions it provides.

There are further functions in this extension for controlling gradients and such.

I'd encourage you to read the extension specification to find out what more you can do with it.

This is a shader-based extension, so it's only available on devices with PowerVR SGX GPUs.

We also added the depth texture extension to our system, so that you can capture depth information in a texture and use it for things like shadow mapping or depth-of-field effects.

In shadow mapping, you render your scene from the perspective of the light and capture the depth information from that perspective into a texture.

You do this for every light.

Then, when you're rendering from the perspective of the camera, you can calculate whether each pixel sees any of the lights. If it sees all of the lights, it will be illuminated by all of them.

If it sees some of them, it will be illuminated by some of them.

If it sees none of them, it will be entirely in shadow.

So you can use the depth texture extension to do shadow mapping.

But since you're rendering the scene multiple times, this consumes a lot of fill rate, so you need to be careful about the performance implications of this technique.

Another example of using depth textures is a depth-of-field effect.

I'll show a small demo of that and explain it then.

One thing to remember: when you use a texture as the depth attachment of your framebuffer object to capture the depth information of the scene, you can only use NEAREST filtering when sampling from it.

The reason is that filtering across different depth values produces incorrect values.

If something close to the viewer and something far away from the viewer are filtered together, the filter generates a depth in between, even though there's no object at that depth.

So how do you use the depth texture extension in your application?

You just generate your texture as usual, and when you create your framebuffer object, you attach this texture to the depth attachment of your framebuffer object.

And in subsequent rendering operations, the depth information will be captured in this texture, and then you can use it for texturing later to do whatever post-processing effects or whatever effects you want to do.

And let's look at the demo, how we can generate a depth of field effect with the depth_texture extension.

So here again, we are using three planes, and I'd like to point out that this is how you would normally render it without the depth of field effect.

Everything is sharp, in focus, the plane in the back, the stars, the plane in the front.

Here's a visualization of the depth information.

This is the depth texture, captured by rendering to texture and then displaying the texture.

Things that are black are closer to the screen, and things that are white are farther away. We also re-render the same scene at a low resolution, which blurs it.

When the human eye focuses on something, objects within a certain range of the focus point are sharp, but things closer or farther away than that range are blurrier.

The blurred texture captures that blurrier part, and the original scene captures the sharper part.

You can see that they are quite different.

The final operation blends the two.

Here, I'm visualizing which texture each pixel uses for the final depth-of-field effect.

With the focus at the near plane and the range covering the entire scene, the visualization is entirely black, so every pixel comes from the sharp image.

As I make the range smaller, it starts using values from the blurred image.

When the range is 0, nothing is in focus: everything is blurred, and every pixel comes from the blurred image.

So let's look at it visually.

So if I have full range, I end up with the original scene.

Original rendering without the depth of field effect.

Then if I set a shorter range, you can see that the stars and the two planes in the back become blurry.

Now the entire thing is blurry.

I'll move the range out and it gets things into focus and it becomes sharper.

Or I can use a short range and move my focal point away and the things in the front will become blurrier and the stars and the third plane in the back will become sharper.

As you can see, the planes in front are blurrier.

But if I extend my range to cover everything between the near and far planes, everything is sharp again, because the range around the focal point covers the whole scene.

So that's one of the examples of how you can use depth_texture extension.

Another popular shadowing technique is stencil shadow volumes.

The stencil_wrap extension helps improve performance for that technique.

With this extension, stencil increment and decrement operations wrap around instead of saturating as a ray goes into and out of a shadow volume.

Now, stencil shadow volumes is a large topic, and it's a very nice way of creating real-time shadows.

The next session in this room, OpenGL ES Shading and Advanced Rendering, spends a significant amount of time on how to generate and use them.

There are really cool demos visualizing this technique and its implementation.

We also added two new data types for texturing: float textures. You can specify your textures to contain 16-bit or 32-bit floats.

Again, this requires hardware support that is only available on devices with PowerVR SGX GPUs.

The float values in your textures can be used for high dynamic range images. You can use them for tone mapping and display high dynamic range images in all their glory on iOS 4 devices.

They can also be used for general-purpose GPU computation.

You can load your data into float textures and do FFTs, other signal processing, or whatever math your application needs.

To specify a float texture, you tell OpenGL that its type is GL_HALF_FLOAT_OES or GL_FLOAT.

This is the last extension I'm going to talk about today.

It's the APPLE_rgb_422 extension.

This extension lets you get video onto the GPU as textures.

Specifically, interleaved 4:2:2 YCbCr video.

The extension does not specify the color space of the video. It could be captured in the BT.601 standard-definition format, come from the BT.709 HD color space, or be JPEG full-range video.

Since we do not specify it, you have the freedom and flexibility to implement the color space conversion in your shader.

With this extension, when you sample a YCbCr texture, the Y values are delivered in the G channel, Cr in the R channel, and Cb in the B channel. Once you do your color space conversion, you end up with RGB values that originally came from a video texture.

Again, this extension relies on hardware support, so it's only available on devices with PowerVR SGX GPUs.

So let's look at how you can use this extension.

This is how you would texture in the non-422 case: you specify the format and type of your image, create a texture, and sample from it.

With the 422 extension, you specify the format of your texture as GL_RGB_422_APPLE and its type as GL_UNSIGNED_SHORT_8_8_APPLE or GL_UNSIGNED_SHORT_8_8_REV_APPLE, depending on the ordering of the chroma and Y values. Finally, in your shader, you convert from YCbCr to RGB. Then you can apply whatever effect you like, such as converting to black and white, or just apply the texture to whatever object you want to use it on.

So these are the other six extensions: texture_max_level, shader texture level of detail, depth texture, stencil wrap, float textures, and RGB 422.

With that, I'd like to invite Richard to talk about the Retina display, the performance impact of high-resolution displays, and multitasking.

Thank you.

So thank you.

Gokhan has just described all the new features you'll find in OpenGL in iOS 4. I'm going to continue the what's-new topic, but focus on what's new in the rest of the platform that affects you as an OpenGL application developer.

First and foremost among these is the new Retina Display you'll find on iPhone 4.

So, you've undoubtedly seen the demo.

The Retina display is a 640x960-pixel display.

That's four times as many pixels as any previous iPhone.

One of the really big points I want to drive home about the Retina display is that we're not cramming a whole bunch of content into the upper left-hand corner.

All of the various views and other widgets remain physically exactly the same size on the display.

The status bar, the URL bar, are all exactly the same size.

What's changed is the amount of detail you find within any specific view.

This is as true for UIKit content as it is for OpenGL content.

So how do you make the best use of the Retina display?

For OpenGL applications, it requires a little bit of adoption.

It's not something you get out of the box.

The steps are pretty simple.

Right off the bat, we want to render more pixels.

We need to allocate a larger image.

The second step is, now that you're rendering to a different size image, we've found that a large number of applications have, for their own convenience, hard-coded various pixel dimensions in their applications.

That's something that we'll need to flush out.

And finally, this is where it really gets interesting, taking advantage of the new display to load great new artwork.

So, step 1.

Generating high-resolution content is done on a view-by-view basis, and it's controlled with a new UIKit property called contentScaleFactor.

You configure your view with the same bounds you always have, in a 320 by 480 coordinate space.

What changes is that you set the content scale factor to, say, 1 or 2, which determines the number of pixels allocated for the image behind that view.

For UIKit content, this is generally set on your behalf to whatever is appropriate for the current device.

Right out of the box, all of your buttons and your text fields are going to be as sharp as they possibly can be.

But that is not true for OpenGL views.

For OpenGL views, the default value for content scale factor remains at 1, and you have to explicitly opt in by setting that otherwise.

Usually, the straightforward thing to do is to query the scale of the screen that you're running on, and then set that to the content scale factor .On an iPhone 3GS, this will be 1, and nothing changes.

On an iPhone 4, this will be 2, and you're effectively doubling the width and height of your render buffer.

When you call renderbufferStorage:fromDrawable:, Core Animation snapshots both the bounds of your view and the scale factor, and multiplies them to arrive at the actual width and height of the image you'll be rendering into.

Those pixel dimensions are convenient to have. You can derive them yourself by multiplying bounds by scale, or, even easier and more foolproof, you can just ask OpenGL what width and height it actually allocated.

It's a good idea to ask OpenGL for these two values and stash them away somewhere on the side.

They're really useful to have.
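The arithmetic behind "bounds times scale" can be sketched as a small Python model. This is not iOS code: `ViewGeometry` and `backing_pixel_size` are hypothetical names, and rounding to the nearest whole pixel is an assumption. On a device, the authoritative values are the ones OpenGL reports through `glGetRenderbufferParameteriv` with `GL_RENDERBUFFER_WIDTH` and `GL_RENDERBUFFER_HEIGHT`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ViewGeometry:
    """Hypothetical model of a view: bounds in points plus a content scale factor."""
    width_pt: float
    height_pt: float
    content_scale: float


def backing_pixel_size(view: ViewGeometry) -> tuple[int, int]:
    """Pixel size of the renderbuffer behind the view: bounds times scale.

    Assumes rounding to the nearest whole pixel. On a device, prefer the values
    from glGetRenderbufferParameteriv(GL_RENDERBUFFER_WIDTH / _HEIGHT).
    """
    return (round(view.width_pt * view.content_scale),
            round(view.height_pt * view.content_scale))


# The same 320x480-point bounds on two devices:
print(backing_pixel_size(ViewGeometry(320, 480, 1.0)))  # iPhone 3GS: (320, 480)
print(backing_pixel_size(ViewGeometry(320, 480, 2.0)))  # iPhone 4: (640, 960)
```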

That brings us to step 2: fixing any place where you have hard-coded dimensions that may no longer be valid.

If your application is already universal, meaning it runs on both iPhone and iPad, you've probably already done this, and you can move on.

If you haven't, you may find a few of these cases, and I want to point out the most common ones you'll find in your application.

First, while Core Animation chooses the size of your color buffer, you allocate the depth buffer yourself, and the sizes of those two resources have to match.

This is a case where you'll want to take that saved pixel width and pixel height and pass them right on through to glRenderbufferStorage.

If you don't do this, you'll find yourself with an incomplete framebuffer and no drawing will happen.
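The size rule can be modeled in a few lines of Python. The names `RenderbufferSize`, `framebuffer_complete`, and `depth_size_for` are hypothetical; the sketch only assumes the OpenGL ES 2.0 rule that all framebuffer attachments must have the same dimensions (a mismatch is reported as `GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderbufferSize:
    width: int
    height: int


def framebuffer_complete(color: RenderbufferSize, depth: RenderbufferSize) -> bool:
    """Model of the completeness rule: all attachments must share dimensions.

    In OpenGL ES 2.0 a mismatch makes glCheckFramebufferStatus return
    GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS, and drawing is rejected.
    """
    return color == depth


def depth_size_for(color: RenderbufferSize) -> RenderbufferSize:
    """Size the depth buffer from the saved color-buffer pixel dimensions,
    i.e. the width and height you would pass to glRenderbufferStorage."""
    return RenderbufferSize(color.width, color.height)


color = RenderbufferSize(640, 960)             # chosen by Core Animation on iPhone 4
hard_coded_depth = RenderbufferSize(320, 480)  # left over from 3GS-era code
print(framebuffer_complete(color, hard_coded_depth))       # False
print(framebuffer_complete(color, depth_size_for(color)))  # True
```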

Another common case is glViewport.

glViewport selects which subregion of the view you're rendering into at any given point in time, and every application has to set it at least once.

You'll find it somewhere in your source code.

Most applications really don't ever use anything other than a full screen viewport, so this is another case where you'll just want to pass pixel width and pixel height right on through.
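A quick Python model shows why a hard-coded viewport goes wrong on a larger renderbuffer. Both helpers are hypothetical; `covered_fraction` assumes the viewport starts at the origin, which is the usual full-screen case.

```python
from __future__ import annotations


def full_screen_viewport(pixel_width: int, pixel_height: int) -> tuple[int, int, int, int]:
    """(x, y, width, height) to pass to glViewport for a full-screen viewport."""
    return (0, 0, pixel_width, pixel_height)


def covered_fraction(viewport: tuple[int, int, int, int],
                     pixel_width: int, pixel_height: int) -> float:
    """Fraction of the renderbuffer a viewport at the origin actually covers."""
    _, _, w, h = viewport
    covered = min(w, pixel_width) * min(h, pixel_height)
    return covered / (pixel_width * pixel_height)


# A hard-coded 320x480 viewport on iPhone 4's 640x960 renderbuffer:
print(covered_fraction((0, 0, 320, 480), 640, 960))                 # 0.25
print(covered_fraction(full_screen_viewport(640, 960), 640, 960))   # 1.0
```

With the hard-coded values, the scene would occupy only one corner of the screen, which is the kind of symptom that flags a leftover dimension.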

Step three is actually where it gets really interesting.

At this point, your application has correctly adopted the Retina display at a basic level.

You've now got much greater detail on your polygon edges, but there's still more room to improve things and really take advantage of this display.

This is the right place to, for example, load higher-resolution textures and other artwork.

Again, if your application is universal, you may already have a library of assets that you can use right away on the Retina display.

Usually, the easiest way to do this is to take your existing mipmapped textures, add a new base level, and leave all the existing artwork in place.

This can really significantly improve the visual quality of your application.

One word of caution: you can do this on any iPhone OS device, but it's wasted on devices that don't have large displays, so you want to be selective about which devices you load the largest level of detail on.

Otherwise you're just burning memory.
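The memory cost of that extra base level can be estimated with a short Python sketch. The helpers are hypothetical, the "large display" flag stands in for whatever device check you choose, and the byte counts assume uncompressed RGBA8 textures (4 bytes per pixel).

```python
from __future__ import annotations


def mip_chain(base_width: int, base_height: int) -> list[tuple[int, int]]:
    """Dimensions of every mipmap level, from the base level down to 1x1."""
    levels = [(base_width, base_height)]
    w, h = base_width, base_height
    while (w, h) != (1, 1):
        w, h = max(1, w // 2), max(1, h // 2)
        levels.append((w, h))
    return levels


def levels_to_upload(levels: list[tuple[int, int]], has_large_display: bool):
    """Hypothetical policy: only large-display devices get the new top level.
    On other devices the remaining levels would be uploaded starting at GL level 0."""
    return levels if has_large_display else levels[1:]


def chain_bytes(levels, bytes_per_pixel: int = 4) -> int:
    """Memory for a set of levels, assuming uncompressed RGBA8."""
    return sum(w * h * bytes_per_pixel for w, h in levels)


chain = mip_chain(512, 512)  # 512x512 is the newly added base level
print(chain_bytes(levels_to_upload(chain, True)))   # 1398100
print(chain_bytes(levels_to_upload(chain, False)))  # 349524
```

The top level accounts for roughly three quarters of the chain's memory, which is why skipping it on small displays matters.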

One other word of warning concerns using UIImage to load textures.

UIImage has a size property that gives the dimensions of the image, but those dimensions are measured in device-independent points, not pixels.

So if you have a higher-resolution image that's 256 by 256 pixels, its size in points might be only 128 by 128, so you can't just take those values and pipe them into glTexImage2D.

This is another case where you'll have to multiply size by scale yourself, or you can drop down a level to CGImageGetWidth and CGImageGetHeight, which give you the image dimensions directly in pixels.
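The points-to-pixels conversion is a one-liner; this Python sketch (hypothetical helper name) only reproduces the "size times scale" arithmetic that you would otherwise get from CGImageGetWidth and CGImageGetHeight.

```python
from __future__ import annotations


def image_pixel_size(size_in_points: tuple[float, float], scale: float) -> tuple[int, int]:
    """Convert a UIImage-style size (in points) and scale into pixel dimensions.

    On a device, CGImageGetWidth/CGImageGetHeight on the underlying CGImage
    return pixel dimensions directly and make this step unnecessary.
    """
    w_pt, h_pt = size_in_points
    return round(w_pt * scale), round(h_pt * scale)


# A 256x256-pixel image loaded at scale 2 reports a size of 128x128 points:
print(image_pixel_size((128, 128), 2.0))  # (256, 256): the values glTexImage2D needs
```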

If you get caught by this, you'll probably see some really strange effects.

That's really about all there is to say about making the most of the Retina display.

Tomorrow there's a session that covers the UIKit changes in detail, where you'll hear all about how UIKit measures in points, whereas OpenGL is a pixel-based API.

So UIKit can do almost everything for you with no application changes, whereas OpenGL needs some changes.

But really, when you get right down to it, the one line of code that really matters is setting the content scale factor.

There's one more really interesting topic about the Retina display: you're drawing four times as many pixels.

That can have some pretty significant performance implications.

This is equally true if your application runs on iPad, or even if it uses an external display.

TVs can be quite large as well.
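To put numbers on "four times as many pixels", here is a small Python tally of the native display sizes of the devices discussed in this session. The dictionary and helper are illustrative only; overdraw would multiply these per-frame counts further.

```python
from __future__ import annotations

DISPLAY_PIXELS = {
    "iPhone 3GS": (320, 480),
    "iPhone 4": (640, 960),
    "iPad": (768, 1024),
}


def pixel_count(dims: tuple[int, int]) -> int:
    """Pixels covered by one full-screen pass at the given size."""
    w, h = dims
    return w * h


baseline = pixel_count(DISPLAY_PIXELS["iPhone 3GS"])
for name, dims in DISPLAY_PIXELS.items():
    n = pixel_count(dims)
    print(f"{name}: {n} pixels, {n / baseline:.2f}x the 3GS")
# iPhone 3GS: 153600 pixels, 1.00x the 3GS
# iPhone 4: 614400 pixels, 4.00x the 3GS
# iPad: 786432 pixels, 5.12x the 3GS
```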

So I want to talk a little bit about this too.

So really, the first thing to do here is to roll up your sleeves and start working through the standard set of fillrate optimizations and investigations.

You have to think about how many pixels your application is drawing; in this case, X and Y got a lot bigger.

You also have a lot of control over how expensive each pixel is.

Proper use of mipmaps can significantly improve GPU efficiency.

You're in direct control of the complexity of your fragment shaders, and the costs of operations like alpha test and discard also add up quickly as screen size grows.

I'm going to stop here and not get into the details of performance optimization, because that's a gigantic subject, and we're going to spend a whole session on it this afternoon, in OpenGL ES Tuning and Optimization.

That being said, in our experience, a lot of interesting applications do have room for performance optimizations, and do end up being satisfied with the performance they get on these devices, even when running at higher resolutions on both iPhone 4 and iPad.

But that's not universally true.

Some developers are really aggressive.

They're already using everything these devices have to offer.

And for these particularly complex applications, you may find that you've already optimized everything there is to optimize.

And so there's one more big tool in our toolbox, and it goes back to how many pixels you're actually drawing.

So you don't necessarily need to render at the size that matches the display.

For example, iPhone 4 has a 640 by 960 screen.

If you instead render 720 by 480, that's still a significant step up in quality compared to a 3GS, and you're filling only a little over half as many pixels as you would if you went all the way up to match the display.

You'll find very few other opportunities to get a nearly 2x performance jump from a single line of code.
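The trade-off in the 720 by 480 example is simple arithmetic; this Python sketch (hypothetical helper, sizes written in portrait orientation) compares fill cost against the native iPhone 4 size and detail against the 3GS.

```python
from __future__ import annotations


def fill_ratio(render: tuple[int, int], reference: tuple[int, int]) -> float:
    """Pixels filled per frame at `render` size, relative to `reference` size."""
    return (render[0] * render[1]) / (reference[0] * reference[1])


native = (640, 960)  # iPhone 4 display
legacy = (320, 480)  # iPhone 3GS display
middle = (480, 720)  # the 720 by 480 example, in portrait orientation

print(fill_ratio(middle, native))  # 0.5625: a little over half the native fill cost
print(fill_ratio(middle, legacy))  # 2.25: 2.25 times as many pixels as a 3GS
```

Filling 0.5625 of the native pixel count is a reduction of about 1.78x, which is where the "nearly 2x" figure comes from, assuming the application is fillrate-bound.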

So if this becomes an option that you want to pursue, how do you do this?

You could just put black bars around the sides, but that's not really what you want.

What you really want is to take that lower-resolution image and scale it up to fill the whole display.

Okay? How do you do that?

Well, you don't actually need to do that yourself at all.

This is something that Core Animation will do for you.

In fact, this is something that Core Animation will do for you really well and really efficiently.

Much more so than you could do yourself.

That is, in the end, a really nice tradeoff between performance and visual quality.

So how do you actually make use of this?

Well, the answer is that you literally have to do nothing.

This is how your applications already work right out of the box.

The API that controls this is, again, contentScaleFactor.

As I said, for compatibility reasons, today your applications render into a view with a contentScaleFactor of 1, which means that Core Animation is already taking your content and scaling it to fit the display, very efficiently.

So right out of the gate, your application already performs as well as it always has, and looks as good as it always has.

That's a pretty decent place to start. If you can change your application so that it runs at the native resolution of these devices, that's great too.

But we expect a fairly large class of applications to have enough performance headroom to step it up a little, but not enough to absorb a four-times jump in the number of pixels.

And so there's some really interesting middle grounds to think about here.

One of these is to stick with a scale factor of 1, but adopt anti-aliasing.

We just saw Gokhan discuss our addition of the APPLE_framebuffer_multisample extension.

This can also do a really good job of smoothing polygon edges.

In our experience, many applications that adopt multisampling end up looking almost as good as if you were running at native resolution, but the performance impact can be much, much less severe than increasing the number of pixels four times.

So this is a very compelling tradeoff to think about.

Another interesting option is that you don't have to pick an integer content scale factor.

You can pick something between 1 and 2.

To get the 720 by 480 example I started with, you set a content scale factor of 1.5, and that's really all you have to do.
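One way to pick a fractional scale is to start from a pixel budget. The Python helper below is hypothetical, and the budget itself would have to come from your own profiling; the sketch relies only on the fact that pixel count grows with the square of the scale, and it clamps the result between 1 and the screen's scale.

```python
from __future__ import annotations

import math


def scale_for_budget(bounds_pt: tuple[float, float], pixel_budget: int,
                     max_scale: float) -> float:
    """Largest content scale factor whose renderbuffer fits a pixel budget.

    Pixel count is (w * s) * (h * s), so s = sqrt(budget / (w * h)).
    The result is clamped to [1, max_scale] in this sketch.
    """
    w, h = bounds_pt
    s = math.sqrt(pixel_budget / (w * h))
    return max(1.0, min(max_scale, s))


# A budget of 480 * 720 pixels for a 320x480-point view on iPhone 4 (screen scale 2.0):
print(scale_for_budget((320, 480), 480 * 720, 2.0))  # 1.5
```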

I want to change the subject slightly now.

I want to talk about iPad.

iPad has an even larger display, so the reasons to optimize your application apply just as much, and for some applications, so do the reasons to render to a smaller renderbuffer.

There's just one unfortunate catch, and that is that our very convenient new API is new to iOS 4.

It's just not there for you to use in iPhone OS 3.2.

Fortunately, there is another way: the UIView transform property, which has been there since the iPhone SDK first shipped.

So, I'm going to put up a snippet of sample code here.

So think back to the beginning of the presentation, when I said that the size of your render buffer was your bounds times your scale.

On iPad, there's no scale API, so we'll call the scale implicitly 1.

So if you want to render an 800 by 600 image on iPad, you set the bounds to 800 by 600, and then set a scaling transform on the transform property that scales it up to fill the display.

In both performance and quality, these two methods are pretty much equivalent.
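The scale factors for that transform follow directly from the two sizes. This Python sketch uses a hypothetical helper; on a device the two factors would be passed to `CGAffineTransformMakeScale` and assigned to the view's transform. It assumes the iPad's 1024 by 768 landscape display, whose 4:3 aspect ratio matches 800 by 600, so the scale is uniform.

```python
from __future__ import annotations


def fill_transform_scale(render_size: tuple[int, int],
                         screen_size: tuple[int, int]) -> tuple[float, float]:
    """Scale factors that stretch a view whose bounds are `render_size`
    (rendered at an implicit scale of 1) to cover `screen_size`."""
    return (screen_size[0] / render_size[0], screen_size[1] / render_size[1])


sx, sy = fill_transform_scale((800, 600), (1024, 768))  # iPad, landscape
print(sx, sy)                            # 1.28 1.28
print((800 * 600) / (1024 * 768))        # 0.6103515625 of the native pixels filled
```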

The advantage of this method is that you can start using it on iPhone OS 3.2, whereas contentScaleFactor, available in iOS 4 and later, is simply more convenient.

That's what I have to say about large display performance.

We're going to talk a lot more about this kind of performance investigation later this afternoon in the Tuning and Optimization session, and if you run out of options there, you have some really fine-grained control over the resolution you actually render at, which can significantly reduce the number of pixels you have to fill.

That brings us to the last topic of the day, and that's multitasking.

So, I want to start this by providing an example.

Say your product is a game, and the user is playing, and they receive a text message.

They leave your game to go write a response.

Sure, dinner sounds great.

They come right back to your game.

They probably spent 10 seconds outside of it.

So what do they see when they return?

They see a progress bar, and they have to wait.

That's not a very good user experience.

In fact, they end up waiting longer than they spent outside your application in the first place. That's really not a good user experience.

And so, this is what we're talking about when we talk about fast app switching.

If you went to the multitasking session, you heard that there are a bunch of other scenarios.

There's voice over IP, location tracking, task completion, and background audio.

All of those are about various modes of doing work while in the background.

Fast App Switching is different.

The Fast App Switching scenario is an application that does absolutely nothing in the background.

It's completely silent, completely idle.

It's there simply to lie in wait, so that it can leap back into action the instant that the user relaunches that app.

And in fact, for OpenGL, that is the only mode.

GPU access is exclusively for the foreground application to ensure responsiveness.

So while you can use some of these other scenarios, for example using task completion to do CPU processing in the background, you do not have access to the GPU in the background.

There's one really important point I want to make: if you think back to that progress bar, a lot of what that application was probably doing was loading things like OpenGL textures.

That tends to be pretty time consuming.

So one thing you don't have to do is de-allocate all of your OpenGL resources.

You can leave all of your textures and all of your buffer objects and everything else in place.

You just have to go hands-off and not touch them for a while, but they can stay there.

This means that when a user brings your application back to the foreground, all of those really expensive-to-load resources are already there, ready to go right away.

Keeping all of that resident while in the background does have implications for memory usage, and that leads to a really interesting tradeoff that you should think about carefully.

It is generally a really good idea to reduce your application's memory usage when you run in the background.

System memory as a whole is a shared resource.

If the application in the foreground needs more memory, the system will go find applications in the background to terminate to make room for it.

That list is ordered by who's using the most memory.

Guess who's going to be on top?

So, you have a really compelling, even perfectly selfish reason, to want to reduce your memory use as much as possible.

That makes your process more likely to still be there when the user comes back.

On the other hand, if you're making the resume operation slow by spending a whole bunch of time loading resources, we've kind of defeated the purpose.

There's really a balancing act to be made here.

So the way you should think about it is look at your application on a resource-by-resource basis, and think about what you really need to pick up right where the user left off.

And also think about how expensive each one is to re-create.

If you've got your standard textures, those are probably pretty expensive to re-create.

You want to keep those around.

On the other hand, there are some that are really cheap to re-create.

Think about your color and depth buffers.

There's no drawing in the background, so they're really just sitting there, not doing anything.

Re-creating them is not like a texture, where you have to go to the file system and load data and decompress it and so on.

Reallocating your color and depth buffers is just conjuring up empty memory.

It's really fast.

Also think about cases where you have idle resources that aren't needed for the current scene.

You know, if you've got a bunch of GL textures that are around because you needed them in the past, and you're keeping them around pre-emptively because you might need them in the future, this is a really good time to clear out all the idle textures in that cache and leave all the active ones in place.
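The resource-by-resource policy described above can be sketched as a filter over your own bookkeeping. Everything here is hypothetical: `GLResource` stands for whatever records your engine keeps about its OpenGL objects, and the two flags are judgments you make per resource.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GLResource:
    """Hypothetical bookkeeping record for one OpenGL ES object."""
    name: str
    needed_by_current_scene: bool
    cheap_to_recreate: bool  # e.g. color/depth renderbuffers: just empty memory


def release_on_background(resources: list[GLResource]) -> list[str]:
    """Names of resources to delete when entering the background: anything cheap
    to re-create, plus idle cache entries the current scene doesn't use.
    Expensive, in-use textures and buffers stay resident for a fast resume."""
    return [r.name for r in resources
            if r.cheap_to_recreate or not r.needed_by_current_scene]


resources = [
    GLResource("color renderbuffer", True, True),
    GLResource("depth renderbuffer", True, True),
    GLResource("level 3 terrain texture", True, False),
    GLResource("level 1 boss texture (cached)", False, False),
]
print(release_on_background(resources))
# ['color renderbuffer', 'depth renderbuffer', 'level 1 boss texture (cached)']
```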

A little bit more about the mechanics of it.

How do you actually enter the background and come back?

Your application is going to receive an applicationDidEnterBackground notification.

When this happens, we have to stop our usage of the GPU.

Specifically, your access to the GPU ends as soon as this notification returns.

So you have to be done before you return from this function.

Second, you want to save your application state.

For example, if you're writing a painting application with OpenGL, that might involve using glReadPixels to pull back what the user has painted and save it to the file system.

This is really important, because your application may be terminated in the background to free up memory.

And then finally, here's our example of releasing memory for this application.

Here, we release our framebuffers, because we can re-create them really fast without slowing the user down.

On the other side, when your application is about to enter the foreground, you'll receive an applicationWillEnterForeground notification.

We'll spend a tiny fraction of a second allocating our framebuffer, and then we're ready to go.

This is exactly where we want to be.

So that's great.
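The two handlers can be modeled as a small state machine. This Python class is a hypothetical stand-in for an app delegate driven by applicationDidEnterBackground: and applicationWillEnterForeground:; calling `glFinish()` to drain GPU work before returning is an assumption about one common way to "be done", not something this session prescribes.

```python
class Renderer:
    """Hypothetical model of a GL app's state across background/foreground transitions."""

    def __init__(self, scene_state: dict):
        self.scene_state = scene_state
        self.saved_state = None
        self.textures_resident = True
        self.framebuffers_allocated = True
        self.gpu_allowed = True

    def did_enter_background(self) -> None:
        # 1. Finish all GPU work before returning (on a device, e.g. glFinish()).
        # 2. Save state so a terminated process can restore it later; a painting
        #    app might glReadPixels the canvas and write it to a file here.
        self.saved_state = dict(self.scene_state)
        # 3. Release what is cheap to re-create; expensive textures stay resident.
        self.framebuffers_allocated = False
        # GPU access ends as soon as this handler returns.
        self.gpu_allowed = False

    def will_enter_foreground(self) -> None:
        self.gpu_allowed = True
        # Re-allocating empty color/depth buffers takes a tiny fraction of a second.
        self.framebuffers_allocated = True

    def can_draw(self) -> bool:
        return self.gpu_allowed and self.framebuffers_allocated and self.textures_resident


r = Renderer({"level": 3})
r.did_enter_background()
print(r.can_draw(), r.textures_resident)  # False True
r.will_enter_foreground()
print(r.can_draw())                       # True
```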

Except that there's one other case here that you really have to think about very carefully.

You might receive applicationDidFinishLaunching instead.

If your process was terminated while in the background, then you have lost all of your GL resources, as well as everything else.

And you now have to reload them, and that's going to take time.

The more interesting part here is restoring the state we saved when entering the background.

The user doesn't know when an application is terminated in the background to free up memory.

It's a completely invisible implementation detail to them.

So from the user's point of view, it's effectively unpredictable which of these paths your application will take to reenter the foreground.

And so, if in one of these cases you put them right back in their game, and everything's golden, whereas in the other case, you make them page through a parade of logos and select a menu and click Load Game, the user is effectively going to see random behavior here.

They won't know which case to expect when they press on your icon, and that's fairly disconcerting.

And so it's really critical here that regardless of which path your application takes, you put them back in exactly the same place, say, reloading their game.

Ideally, the user just can't tell the difference between these two cases.

Practically speaking, there will be a performance difference.

The whole point of Fast App Switching is to keep those resources around, and if they're not around, you're going to have to spend time to load them.

But for the best-behaved applications, the application's performance will be the only difference in behavior the user can see.
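A Python sketch of the goal: both paths land the user in the same place, and only the reload cost differs. The function names are hypothetical, JSON is an arbitrary choice of serialization for the illustration, and the resource reload on the cold path is left as a comment.

```python
import json


def save_state(state: dict) -> str:
    """Serialize scene state; on a device this would be written to the file system."""
    return json.dumps(state)


def reenter_foreground(in_memory_state, persisted: str):
    """Return (state to show, resources_reloaded).

    in_memory_state is None when the process was terminated in the background,
    so the app came back through the cold-launch path.
    """
    if in_memory_state is not None:
        return in_memory_state, False    # fast app switch: nothing to reload
    state = json.loads(persisted)        # cold launch: restore the saved state
    # Hypothetical: reload textures and buffers here; this is the slow part.
    return state, True


state = {"level": 3, "player": [12.5, 4.0]}
blob = save_state(state)
warm = reenter_foreground(state, blob)
cold = reenter_foreground(None, blob)
print(warm[0] == cold[0], warm[1], cold[1])  # True False True
```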

That's Fast App Switching.

You want to free up as much memory as possible, but not to the extent that you're going to slow down Resume.

And then you also need to think about doing a really good job of saving and restoring your GL state.

That could include the contents of your OpenGL resources, if you modify them on the fly.

So that actually brings us to the end of today's presentation.

To give you a quick recap of where we've been, we've talked about some new OpenGL extensions to improve the visual quality of your application: multisampling, float textures, and depth textures.

We have new features to improve the performance of your application: Vertex Array Objects and Discard Framebuffer.

Discard and Multisample go together particularly well.

We talked about how to adopt the Retina display.

Ideally, that's one line of code.

We talked about resolution selection for large displays.

This is really your big hammer to solve fillrate issues if you have no other option.

And finally, we talked about multitasking, where the key is always to think in terms of Fast App Switching.

We have a number of related sessions.

Coming up later this afternoon is OpenGL ES Tuning and Optimization, where we'll go through the process of how to actually look at fillrate performance in your application, as well as introduce a new developer tool that can really help you understand what your application is really doing.

Shading and Advanced Rendering is more of an applied session, where we're going to go through some really classic graphics algorithms and then talk about how we practically applied them to the graphics in Quest.

OpenGL Essential Design Practices happened earlier today, and it's a really great talk on general OpenGL design practices, such as which kinds of objects to use where, and modern API changes.

This kind of stuff is equally applicable to both desktop and embedded OpenGL.

And then finally, there are a couple of sessions that talk about multitasking and the Retina display as they apply to the whole platform, not just OpenGL.

You can contact Allan Schaffer directly, he's our Game and Graphics Technologies Evangelist, and we also have a great collection of written documentation in the OpenGL ES Programming Guide for iPhone OS.

So, with that, I hope this talk was useful to you today, and I hope to see you at the labs.

Thank you.

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US