Advanced Editing with AV Foundation

Session 612 WWDC 2013

AV Foundation provides powerful services for editing video and audio in your iOS or OS X apps. See the tremendous control and flexibility AV Foundation offers. Learn how custom compositors create new possibilities for advanced transitions and effects. Gain expert insight into best practices for integrating compositions into your app and working with audio mixes.

[ Silence ]

Hello and welcome to the advanced editing with AV foundation session.

My name's Scott Johnston on the AV Foundation Team.

[ Applause ]

So today we have two main topics for you.

Introducing custom video compositing, and secondly, a really useful debugging technique for debugging compositions.

So let's start with custom video compositing.

First of all, we're going to take a few moments just to review the existing AV Foundation Architecture.

AV Foundation has been available for a few years now.

Ever since iOS 4 and Lion.

It's used in a wide variety of video editing applications in the app store including those from Apple.

It provides core and key fundamental tools to create cuts only temporal compositions, video mixing compositions, and of course audio mixing.

In Mountain Lion and iOS 6, it's quite possible to create a wide variety of visual effects.

Dissolves, scaling, transformation effects, layering, picture in picture effects, wipes, and slides.

And now, new for Mavericks and iOS 7, we'd like to give you guys the opportunity to place your code inside the video composition render pipeline.

So giving you direct access to the pixels with the CPU and the GPU to colorize, warp, texture map.

Achieve a much wider variety of complex effects.

And we'll see more of the sea otters later [laughter].

So custom video compositor.

But what is a video compositor?

Well, it's basically a chunk of video mixing code.

Which takes zero or more source frames blending or modifying the pixels to deliver a single output frame.

We'll find the existing built-in compositor inside the composition architecture.

So looking inside compositions as a whole, we first see the AVComposition, which is our temporal cuts-only edit.

We place time ranges of source assets in tracks.

Here we have two tracks, A and B, and in this example we're going to be dissolving from track A to B and back to A again.

In order to achieve the dissolve, we have the AVVideoComposition, which stores the video mixing parameters and time ranges.

So let's take a closer look at the video composition itself.

Now for these time ranges in the video composition, they're instruction objects, which store the mixing parameters.

But some instructions are simpler than others.

For example, one source track.

Another can take more than one.

And looking at the first simple segment there, that first instruction.

It's effectively just passing through frames from track A through the compositor rendering the first segment.

Now the more interesting segment is where the initial dissolve happens, where we take two frames, and using the parameters from the instruction to tell the compositor to blend them.

And with them, we render the rest of the edit.

Focusing even more on the compositor, we have multiple source frames and a single frame out.

Along with the source frames, we have the instruction object with the mixing parameters.

In this case, a dissolve which we encode as a property in the instruction.

In this case, in ramp from one down to zero.

At this point, we're at the compositor, so let's introduce the new custom compositing API.

The built-in compositor is green, you can now replace that and do your own stuff in Mavericks and iOS 7.

In addition, you can implement your own instruction to gather up and carry your own set of mixing parameters.

It's bundled up with the source frames in the request object.

So you'll be implementing the new protocol AVVideoCompositing, which receives the new object, AVAsynchronousVideoComposition request.

And, in addition, you'll be implementing the AVVideoCompositionInstruction.

Now looking at the methods involved here, you'll be receiving your request when you implement startVideo CompositionRequest.

And then deliver the frame once you've rendered with finishWith ComposedVideoFrame.

Now in the event of an error, you can call finishWithError or acknowledge a cancelled request.

Okay, so what about the format of the source frames?

How are the pixels arranged?

Well you've probably looked in our header files and seen a wide variety of formats.

You'll probably only see a small subset here, but generally when we're decoding H.264 video, you'll receive YUV 8-bit 420.

But that could vary and your code may not be able to render into YUV.

So you can express the exact format that you require by implementing source pixel buffer attributes.

Now here you can provide an array of formats that you would like to receive.

In this case, our example requires 32-bit BGRA, and by doing so the system will, on the fly, convert the source frames into that format.

Similarly for the output pixel format, you implement requiredPixelBuffer AttributesForRenderContext.

And we're requiring 32-bit BGRA here.

Now in order to get hold of a new empty frame to render into, we go back to the request object, ask it for the render context, which contains information about the aspect ratio and the size that we're rendering to, and also the required pixel format.

So we ask it for a new pixel buffer.

Now this comes from a managed pool.

So the frame is allocated, we can then render in to their, in this case our dissolve and send the output on its way.

Now let's jump into Xcode and we'll have a demo.

Okay, so, in Xcode, here we are with the Hello World equivalent of a custom compositor.

So, here we're implementing our AVVideoCompositing protocol and those three methods that we called out in the slides.

So again, we're requiring BGRA and it's simply a case of putting in that value.

Now remember it's an array, so we can, if we do support YUV in some flavor, we can drop it in there.

Similarly for the output, in this case it's identical.

Now the action happens inside the rendering function startVideo CompositionRequest receiving our request object.

Let's open this guy.

Okay, so, the first thing we do here is - it's slightly hard coded just for simplicity - but we fish out the source frames from the passed in request object.

Now you'll get these as Core Video pixel buffers, and we'll try our hardest to make sure they're IOSurface backed, so they're going to be on the GPU.

The next thing we do is we allocate a fresh, new pixel buffer to render into so, as we saw in the slides, we ask the request object for the render context and allocate a new pixel buffer.

Then we do the rendering and deliver the output frame.

So before we look at what's inside that little render block, let's run up and see what this does.

Okay here we are with a deer again.

Okay so this app, a simple app, and it's really simple because we've managed to use the new AVPlayerView from AVKit, which gives you a nice, simple player view with a scrubber and various other goodies.

So let's play and we'll see what happens.

Okay so we've got a really simple wipe, with that white line coming down.

Let's play it again if you missed it.

Okay so there's nothing particularly exciting, but the main thing is this will illustrate how we can get hold of those pixels, in this case which is CPU, so the pixel buffers you can lock to the base address.

At which point, you can then ask for the usual things like the bytes-per-row and the base address, and at which point we can then start splatting the pixels.

Now, you'll see this rather ugly - and don't do this at home - memcpy and memset stuff.

Just purely for simplicity here, but I'll close that.

Now, up here, when we play that transition, obviously we're doing some sort of animation and on to calculate the vertical position of the white band.

We have to basically interpolate across our time range to work out exactly at what point we are, we use a thing called tweening here, we calculate this tween factor.

Now we're going to come back to this in the slides in a moment, but basically there's some basic arithmetic going on in here and it's all to do with the time range associated with the current video composition instruction.

We'll come back to that in a sec.

But this isn't a particularly exciting example of a compositor, so let's have a look at something using the GPU, using OpenGL.

So let's run this guy again.

And this time we're going to change the composition, and I've got some pre-prepared ones fortunately, so, let's switch to this guy.

Okay, so, let's play it this time.

Coming in quite fast, and there's our otters again.

I'll play that again.

It goes through quite quickly, but we can do something about that, we can change the [pause] video mixing parameters that we've placed in our composition, and I can reveal some controls wired directly to properties in our instruction, in our custom instruction here.

In this case, stiffening and damping.

Now these are going to influence the way we're calculating the physics.

In this case we're mapping, texture mapping the video onto a mash.

And then applying some physics to the vertices.

So let's see if we can slow this down a little bit, so, [pause] let's play that through again.

Okay, it's a little bit different.

You can see the, as it comes in, it's a strange, hypnotic effect.

But this we can play with it even more by playing with the damping.

Woops. Okay.

Of course that was all planned.

Okay so, we've set up the stiffness and damping to be a little bit wilder this time.

Yeah. [Pause] Okay, so anyway, that's an example of the OpenGL doing something useful.

Let's switch back to the slides.

Okay we mentioned tweening there, which is an interpolation.

So let's have a quick overview of that in case you're not familiar with how that works.

Now we'll focus in on the second segment, which does the dissolve in this example.

Okay our instruction contains our opacity ramp from 100 percent down to 0 percent.

Of course the interesting properties in there, as well as the opacity ramp is the time range and our first example is arbitrarily starting at 5 seconds with a duration of ten-thirtieths of a second.

Now in the first frame we've only elapsed a few seconds in.

At which point, the passed capacity should be 100 percent.

Now we calculate this tween factor by dividing the elapsed time by the duration.

In which case we calculate zero, which leads right to the beginning of the opacity ramp.

Now things get a bit more interesting when we move forward one frame.

We've elapsed one-thirtieth of a second in.

The tween factor is now calculated to be 0.1, which means we're a tenth of the way through the opacity ramp.

By multiplying the opacity by the tween factor, we can then arrive at the desired opacity for this kind of point in time with 90 percent.

We do the same calculation for each frame all the way through to the end.

Arriving with a tween factor of 1, which means we'll get the exact destination opacity of 0 percent.

Okay, so there's some performance considerations.

[Pause] But before we get to the performance, there's a caveat.

When your app on iOS is backgrounded, OpenGL won't be available.

So you'll have to have a CPU equivalent.

Of course you can use the CPU in foreground too.

So, the three booleans inside the new composition instruction.

It's important to look at these because there are performance wins to be had.

First up, passthroughTrackID.

So some segments are, sorry, some instructions are simpler than others.

Some just take one source and don't modify the frame at all, they simply pass through frames from track A.

In this example, this gets you to the first segment.

Simply passing through frames on track A.

[Pause] If we're not modifying the frame, then we might as well bypass the compositor completely.

Now this is a power win.

Next up is requiredSourceTrackIDs.

So let's say that first instruction is actually modifying those single source frames coming through.

This say it's doing some colorization effect.

In which case we do require frames from track A and we do want the compositor to be called.

So we set required source track IDs to be track A, that single track.

Now what we don't want to do is just leave it to be the default value of nil, because that's equivalent, that says, deliver me frames from all the tracks, and it's going to be wasted effort.

So containsTweening.

This is about our animation again.

Now consider this composition.

We have a moving picture-in-picture effect.

But all the source frames coming at us are the same.

Perhaps we've time extended the source so we got a run of identical frames coming through the compositor, but we're still doing animations so every output frame is going to be unique.

Because of that we set, containsTweening to be yes.

[Pause] But if we leave containsTweening as yes, and then we stop doing animation for some reason, so we have a static picture-in-picture effect.

The picture-in-picture is at the top right for every frame.

The same source frames every time.

We're just re-rendering identical output.

So by being careful to set contains to be no when you have no animation.

Then after the initial frame, the system is at liberty to optimize away those renders and reuse the identical output.

And then we look at the pixel buffer formats.

There's another performance win to be had here.

I think, as I said maybe earlier, if we can work in a wider variety of formats, in particular YUV 420 then do so, because then that will avoid, generally avoid a conversion hit on this source frames coming in.

With output it's less critical since the display, for example, can accept multiple formats, like BGRA and YUV.

There will be no conversion.

Okay, so we have some sample code, AVCustomEdit.

In there you'll find some OpenGL examples.

Okay, the second topic is debugging compositions.

Debugging compositions, well, it's quite possible to construct elaborate and complex compositions with entirely legal values while producing unexpected results.

Sometimes that's just a black frame.

I like to think of this a bit like building an OpenGL scene, but inadvertently pointing the camera view vector in the wrong direction.

In all these cases, again, there's a black frame.

How do you debug that?

What's the best way?

Well we found internally something really useful.

Let's just consider the types of problems that you may encounter[Pause].

First of all, and probably one of the worst ones, is leaving gaps between the segments in the video composition.

It will almost certainly result in black frames or just hanging onto the last frame.

And then just getting that alignment in the tracks wrong.

Maybe it's a rounding error when working with CMTime or something.

Or misalignment between the video composition time ranges and the AVComposition track segments.

Or a mistake in one of the ramps, too long for example.

Or an error in the transformation matrix and spinning off your layers into the ether beyond the boundaries of the output frame.

[Pause] As I've said we've found something really useful internally and that is to visualize the composition.

Rather than looking at the code, you look at the picture.

So let's switch back to the demo machine.

I think I see that, yeah, okay.

Alright so if I run, little lagging here, hopefully it doesn't crash.

Okay, so, we're looking at our simple wipe again.

Now let's see, this is AVCompositionDebugView.

This is going to be sample code.

I can display it like this.

And what were looking at here, get it big enough here, okay, so, what we're seeing here in green and dark blue are AVComposition tracks, and green is the video tracks, dark blue is the audio tracks.

And remember this is the cuts only part.

So our source media starts here for video track 2 and there for video track 1.

And then down here's the video mixing part, is the AVVideoComposition containing our instructions, in this case this instruction's made use of that passthroughTackID so we can label it as such.

And then into the custom instruction.

And then at the bottom is the audio mix.

Now you see the yellow volume ramp there.

So, this looks like it could be quite useful.

So let's put it to the test.

Let's give it a composition that's broken in some way.

So let's switch to our pre-prepared compositions.

Alright, I wasn't going to show you the debug viewer yet.

Let's play this guy.

It's going to dissolve into the deer, there's something visually wrong there, let's just play that slowly through.

If we dissolve away from the boat, the deer comes through and it jumps straight through that is not right.

Okay, so we've got something wrong here.

Rather look at the code, we'll see if we can spot it with this guy.

Okay, you can probably spot that straight away and you can see that for video instruction, the dissolve, we can see the opacity ramp descending all the way here.

So the video instruction starts here and ends there, so that's supposed to be our dissolve, which is going to take two source video tracks.

One, that starts here, it does last for the duration, but this guy is too short.

[Pause] So.

And I just happen to know where that problem is.

And your bugs won't be this easy.

I tried to obfuscate that, but [laughing].

[Pause] Okay.

Switch to the, hopefully now fixed, composition.

And there we see the first video track.

It looks good.

Now we get the dissolve.

Awesome. But there are worse ways for the composition to be broken than that.

In fact, let me just run that again.

Okay so let's have a look at the, the gap in the video instruction.

Black. Not good.

We'll have a look there.

And here, now, we can spot this guy straight away.

Down here we have a gap between the dissolve instructions and the last instruction, which isn't good.

So, that should be nice and easy to fix just like the last one.

Now this guy here.


[ Silence ]

Now the sea otters.

[ Silence ]

And there we go, job done, fixed.

Okay. Okay so that's AVCompositionDebugViewer, which is a sample for Mavericks and iOS 7.

Now we've made it simple so that it's particularly easy for you guys to drop it into your app and modify it.

Extend it to draw your own video instructions.

Now remember, it's just a noninteractive view, okay.

Now it's going to help you spot those overlaps and gaps as we saw, you know, in the tracks, the video instructions, and the audio.

But don't forget the validation API that we have in there, the AVVideoComposition ValidationHandling protocol.

So you want to implement that and then you'll get callbacks to see when we encounter something a bit untoward in your composition.

So what have you looked at today?

Well, we've spent a few minutes just reviewing the composition architecture, to see where the built-in compositor fits in [Pause].

And we've discovered the new custom compositing API with the two new protocols and the request object.

And we've see how important it is to choose the source pixel buffer formats and the output.

And we looked a little bit at tweening and interpolating your video mixing parameters across the time range of your effect.

And we talked a bit about performance, so remember those three booleans inside the instruction protocol, have a good look at them.

And lastly, we saw our sea otters again in debugging compositions.

So that's AVCompositionDebugViewer, which is a great visualization technique and much simpler than looking at reams of code.

So over the next few months, I want to see you guys drop in some great OpenGL visuals and I want to see them appearing in those video editing apps.

It's going to be, we're going to see awesome effects, transitions, and video generators[Pause].

So if you need some information, then do e-mail our evangelist, John Geleynse.

We'll have some related sessions that you can catch on video now at WWDC app.

Thank you very much.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US