Editing Movies in AV Foundation

Session 506 WWDC 2015

Learn how to use the new AVMutableMovie class to modify media files and simplify your editing workflows. See how to support segment-based editing and discover the power of sample reference movies.

Welcome to session 506.

My name is Tim Monroe, and I am a member of the AVFoundation engineering Team, and I want to talk to you today about editing movies in AVFoundation.

The takeaway message is very simple.

AVFoundation provides new classes for editing QuickTime movie files.

Oh, come on.

You can do better than that.

Let's hear some applause.

[Applause]

Thank you.

Now, I am excited about this capability because, as you'll see in a little bit, this provides the tools that I need to manage a personal project that I have been working on for about the last ten years, and in some sense, it brings to a culmination a project that I have been doing for about the last 35 years.

So I've got a lot of skin in this game, and I hope you'll enjoy the presentation today.

So what does it mean to say we can edit QuickTime movie files.

Well, we can open QuickTime movie files and perform range-based editing on movies and tracks.

So you select a section, a segment of a movie, and copy it into some other movie.

We can add and remove tracks.

We can set tracks associations between tracks so say to mark one track at a chapter track of another.

We can add or modify track and movie metadata.

And we can create movie files and what I am going to call URL sample reference movie files.

So how are we going to do this today?

First thing I am going to do is situate these new classes in respect to the existing classes we have in AVFoundation.

Then I am going to do sort of an API crawl where I go through and touch on the high points of the new methods we've added to these classes.

And finally, I am going to go back and talk about this personal project that I've mentioned and show you very clearly how these new capabilities allow me to do interesting things with this large amount of data that I've collected.

So let's talk about the AVFoundation editing classes.

If you've done any work with AVFoundation in the editing space, you know that we have two classes that you work with, the operative abstraction is that of a composition, and so you can have AVCompositions and at the mutable level, AVMutableCompositions.

Now, compositions are really cool stuff.

You can make some really exciting presentations.

There's just one pesky little drawback to AVMutableComposition, which is there's no standard file format into which you can serialize your composition.

So it makes it hard to interchange your snazzy composition with other applications.

Well, when we move to AVMovie and AVMutableMovie, that problem goes away because this is tied to the QuickTime movie file format.

We can now open these, edit them, and write them back.

At the track level, we have a similar setup.

As you know, a composition is composed of composition tracks, and at the mutable level, we have AVMutableComposition tracks.

In El Capitan, we are adding two new classes at the track level, AVMovieTrack and AVMutableMovieTrack.

So we have, really, five new classes that you have to worry about, movie and mutable movie, movie track and mutable movie track, and then this fifth class called AVMediaDataStorage.

That's a very simple class, and its whole job in life is to indicate where you want new sample data that you are writing into a file to end up.

So an AVMovie represents the data in an audiovisual file that conforms to the QuickTime movie file format.

Now, if you are familiar with QuickTime movies, you know that there are also a number of related file formats that were devised based on the QuickTime movie file format, and these are collectively called ISO base media file formats.

And AVMovie and mutable movie can handle these just as well as they can QuickTime movie files.

Now, what's the key feature about these files?

From my point of view, it's this, that the file format enforces a strict separation between sample data and the data that organizes that sample data into tracks and movies.

So here's a basic diagram of a QuickTime movie file.

It starts with what's called a file type box, and this is a very simple little bit of data that indicates which of the file formats in this family this file conforms to.

There's also a movie box which, as I just said, is the organizing information about all that sample data, which is found in its own box.

Now, the order of most of this is arbitrary.

The file type box has to come first, but the other boxes can come in pretty much any order.

So we could have the movie box at the end of the file, it could be situated between a couple of sample data boxes, there can even be parts of the file that are unused.

Maybe they used to contain data, but they don't anymore.

Now, what's in this movie box?

Well, there are three types of information that we are going to be concerned with here today.

First is what I like to call global settings.

This is information that without which you wouldn't really have a movie, things like how many tracks are in the movie, what's the duration of the movie, when was this file created, and what's the preferred rate that you would like this played back at.

There's also in a movie box some metadata, which is sort of optional data that isn't strictly speaking necessary for the playback of the media, but is useful to have.

And this would be stuff like a copyright statement, an indication of who the author, what the title is, and maybe even some custom metadata that you or some other app has written into this movie box.

The last thing that you will find in a movie box are track boxes, and these are pieces of information that define the tracks in the movie.

So they contain the track type, they point to the sample data that's needed for the track.

They also contain track metadata.

And again, you can write custom track metadata in there if you'd like.

And there's also information about track association, so how this track relates to other tracks in the movie.

So the key feature here is that the sample data location so the sample data itself is not contained in the track box.

The track box contains references to the sample data, sort of like this.

Now, it's possible for the sample data that a track refers to exist in another file altogether, so you can have external sample references.

It's even possible for the sample references to refer only to external sample data.

Now, when we have this situation, the little movie box and its file type box is called a sample reference movie file.

So there's no actual data in that box; it just points to data elsewhere.

Now, it turns out that this is an incredibly powerful tool to have available when you are working with media files.

Why is that?

Because I can edit that small little bit of information and never have to touch that huge amount of media data.

However, as you can imagine, there's a certain fragility involved here.

Why? Because I'm referring to data, and if the data that I am referring to happens to move or gets deleted, I am out of luck.

I can't play my presentation anymore.

Now, it's possible to reduce that fragility in a couple of ways.

Primarily by using relative URLs when I make these sample references, and we'll see how to do that in a little bit.

But at the end of the day, you are going to want to have what's called a self-contained movie.

If you want to give it to a friend to view or post it for delivery on the Web.

And when you need to do that, you can just run your mutable movie through an AVAssetExportSession to get a nice self-contained file.

Okay. So let's look at the movie editing API that we are introducing.

At the immutable level so AVMovie is an immutable subclass of AVAsset we can actually do some interesting things already.

We can inspect the movie.

We can get what's called the movie header.

And we can write this movie header into a new file.

By movie header, I just mean the file type box and the movie box together, none of the sample data.

So how do we initialize an AVMovie?

If you have initialized an AVRL, you know how to do it.

You give us a URL, and we give you back an AVMovie.

You can also, if you have a movie header, say, on the paste board, you can access that and create a movie from a block of data.

We are not actually going to talk about this anymore in this talk, but if you look at our sample code package called AVMovieEditor, you will see very nicely how you can put stuff and take stuff off of the paste board.

So here is how easy it is to create a sample reference movie file.

We'll open a URL as an AVMovie.

Sorry. Then we call the right an indication of the type of file we want to create, and then some options.

Now, in this case, the option that we've specified is truncate destination to movie header only.

Essentially, this means if there's any data already at that output URL, it's going to get blown away and what's going to end up in that file is simply the movie header.

Now, it's possible to pass in a different option and not blow away any data that might be in that file, so that would be the add movie header to destination option.

So let's move to the mutable subclass of AVMovie, AVMutableMovie.

This provides editing methods that aus to perform range-based movie editing, add and remove tracks, add or modify movie metadata.

So how do we initialize an AVMutableMovie?

Pretty much the same way we initialize an AVMovie.

We will pass it a URL, but in this case we need to be able toened had ale any errors that might occur.

It's also possible to create an AVMutableMovie from scratch.

So this line of code gives us back an empty AVMutableMovie that we are going to want to add tracks to and then some data to the tracks and so forth.

These are the segment editing methods available on AVMutableMovie.

If you have worked with AVMutable composition, this is familiar because they are, in fact, identical with one exception, which is the insert time range method takes an additional parameter that indicates whether you want to copy the sample data into the target or just copy sample references.

Now, if you copy sample data into the target mutable movie, you need to tell it where to write that sample data.

It does nothing by default.

It's not going to write it into the file that the movie box is located in, for instance.

So you have to explicitly specify where you want any new sample data that's written into this movie to be put.

And you do that by specifying the default media data storage property of the mutable movie.

And this is where this AVMediaDataStorage class comes into play.

Again, it just simply wraps a URL at this point.

There are methods for creating and removing tracks, and you see to create a track, we have to say what type of track we want.

Do we want a video track, do we want an audio track, and so forth.

And if we like, if we have an existing track that we want to model this new one on, we can pass it in as an as a parameter and say take the relevant properties from that existing track and use those.

So here's a simple little case study.

To update an existing movie file, we could open it up from a URL, do some edits using the methods we've just seen, and then we can use the right movie header to URL method to write the movie header back into that same file.

This is what's known as in-place editing.

So here we haven't moved any sample data.

All we've done is tweak the movie box and put it back into the same file.

So let's talk about the track editing API.

This allows us to modify tracks that are in a QuickTime movie file, and we have range-based track editing, just like we saw at the movie level.

We can also set associations between tracks, and we can add or modify track metadata.

So here are the segment editing methods, and they should look familiar because we've just seen them.

At the movie level.

The only difference being that instead of insert time range of asset, we have insert time range of track.

And again, when we do insert time range, we have to specify, do you want to copy the sample data, or do you just want to put in sample references.

And if you copy the sample data, the track has to know where that sample data is going to go, and we do that by setting the media data storage property of that track using exactly the same class we used at the movie level.

So here is another case study.

Suppose there's a little bit of a clip that I want to silence out.

Maybe there's some unfortunate language in that clip.

I could do it it quite simply by finding the first audio track in the movie; define a time range, which is the range I want to silence; remove that time range; and then insert an empty time range at that same spot.

The net effect of this code is to have an empty segment where there used to be some offensive language, say.

Working with track associations is very simple.

You can add a track association or remove a track association.

We will see this in use later.

And I said I would tell you how to use relative URLs when you copy sample data into your track, and you do it like this.

You open. For each track, we will set sample reference base URL property to some common parent of the movie that contains the movie file that contains the movie box and the files that contain all the media data.

All right.

So let's move on to this personal project that I talk about, and I want to call it A Study in Scarlet (and Gray) apologies to Conan Doyle.

Let's jump back in time to 1980.

What are we talking?

35 years ago.

At that time, I was a graduate student across the bay at Berkeley, and I needed a way to blow off some energy, so I went out and bought a pair of skates, and I discovered that Berkeley is a great place to be a skater because it's got these really nice hills.

And I really enjoyed that.

So I kept finding higher and higher hills to climb up and then go flying down.

So here in 1984, you can see me at the top of Claremont avenue in Oakland.

In the distance, you can see San Francisco, just this side of the bay, we have Emeryville, and we have this wonderful, almost looks like a ski run in front of me.

I am not quite sure what the hand up is doing.

Maybe it's the sort of we who are about to die salute you thing, and I didn't die.

It's okay.

[Laughter] Well, what I did in order to keep track of where I was, I went out and I bought a street map, and I got myself a yellow highlighter, and I started marking off on my map exactly where I had been, which hills were particularly good.

Well, if you are like me, that's a really bad thing to start doing because you're going to want to end up coloring in the entire map.

And so about 2000 sorry.

About 1985, I came up with a project that I called Tim's Radical Inline Skate Tour of Oakland, or TRISTO, and the goal was really simple, to skate every street in Oakland, the entire length of it.

This continued from 1985 until about 2005, so ten years ago, and I made it.

Okay?

[Applause]

Thank you.

Oakland has about 800 miles of roadways that you would have to skate in order to accomplish this.

Now, when I was done in 2005, I had no location data.

GPS hadn't been invented when I started, and it certainly wasn't available.

Actually, I did have location data because if you open up my map, you will see that it is right side up all yellow, not just but Alameda, Piedmont, Emeryville, the whole nine yards.

I also had no video.

If you remember back to that time, I mean, the camera you would be carrying in 1985 would be clunky and fragile, and you just wouldn't want to have it with you when you are out there skating.

So when I finished this, I really this is a wacko project, I admit that, and I made absolutely no effort to tell anybody about this.

Unfortunately, the San Francisco chronicle learned of this feat, I guess, and wrote a nice little newspaper article, one of these human interest stories.

The Oakland City Council learned about it, they passed a resolution praising the extraordinary physical accomplishment that's their words, not mine.

A couple of filmmakers in Oakland heard about it and spent literally years off and on making a nice little documentary about this project, which came out about a year ago and has been making the rounds of independent theaters in the Bay area.

Just completely wacko.

So about a couple of years after I finished this project, I started spending some time on the east coast, in particular in the area of Boston, Massachusetts, and even more particular, in Cambridge.

And I thought, all right, let's take this thing on the road.

I spent a few years skating Cambridge and Somerville, and then finally the idea for TRISTO Boston was born.

Same idea.

Skate every street in the city of Boston.

So this began about 2011.

I looked at a map, and I figured it would take me about five years to complete this.

Well, we had a couple mild winters in there.

It's about the same size as Oakland.

And I was able to complete this in about two and a half years, so in May 2013, I finished TRISTO Boston.

Now, this time I was prepared because as I was skating, I had one of those sports action cameras up on the top of my helmet, and I had a GPS app for my iPhone.

And so when I ended up, I had 490 MPEG-4 files straight off the camera covering about 200 different skates for a total of about a terabyte and a half of data.

I also had GPS data in the form of GPX files that's sort of an XML version of location data we will see one later for about 150 megabytes worth of data.

So just to get an idea of what these movies look like, let's just look at one.

So these are movies straight off the camera.

The only thing I have done is rename them so that it's got the date and title making it easy for me to find.

So we will open these up in QuickTime Player.

There's the second one, so that continues at the end of the first one, so we will select the first movie here.

We can see it's a bright, sunny day.

It's late November.

It's a weekday.

And let's just roll this.

We see it's the fall because there's leaves on the ground.

If we had the audio, we would hear that it's actually quite windy on this day.

And there's something strange here, though.

There's nobody on the street.

There's no pedestrians.

Let's jump ahead a little bit again.

No pedestrians.

There's some cars.

Here I am at Longwood Medical District.

This is a heavily trafficked area.

Nobody there.

Here's a commercial district.

Nobody. It's like a ghost town.

So what's going on?

Well, we'll solve that problem later.

So what I've got, basically, is a hay stack and a bunch of needles.

I've got 500 hours of video.

Even if 99% of it is junk, that's still five full hours of interesting video that I'd like to find.

Part of the problem, though, is I don't even know what the needles look like because I don't know what's important in these 500 hours of video.

Well, I do know some things that I'd really love to look at, and here is sort of a montage of some of these.

Snow, really?

I mean, come on.

Like, I didn't see that pothole?

[Laughter]

Now, that one's my favorite, because as you can see, I go down, I kind of roll over, I get back and keep on skating.

You know, if it wasn't as thank you.

[Applause]

Now, I actually had to pay somebody to weed to look through about 1/50 of all of my video to get these six clips.

So in theory, there's 300 of these wonderful falls somewhere in that hay stack.

So at about, you know, five seconds per fall, we are talking 25 minutes of this stuff, which would be really fun to watch.

[Laughter]

So here's what I want to do with all this data.

I've got a terabyte and a half of video data that I want to somehow do something with.

I want to find those needles.

So the first step is going to be to combine the various files that came off the camera for one skate into a sample reference movie file so that I can just open it and edit it and not have to worry about the camera files.

The second thing I want to do is add indexing metadata as movie metadata.

So I want to go in and mark in the file, okay, this is where I start to fall, and this is the end of the fall.

Or this is where dogs are chasing me, and this is where they finally figured out they are not going to get me.

Or this is where I am talking to a law enforcement officer, and that just keeps going.

So I'd like to index all this stuff so that it's easy to search and find all these needles, as I call them.

I want to add GPS data.

I have all that location data I haven't done anything with.

I want to add it to the file, and I am going to do that as what's called a timed metadata track.

And the key is I want to do all of this without modifying any of the original camera files, and by minimizing any data copying that I need to do.

Okay. How are we going to do that?

Well, AVMovie and AVMutableMovie give us the answer.

The solution to the first issue is very simple.

I am going to take those camera files, and I am going to create a sample reference movie file that points to the data in those two original files.

Very straightforward, standard use of sample reference movie files.

The solution for the second step is just to add a little bit of extra custom metadata to my movie box.

The third step is to add some actual media data into my new sample reference movie file.

And that media data, even though it's called timed metadata, it's stored as media data or sample data, and it just goes into a new track in the file.

So that's the way I want to approach this issue.

So the first step is actually quite simple.

We'll open the first movie as a mutable movie, and we'll open the second and any subsequent movies as just URL assets.

We'll define the range in the asset that we want to insert, and that's just the entire asset duration.

We will actually insert it into the mutable movie at the end of the mutable movie, and notice we say don't copy the sample data.

I just want to end up with sample references.

Finally, all we have to do is write a new movie header out to a new file, and we have then created a sample reference movie file that points to the data in the two original camera files.

So let's look at some demos of how I solve the second and third one.

Normally this slide would just read demos, but I am a sucker for a good palindrome, and I just couldn't waste it.

Alvin is a sort of acronym for AVFoundation-based linear indexer.

Its job is to do the indexing I talked about, but also get the location data into my file.

This is what the UI looks like.

I've got a movie view on the left, I've got a map view on the right, and down below you can see I've got some display areas where I am going to be displaying information that I glean about this skate.

So let's go ahead and open up a movie file.

And this is the sample reference movie file that I just created.

That opens up there.

I am going to load a GPX file.

I will go out and select the appropriate GPS file.

Now, when I open this file, some of the data's going to be filled in.

It's going to figure out how long this was.

Okay? So let's I've got it so I can expand my map view to make it clear where we are.

So I went out and made a big circuit around Brookline.

This situates you.

There is Logan Airport.

Here is Boston proper.

There's Cambridge.

Now let's just zoom back in.

Oh, I said I was going to show you a GPX file.

So this is what a GPX file looks like.

It's just XML formatted, longitudes, latitudes, elevations, time stamps.

That's it, nothing more in there.

So let's go here and run a Web service.

I am going to go out and find out from a government Web server what the weather was like on this day, and I discover it was quite windy, it was 17 mile an hour average, or for those who speak metric, 28.4, and it was 39 degrees Fahrenheit or, interesting factoid, 3.9 centigrade.

Thanksgiving day, that's why there's nobody on the street.

You see, it figured out this this was a holiday.

It's also the second night of Hanukkah.

Okay? So just these Web Services have allowed me to deduce why there's nobody on the street.

Now I am running another Web Service called reverse geocoding, where I am sending each longitude and latitude out to a server at Apple, and it's going to hand me back a state, city, street, and sometimes even a street number.

Basically, it's going to give me information about the streets that I was on.

So we will wait here while all these street pins fall down.

Then we can select one of them to find out what street that is, so that's Commonwealth Avenue.

Or here's Washington Street in Brookline.

Now, I added a search bar so I could type in, say, Commonwealth Avenue, and it will highlight it when it finds it.

So if I put this data into my file, now I have complete searchability on street names.

I can go in and find all those little clips where I am on a particular street.

So let's hide those, and the last thing we need to do is synchronize our video with our location data, and I tried various automatic ways to do this, but I discovered the only really good way to do it was to go to a frame in the video where I knew where I was on the map, and then I am going to come over onto the map and click it and say synchronize the video to this location right here.

Now, when I do this, if I come back and start playing the video, you will see the pin moves along with the video.

See, these are now synchronized.

And I can even go the other direction.

I can go to someplace on the map, and I can click it and say make the movie go there, and now I am on Commonwealth Avenue, and I know this is a particularly fun downhill slope, and we will just let this ride for a second.

And this is really why we do this, right, for these kind of downhill with no cars and what could possibly go wrong?

[Laughter]

Okay. So that's Alvin.

You can see that Alvin has solved the second and third problems for me.

It's added custom metadata to my sample reference movie file.

And it basically does it like this.

It gets the existing metadata by calling the looking at the metadata property.

We'll create a new metadata item, we'll fill in its properties accordingly.

For instance, the average wind speed value is the value I got from the government server telling me what the weather was on that day.

And I just create some identifier here that I can look for later, and once I've written this into the file with the right movie header to URL, my movie is all updated.

Now let's look at step three, which is to add the location data as a time metadata track.

You already know how to do this.

If you were here last year and heard the talk Harnessing Metadata in Audiovisual Media, they showed a couple things.

First how to get location data from a capture session, and how to right-time metadata into a movie.

Well, we've got our location data.

It's in our GPX file.

So we don't need that capture session stuff.

We can just focus on taking the source code in that second sample code package and putting it into Alvin.

Now, once we've done that, what we end up with on disk is a timed file that has a single metadata track in it, and all we have to do is copy that track into a new track in our movie, and we will do it by opening the GPS asset.

We'll then add a new track to our movie using whatever media type happens to be in that track, and we will copy the settings from that track.

Finally, we insert that range into this new track, and again, if we write this header into our file, we are done.

Now, the last thing we need to do this is optional is create a track association between the timed media track and whatever track it is metadata for, and in this case, we have chosen the video track.

If you leave this step out, the timed video track is assumed to apply to the file as a whole, and we are fine as far as that's concerned.

If you are working with AVMutableMovie, there are a couple of things to keep in mind.

As you know, an AVMovie or an AVMutableMovie is an AVAsset because it's a subclass.

So anything that you can do with an AVAsset you can do with a mutable movie.

For instance, you could play it using AVPlayerItem, you could grab an image from it using an AVAssetImageGenerator.

You could export it using an AVAssetExportSession.

Now, if you do any of these operations on a mutable movie, we highly recommend that you make a copy of that mutable movie because if you're changing the movie while you are exporting it, bad things will happen.

So just make a copy of it, do your export, and everything works fine.

The second best practice is that if you are opening an asset that you want to insert into an AVMutableMovie, you should set this flag, AV URL asset prefer precise duration and timing key, to true.

Now, we haven't done this in any of the code snippets that you've seen so far because when you open a QuickTime movie, this flag is already set to true.

But if you were to open an MP3 file, for instance, and wanted to insert that into a mutable movie, you would want to set this flag.

So let's summarize.

We've added new classes to support editing on QuickTime movie files.

This gives us, as I hope you've seen, a really nice, simplified workflow.

We are able to work on reference movies and not on the original movies themselves.

So if you are handling large amounts of data, this is for you.

If you want to see how this works in practice, download AVMovieEditor and take a look at that.

For more information, go to the standard places.

And if you have questions about this, we'll be in the lab on Thursday and Friday.

If you want to look at the how to work with compositions, jump back to 2010.

And finally, if you want to see in great detail how to do the timed metadata track generation, go to last year's session, Harnessing Metadata in Audiovisual media.

Thank you very much.

[Applause]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US