Discovering AV Foundation

Session 405 WWDC 2010

The AV Foundation framework provides a rich Objective-C interface for recording and playing audio and video in your iPhone OS application. In iOS 4, the framework includes significant new foundation classes for working with media. Learn how to use these new classes for media playback, editing, capture, and export.

Kevin Calhoun: Hello, and welcome to Session 405, "Discovering AVFoundation."

Now and for the next sixty minutes you will join me on a voyage of discovery.

My name is Kevin Calhoun of Apple's Media Systems Engineering Group, and we'll be discovering the vastly expanded AVFoundation framework in iOS 4.

We'll talk about why you would want to use this framework and for what purposes; we'll talk in some depth about the concepts underlying timed media that inform the design of the APIs we'll be discussing, concepts you'll want to be familiar with to aid your adoption of the APIs; and, of course, we'll talk specifically about tasks that you can accomplish with this API set.

Now, where are we going in our voyage of discovery this afternoon?

We are going beneath the blue line.

Underneath Media Player, underneath UIKit, where the pressure is high and the media takes time, we are at the level of AVFoundation, the Objective-C framework that gives you a great degree of control over timed media.

We sit on top of frameworks that are familiar to you such as Core Animation and Core Audio and also a framework which is newly public in iOS 4, Core Media.

So that's where in the system we're going today: underneath the level of UI.

Now, there have been technologies on iPhone, dating back to the original release, enhanced through iPhone OS 3.0, and even after that, that are useful for timed media operations.

There are several very easy ways to use timed media in frameworks that sit atop the level of AVFoundation.

For example, for playback the MediaPlayer framework offers MPMoviePlayerController and MPMoviePlayerViewController, which are well integrated with UIKit and used successfully by nearly every app on the platform that currently plays timed media.

In addition, starting with iPhone OS 3.0, the browser has offered support for the HTML5 video and audio tags.

So if you have web-based content that you want to integrate timed media with, that's a great solution for you.

Now I mentioned that AVFoundation is vastly expanded in iOS 4.

It initially shipped with iPhone OS 3.0, and in that version it offered a class called AVAudioPlayer, which is useful for playing audio files.

So those are very easy ways to play timed media, and they may still be appropriate for your apps even now with the release of iOS 4.

Similarly, for capture there are solutions already available on the platform.

UIImagePickerController in UIKit supports video capture as well as still image capture, and starting with iPhone OS 3.0, AVFoundation supports AVAudioRecorder for recording audio files.

So that's great stuff to be aware of and may be appropriate for your app even after we survey AVFoundation together.

So the question is, why would you use AVFoundation in iOS 4 if these other great technologies are available?

The basic answer is that AVFoundation gives you a much larger measure of control over timed media, and a lot more features as well.

In particular, if you need to inspect the contents of a timed media resource, you can get deep information.

You can glean what media is available and what metadata is present in a timed media resource.

If you need to play timed media in ways that are more sophisticated than the other frameworks allow, in particular if you want to implement a totally custom user interface to control playback, you can do that with this API set.

In addition, if you wish to pull together media from multiple sources, as iMovie does (pulling some video from one resource and some audio from another, arranging them temporally, and performing editing operations), this is the API set for you.

If you want to take existing media resources, either simple ones such as timed media files or complex ones such as compositions that you've edited together, and re-encode them in order to create new timed media resources, you can do that through this API set.

And, finally, if you want full control over the input devices that are present, including the camera, this API set has the features for you.

So these are the five areas of functionality that we're going to be talking about in this session and the two subsequent sessions, so keep these five in mind.

There's a bargain we're going to strike.

We give you in this framework a vastly greater degree of control over timed media, but as you accept that power, you also accept the responsibility for handling timed media in ways that are appropriate for this type of content.

So it's important to be aware of the challenges of working with timed media as you adopt these APIs.

The most important point to make about timed media is an obvious one: it's intended to be processed, intended to be consumed, incrementally.

You see a sequence of video frames or hear a sequence of audio samples.

Timed media takes time, and that point is fundamental to the design of the APIs we're going to be talking about today.

It will be necessary for you to code some forbearance into your apps, because timed media takes time to process: not just to play, but to perform other operations as well, even operations as simple as inspection.

You might not be surprised to learn that, because timed media is intended to be presented over a period of time, some of the formats in which it's delivered are not optimized to provide summary information about the resource as a whole. As a result, getting information about these resources, even asking a simple question like "what's the duration of this resource?", may require a good deal of work to be done on your behalf to deliver the answer.

So, timed media takes time, but it doesn't monopolize the device while an operation is in progress.

You want your apps to remain responsive to your end user while these things are going on.

So it's possible for the user to turn his or her attention to some other task, or for the device to handle an event such as an incoming phone call, which changes the circumstances that held when you kicked off the operation you're currently performing, playback or re-encoding, for example.

So it's necessary for the APIs to give you the opportunity to respond to these changes in circumstance.

Finally, the last important point to make about timed media is that the variety of formats and delivery protocols is great, and if you wish to handle timed media resources uniformly, it may be necessary for your app to take some extra steps in order to provide a uniform processing path for the operations you have in mind.

We'll get back to details about this when we discuss playback a little bit later in the hour.

Okay, so that's what to keep in mind.

The five areas of functionality we offer in AVFoundation, and the challenges associated with timed media that inform the API design.

Before we go into detail, let me give you an overview of the classes that we are making available to you in AVFoundation in iOS 4.

I mentioned that AVFoundation first shipped in iPhone OS 3.0; that version of the framework included some very useful audio-related classes, and you'll still want to use them today.

AVAudioSession is present.

This is the class that you use in order to inform the underlying audio system on the platform of the type of audio processing that you're performing.

If you tell the audio subsystem what you're doing, then it can arbitrate resources appropriately for what your app is doing and what else is going on in the system.

So you're going to use this class in connection with AVFoundation even in iOS 4.

Other audio classes in AVFoundation, mentioned earlier, are AVAudioPlayer and AVAudioRecorder. But now comes the expansion, the annex, which is quite large in iOS 4: the five areas of functionality we mentioned earlier.

First, inspection.

In order to provide uniform inspection of timed media resources, we offer a model object known as AVAsset, audiovisual asset, that allows you to get information about the contents of a resource.

Assets can contain multiple streams of media each of which is represented by an AVAssetTrack.

They also can contain collections of metadata, which we make available to you as AVMetadataItems. These are the basic inspection classes. But remember the challenge I mentioned earlier: it can take time to provide you with information about a resource, so these classes implement a protocol known as AVAsynchronousKeyValueLoading, which extends key-value coding to allow you to request that the value of a property be loaded on demand.

We'll talk in great detail about how that works shortly.

In addition, we allow you to represent resources as still images, such as thumbnails and other types of visual previews; you create those by means of AVAssetImageGenerator.

The second area of functionality I mentioned is playback.

Obviously, to play timed media you require a controller class that has basic play and pause types of methods.

That class in this framework is known as AVPlayer, audiovisual player.

A strangely apt name if I do say so myself.

AVPlayer plays AVPlayerItems.

Note that an asset is merely a representation of the contents of a timed media resource; it does not itself carry presentation state.

The presentation state for an asset is carried by AVPlayerItem and similarly the presentation state of any asset track is carried by AVPlayerItemTrack.

Now, at this level we do not have UI affordances, no views, but it would be kind of pointless for us to let you play a video with no way to display it to the end user. So what we do have is a subclass of CALayer, the Core Animation layer class, known as AVPlayerLayer, which is capable of displaying the visual output of a player.

Core Animation, of course, is not only useful for visual display; it's also useful for timing, for animations that are timed. So we also offer in our little bag of tricks another subclass of CALayer known as AVSynchronizedLayer, which is capable of synchronizing a layer subtree with the playback of an item.

Very useful for synchronization.

The third area of functionality, editing.

I mentioned earlier that we have this wonderful asset model that we use uniformly.

It would be very nice to be able to extend that asset model in order to describe the composition of media from multiple sources, multiple URLs.

If I can do that, then like any other asset I would like that composition to have multiple tracks, perhaps multiple video tracks and multiple audio tracks. I would call those AVCompositionTracks, and they would extend the track model by allowing me to describe the sequence of media from multiple sources that a particular track can present. But it's not sufficient just to be able to describe these compositions.

I want to be able to create them.

So it would be nice to have mutable subclasses of AVComposition and AVCompositionTrack that have methods for inserting media and performing other editing operations, and that should give me enough to do temporal composition.
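To make "describing the sequence of media from multiple sources" concrete, here is a minimal sketch in plain C. It models a composition track as an ordered list of segments. Each segment maps a range of some source onto the composition timeline. In the real framework, insertion is done with methods such as insertTimeRange:ofTrack:atTime:error: on AVMutableCompositionTrack. The types and functions below (Segment, CompositionTrack, insert_range, track_duration) are hypothetical. Times are assumed to be integer ticks in one shared timescale. Unlike the real method, this sketch only accepts insertion points that fall on existing segment boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one segment of a composition track: a range of a
   source (identified by a made-up integer id) placed on the composition
   timeline. Times are integer ticks in one shared timescale. */
typedef struct {
    int source_id;        /* hypothetical identifier for the source media */
    int64_t source_start; /* where the segment begins within its source */
    int64_t duration;     /* segment length in ticks */
} Segment;

#define MAX_SEGMENTS 16

typedef struct {
    Segment segments[MAX_SEGMENTS]; /* in composition-timeline order */
    size_t count;
} CompositionTrack;

/* Total length of the track on the composition timeline. */
int64_t track_duration(const CompositionTrack *t) {
    int64_t total = 0;
    for (size_t i = 0; i < t->count; i++) total += t->segments[i].duration;
    return total;
}

/* Inserts a source range at composition time `at`. In this simplified model
   `at` must be a segment boundary (or the end); later segments shift right.
   Returns 0 on success, -1 if the track is full or `at` is not a boundary. */
int insert_range(CompositionTrack *t, int source_id, int64_t source_start,
                 int64_t duration, int64_t at) {
    assert(duration > 0);
    if (t->count == MAX_SEGMENTS) return -1;
    int64_t boundary = 0;
    size_t index = 0;
    while (index < t->count && boundary < at) {
        boundary += t->segments[index].duration;
        index++;
    }
    if (boundary != at) return -1;
    for (size_t i = t->count; i > index; i--) t->segments[i] = t->segments[i - 1];
    t->segments[index] = (Segment){ source_id, source_start, duration };
    t->count++;
    return 0;
}
```

Inserting shifts every later segment, so the composition timeline stays gap-free. That is one simple design choice for the sketch, not a claim about how AVMutableCompositionTrack stores its segments.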

It isn't quite enough to describe how to display a temporal composition so to that we'll add a couple of additional classes.

If I have multiple audio sources in my composition, I want to describe the way they are mixed together.

I might want to set their volumes relative to each other or I might want to control the ramping of audio from one source down while another ramps up.

The ability to describe that could be available in a class known as AVAudioMix.

Similarly, we can offer a means to describe how video should be composed together.

What's the front to back ordering of my video sources?

What's the opacity of each of the layers?

Perhaps I can ramp the opacity, fade one down and another one up.

That's AVVideoComposition.
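Both the audio crossfade and the video opacity fade described above boil down to evaluating a ramp over a time range. In AVFoundation these are expressed through methods such as setVolumeRampFromStartVolume:toEndVolume:timeRange: on AVMutableAudioMixInputParameters and setOpacityRampFromStartOpacity:toEndOpacity:timeRange: on AVMutableVideoCompositionLayerInstruction. The C below is a hypothetical helper, ramp_value, not part of any framework. It assumes linear interpolation and integer tick times in a shared timescale.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value of a linear ramp at time t. Holds `start`
   before the ramp begins, `end` after it finishes, and interpolates
   linearly in between. Times are integer ticks in one shared timescale. */
double ramp_value(double start, double end, int64_t ramp_start,
                  int64_t ramp_duration, int64_t t) {
    assert(ramp_duration > 0);
    if (t <= ramp_start) return start;
    if (t >= ramp_start + ramp_duration) return end;
    double fraction = (double)(t - ramp_start) / (double)ramp_duration;
    return start + (end - start) * fraction;
}
```

A crossfade ramps one source from 1 to 0 while the other ramps from 0 to 1 over the same range. With linear ramps the two levels always sum to 1 inside the range. The same helper works for volume or opacity.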

So, I have the ability to pull media together from multiple sources and edit it.

I would like to create new assets from existing ones, either a complex asset like a composition or a simple one.

Start with an asset and create an object known as an AVAssetExportSession, which manages the process of export. Export is going to take time, so this is a controller class that I can kick off, describing the type of export I want to perform by means of one of the presets available in the framework.

Once I have the export session configured, I can run it, create the new asset, have it written out to an output URL, and be told when the job is done.

Last area of functionality in our high-level tour of AVFoundation is pertaining to capture, input devices.

I want to be able to survey the input devices that are available in my current context.

For that purpose, we can offer the AVCaptureDevice class.

You can enumerate the capture devices that are available by type, the audio devices, the cameras and so forth, and you can find out what features they offer as well to determine what features you should make available in your app.

Once you've chosen a capture device, you would like to set up a capture session, specifying its inputs (where the media is coming from) and its outputs (where the media is going), in order to process media coming off of the input devices. It's helpful to have the option to process the media in your application (maybe you actually want to examine and process video frames), with or without also recording that media to a file.

Finally, in order to let your end user know where the camera is pointing in your application, it would be helpful to have yet another subclass of CALayer, AVCaptureVideoPreviewLayer we might choose to call it if we're long-winded, that allows you to display to your user a preview of what the camera is currently looking at.

So, there you go.

We've just designed the AVFoundation Framework.

Thank you for coming.

[ Applause ]

I will be splitting my proceeds from this job with you all equally.

Sign up in the lobby.

All right, one thing I need to mention before we go into greater detail about the use of these APIs: I want to point out the fundamental framework that AVFoundation depends on, Core Media.

Down at this very deep level the pressures are so great that times are represented as a rational value.

That's pressure.

The Core Media Framework that underlies AVFoundation defines a number of primitives that you'll find as you survey the AVFoundation APIs.

Essentially, any type whose name starts with CM is defined in the Core Media framework.

The one that I want to point out to you now is the representation of time, which, as I mentioned, is a rational value, known as CMTime.

Have a look at the header file CoreMedia/CMTime.h to survey the means for creating CMTimes, performing arithmetic operations on them, comparing them, and so forth.
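To show why a rational representation is useful, here is a simplified C stand-in for CMTime. A time is stored as value/timescale seconds. With that representation, one frame at 30 fps (1/30) and one frame at 25 fps (1/25) can be added and compared exactly, with no floating-point drift. The real CMTime also carries flags and an epoch. It is manipulated with functions such as CMTimeMake, CMTimeAdd, CMTimeCompare, and CMTimeGetSeconds. The names below (RationalTime, rt_add, rt_compare, rt_seconds) are hypothetical, and the sketch ignores overflow and the special invalid/infinite values.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for CMTime: value / timescale seconds.
   No flags, no epoch, no overflow handling. */
typedef struct {
    int64_t value;
    int32_t timescale; /* units per second; must be > 0 here */
} RationalTime;

static int64_t gcd64(int64_t a, int64_t b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { int64_t r = a % b; a = b; b = r; }
    return a;
}

/* Exact sum: rescale both operands to the least common multiple of their
   timescales, then add the values. */
RationalTime rt_add(RationalTime a, RationalTime b) {
    assert(a.timescale > 0 && b.timescale > 0);
    int64_t lcm = (int64_t)a.timescale / gcd64(a.timescale, b.timescale) * b.timescale;
    RationalTime sum = {
        a.value * (lcm / a.timescale) + b.value * (lcm / b.timescale),
        (int32_t)lcm
    };
    return sum;
}

/* Exact comparison by cross-multiplication: returns -1, 0, or 1. */
int rt_compare(RationalTime a, RationalTime b) {
    int64_t lhs = a.value * b.timescale;
    int64_t rhs = b.value * a.timescale;
    return (lhs > rhs) - (lhs < rhs);
}

/* Lossy conversion for display purposes only. */
double rt_seconds(RationalTime t) {
    assert(t.timescale > 0);
    return (double)t.value / t.timescale;
}
```

For example, 1/30 + 1/25 comes out as exactly 11/150, and 2/60 compares equal to 1/30.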

Similarly, CMTimeRange is another data structure in Core Media that you'll want to be familiar with.

The speakers who follow me this afternoon will be highlighting more of the details of Core Media that you'll need to become familiar with as you adopt the API set.

That's a good start.

So let's rise up from that very great depth so that we can all breathe more easily, me at least, and talk in detail for the remainder of our hour about two of the areas of functionality that we offer.

We'll talk about inspection in AVFoundation, how you find out about Time Media resources, and we'll talk about playback, how you play them and that will carry us through to the end of the hour and the remaining three areas of functionality will be covered over the remainder of the afternoon.

So, how do you inspect?

Well, obviously you want one of these AVAsset things.

That's the model object.

An AVAsset is the model we use uniformly for time-based resources; it provides information about the asset as a whole.

What's the asset's duration?

Also, although an asset doesn't carry presentation state itself (that's the job of AVPlayerItem), AVAsset does provide presentation hints, information about the way the asset likes to be displayed: for example, what's its natural size?

Note that an asset is not constrained in any way in the number and type of sequences of media that it can present.

It can present one or more streams of audio and one or more streams of video, and we designed it this way so that we can apply this uniform model to any of the variety of media formats that we support: the audio-only ones, the video-only ones, and so forth.

Here on the slide are some examples of formats that we can represent by AVAsset and in addition, a couple of other objects available in the OS that can work together with AVAsset.

Objects from the MediaPlayer framework, for example, and from the AssetsLibrary.framework.

I'll give you details about that a little later.

Now, if an AVAsset can contain multiple sequences of media data, how do we represent them? Each one is an instance of AVAssetTrack.

Each track represents a sequence of uniform type.

A track will be all video or all audio, for example, and each track not only will have its own uniform type, it will also have its own set of format descriptions.

The format descriptions tell you about the encoding of the media: is it H.264 video, for example, or something else?

It's possible for more than one format description to be represented in a single track, so the encoding does not need to be uniform across the whole track.

A track also has a timeline expressed in terms of the timeline of its parent asset.

A track doesn't need to start at time zero of its parent asset, nor does it need to play all the way through to the duration of its parent asset; it can occupy any segment of the parent asset's timeline.
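As a small illustration of tracks occupying arbitrary segments of the parent asset's timeline, the hypothetical C model below gives each track a half-open span [start, start + duration) in integer ticks. It then counts which tracks have media at a given asset time. TrackSpan and tracks_active_at are illustrative names only. AVAssetTrack exposes its own placement on the asset timeline through its own properties, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of where a track sits on its parent asset's timeline:
   the half-open span [start, start + duration), in integer ticks. */
typedef struct {
    const char *media_type; /* e.g. "video", "audio", "text" */
    int64_t start;
    int64_t duration;
} TrackSpan;

/* Counts the tracks that have media at asset time t. A track that starts
   late or ends early simply contributes nothing outside its own span. */
size_t tracks_active_at(const TrackSpan *tracks, size_t n, int64_t t) {
    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        assert(tracks[i].duration >= 0);
        if (t >= tracks[i].start && t < tracks[i].start + tracks[i].duration)
            active++;
    }
    return active;
}
```

With a full-length video track, a full-length audio track, and a short text track starting at tick 1200, only the first two are active at time zero.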

Other information about tracks is available via AVAssetTrack.

Now a typical asset will contain a single audio track and video track that are synchronized with each other, but as mentioned before, there's really no constraint.

Any number of tracks is possible.

Some of you are going to immediately rush out and author assets with say seventeen closed-captioned tracks just to prove a point, and I say go ahead; the model will support it.

What use you might put it to, other than as a conversation piece, remains to be discussed.

Now, let's go through some specific workflow examples, some code, some pseudocode that you might wish to write to inspect assets and here in the next several slides I intend to give you the basic flavor of what it's like to work with this framework.

Remember I mentioned that you need to code some forbearance into your apps.

Why? Because timed media takes time.

That's what I want you to take away from this session.

I'll give you a very concrete example of what that means in just a minute.

To inspect an asset, I start by initializing an AVURLAsset object, a concrete subclass of AVAsset that presents the timed media model for any asset that can be referenced by URL. But note: just because I have that instance of AVURLAsset in hand does not mean that any work has been done on my behalf.

Remember, timed media takes time, which has a corollary as applied to AVFoundation: initialization of an object in this framework does not guarantee suitability or readiness for any particular purpose.

Specifically once you have initialized an AVURLAsset what is it ready to do?

The answer is nothing yet.

We have not examined the resource at all.

We have not even attempted to find the host.

Initialization of an AVURLAsset from any URL will always succeed.

So how do you find out what you need to know? How do you get us to do some work on your behalf?

You use the AVAsynchronousKeyValueLoading protocol that I mentioned earlier in order to tell the framework, in order to tell the asset, which values for its keys, its properties that you wish to have loaded on your behalf.

This protocol has two methods.

First of all, to find out whether the value for any particular key, such as duration or tracks, is already available, use the method statusOfValueForKey:error:, which tells you whether the information you seek has already been loaded on your behalf.

At initialization time, since we've done no work, the status for virtually all the keys of AVAsset and AVAssetTrack will be unknown.

You have to request the loading of a particular key in order for its status to change to loading and subsequently to arrive at one of the three terminal statuses.

Ideally, it will arrive at the status loaded, which tells you that now you can call the getter and get the value you want: the array of tracks, the CMTime for the duration, et cetera. But it's also possible for loading to fail.

Remember we're doing nothing to vet the URL when you initialize the AVURLAsset.

It's only after you've requested some loading that we may arrive at the decision that, well, the URL you reference is not a Time Media resource at all or, oh, by the way the network's down.

That failure will occur as a result of a request to load.
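The status lifecycle just described, unknown, then loading, then one of three terminal statuses, can be written down as a tiny state machine. The real enum is AVKeyValueStatus, with AVKeyValueStatusUnknown, AVKeyValueStatusLoading, AVKeyValueStatusLoaded, AVKeyValueStatusFailed, and AVKeyValueStatusCancelled. The C names below are a hypothetical mirror of it. The transition rules encode only what the talk states and are not taken from the framework's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C mirror of the key statuses described in the talk. */
typedef enum {
    KEY_STATUS_UNKNOWN,   /* nothing requested yet: the state after init */
    KEY_STATUS_LOADING,   /* a load was requested and is in progress */
    KEY_STATUS_LOADED,    /* terminal: the getter may now be called */
    KEY_STATUS_FAILED,    /* terminal: e.g. not a media resource, network down */
    KEY_STATUS_CANCELLED  /* terminal: loading was cancelled */
} KeyStatus;

bool is_terminal(KeyStatus s) {
    return s == KEY_STATUS_LOADED || s == KEY_STATUS_FAILED ||
           s == KEY_STATUS_CANCELLED;
}

/* Transitions allowed in this sketch: unknown -> loading (someone asked),
   loading -> any terminal status. Nothing jumps straight from unknown to
   loaded; work only happens after a load is requested. */
bool can_transition(KeyStatus from, KeyStatus to) {
    if (from == KEY_STATUS_UNKNOWN) return to == KEY_STATUS_LOADING;
    if (from == KEY_STATUS_LOADING) return is_terminal(to);
    return false;
}

/* Checked transition: aborts (in debug builds) on an illegal move. */
KeyStatus advance(KeyStatus from, KeyStatus to) {
    assert(can_transition(from, to));
    return to;
}
```

Note that a failure such as an unreachable host can only surface as a transition out of loading. That matches the point that initialization never vets the URL.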

All right, so with no more suspense: how do you request the loading of the value for a key on one of these classes?

A value for a property, a declared property in these classes.

Use the other method in the protocol: loadValuesAsynchronouslyForKeys:completionHandler:. Load values, plural; asynchronously, adverb (and you have no idea how hard it was to work an adverb into an API name); for keys.

The idea here is that you decide what collection of values you require for the operation that you're performing.

Put them all together into an array of strings naming the keys you want, present them all at once to the asset via this method, and the asset will in turn do the work that's necessary to load them.

When all of the keys in the collection have reached a terminal status, the block that you pass to loadValuesAsynchronouslyForKeys:completionHandler: will be invoked, and at that point you can test their statuses and move on to do the appropriate thing.
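Here is a minimal sketch of the "call me once, when every key is terminal" contract in plain C, with a function pointer standing in for the Objective-C block. LoadRequest, key_finished, and count_calls are hypothetical names. The sketch assumes some loader calls key_finished as each key completes. It says nothing about which thread or queue the real completion handler runs on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ST_UNKNOWN, ST_LOADING, ST_LOADED, ST_FAILED, ST_CANCELLED } Status;

/* Stand-in for the completion block; it takes no parameters in the real
   API, so here it only receives an opaque context pointer. */
typedef void (*CompletionHandler)(void *context);

/* Hypothetical model of one batched load request. */
typedef struct {
    Status statuses[8];      /* one entry per requested key */
    size_t key_count;
    CompletionHandler handler;
    void *context;
    bool handler_called;     /* guarantees the handler fires exactly once */
} LoadRequest;

static bool terminal(Status s) {
    return s == ST_LOADED || s == ST_FAILED || s == ST_CANCELLED;
}

/* Called whenever one key finishes. The handler fires only after the last
   outstanding key reaches a terminal status, and never twice. */
void key_finished(LoadRequest *req, size_t key_index, Status result) {
    assert(key_index < req->key_count && terminal(result));
    req->statuses[key_index] = result;
    if (req->handler_called) return;
    for (size_t i = 0; i < req->key_count; i++)
        if (!terminal(req->statuses[i])) return;
    req->handler_called = true;
    req->handler(req->context);
}

/* Example handler for demonstration: counts its invocations. */
void count_calls(void *context) { (*(int *)context)++; }
```

Batching all the keys into one request means the app gets a single callback, in which it checks each key's status.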

What does it look like in a code example?

Here's how you put it all together.

The first thing you do with a URL is to initialize an instance of AVURLAsset, which at that point is not ready to tell you anything.

It needs to do work.

Then you would say here's the array of keys that I require to be loaded, their values to be loaded, in order to perform the operation I'm interested in.

In this particular case, I'm going to prepare an asset for playback and what I need to load in order to play something back is its array of tracks.

So I'm going to tell the asset to load its tracks key by invoking loadValuesAsynchronouslyForKeys:completionHandler: with an array that in this case contains just the one key, and I'll supply a block that I wish to be called when loading is complete.

You can tell this is a block because it starts with the funny hat.

This particular block that you pass to this method takes no parameters so the code that's executed when loading completes is all inside the braces there.

The first thing the block does is check the status of the key I asked to have loaded. Then it acts according to the terminal status that was reached: if it's loaded, I want to update my user interface with this tracks information; if it failed to load, I wish to report the error to the end user; or if I've cancelled the loading of values for keys on the asset, I want to do some bookkeeping.

So that's basically what it looks like. This is the way you prepare assets for operations in your app; this is the means by which you code forbearance for the time that timed media takes.
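The decision made inside that completion block can be modeled as a plain switch. In the Objective-C original, the block would call statusOfValueForKey:error: for the tracks key and switch on the returned AVKeyValueStatus. The C enums and action_for_status below are hypothetical stand-ins that capture only the three-way dispatch described in the talk.

```c
#include <assert.h>

/* Hypothetical mirror of the key statuses relevant to the handler. */
typedef enum {
    STATUS_UNKNOWN, STATUS_LOADING, STATUS_LOADED, STATUS_FAILED, STATUS_CANCELLED
} TracksStatus;

typedef enum {
    ACTION_NONE, ACTION_UPDATE_UI, ACTION_REPORT_ERROR, ACTION_BOOKKEEPING
} Action;

/* What the completion handler should do once loading of "tracks" has
   reached a terminal status. */
Action action_for_status(TracksStatus status) {
    switch (status) {
    case STATUS_LOADED:    return ACTION_UPDATE_UI;    /* the tracks getter is now safe */
    case STATUS_FAILED:    return ACTION_REPORT_ERROR; /* the error out-parameter explains why */
    case STATUS_CANCELLED: return ACTION_BOOKKEEPING;  /* loading was cancelled */
    default:
        /* Unknown or loading: should not be seen inside the handler. */
        assert(!"completion handler saw a non-terminal status");
        return ACTION_NONE;
    }
}
```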

So, let me review this really quickly.

How do you inspect and load assets in order to prepare them for use?

You have information you want to find out.

You know that it may take time.

I'll give you a concrete example.

Something as innocuous as an MP3 file can take an enormous amount of time to deliver a very simple piece of information: calculating the duration of an MP3 file may require parsing every single audio packet in the file.

It's not necessarily the case, in other words, that MP3 files contain summary information about their contents.

You do not want your user to have to wait while that work goes on.

You can ask for the work to be done synchronously, but there are significant downsides to doing that.

First of all, you risk having your app become unresponsive to the user, which of course, is a total no-no.

Users expect apps to respond to their control, but there's an even worse consequence that's possible if you request these pieces of information to be loaded synchronously.

There's a watchdog on iOS 4 that watches interaction with the media services available on the platform.

If any one of the clients of media services requests an operation to be performed synchronously, and that operation takes longer than a timeout value that the watchdog manages, the watchdog will come along and kill the application that took all that time. This may also have the side effect of causing the media services in use by all of the entities on the system to be reset.

Don't let this happen to your app.

What will this result in?

Well, fewer stars in the App Store, that's for sure.

How do you avoid this calamity?

Come to www.C210 meet me in Presidio, and I will tell you all about how to use the AV, oh, wait, I'm having one of those time shift things, right?

Sorry. Use the protocol just described, load the values for keys asynchronously, and you will be good.

Now, this duration thing, this troublesome duration thing.

I told you how expensive it is sometimes to calculate the duration of an MP3 file.

Even if you do it right and load that value asynchronously, you're sitting there saying: I don't really need you to do all of that work on my behalf; I just want to play the darned thing, and I don't need you to find out the exact duration in advance.

Well, yes, that's actually true.

It turns out that for duration, which is a special value defined by AVAsset, it's usually sufficient particularly in playback scenarios to use an estimated value.

You don't need to know exactly how long something takes unless you're trying to coordinate something else with its playback, which is not typically the case.

So, by default the behavior of AVURLAsset is to provide enough accuracy for a playback scenario.

Note that if the underlying format that stores the media offers summary information about the timing and duration of the resource, the information that we provide you will be completely accurate.

For example, QuickTime movie files and MPEG-4 files do contain that summary information, and we will give it to you. You can find out whether any particular instance of AVURLAsset provides precise duration and timing by examining its eponymous key, providesPreciseDurationAndTiming. But if you require precise duration and timing from every asset you're working with, because you're doing something that requires that degree of precision, you can request it at initialization time. Set the key AVURLAssetPreferPreciseDurationAndTimingKey in the options dictionary you pass when initializing the AVURLAsset, and we will give you an instance of AVURLAsset that will be accurate regardless of cost.
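To make the cost trade-off concrete, this hypothetical C model contrasts two approaches. A cheap estimate divides the payload size by a nominal bitrate. A precise answer walks every packet and sums its samples, the expensive work described above for MP3. This is one common way to estimate duration and is not necessarily how AVFoundation computes its estimate. AudioFileInfo, estimated_duration, precise_duration, and the prefer_precise flag are illustrative names; the flag only plays the role of AVURLAssetPreferPreciseDurationAndTimingKey.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of an audio file, for illustration only. */
typedef struct {
    const uint32_t *samples_per_packet; /* one entry per packet */
    size_t packet_count;
    uint32_t sample_rate;               /* samples per second */
    uint64_t audio_bytes;               /* size of the audio payload */
    uint32_t nominal_bitrate;           /* bits per second, e.g. from the first frame */
} AudioFileInfo;

/* Cheap estimate: payload bits divided by the nominal bitrate. Only exact
   when every packet really is encoded at that bitrate. */
double estimated_duration(const AudioFileInfo *f) {
    assert(f->nominal_bitrate > 0);
    return (double)f->audio_bytes * 8.0 / f->nominal_bitrate;
}

/* Precise duration: visit every packet and add up its samples. This is
   O(packets) work, the expensive walk described in the talk. */
double precise_duration(const AudioFileInfo *f) {
    assert(f->sample_rate > 0);
    uint64_t total = 0;
    for (size_t i = 0; i < f->packet_count; i++) total += f->samples_per_packet[i];
    return (double)total / f->sample_rate;
}

/* Hypothetical analogue of choosing precise timing at initialization. */
double duration_seconds(const AudioFileInfo *f, int prefer_precise) {
    return prefer_precise ? precise_duration(f) : estimated_duration(f);
}
```

For a variable-bitrate file whose first frame understates the real bitrate, the estimate and the precise value can differ noticeably. For playback that difference rarely matters.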

Okay. So that is the fundamental interaction that you will have with this framework, and you'll note as we discuss these classes this afternoon that there are similar stages of operation with each of the chief classes.

You initialize something, you prepare it for the purpose that you want to use it for, you observe its status in order to determine whether it's ready for that purpose then you move on.

Playback not surprisingly is very similar in behavior to inspection.

I mentioned earlier that the chief class for controlling playback is AVPlayer.

It has the methods on it for controlling rate and so forth that you would expect in such a class, but beyond control it's extremely rich in the facilities that it provides to allow you to observe the presentation state, the playback state, as it changes so that you can synchronize a UI to playback state, for example.

I mentioned that AVPlayer plays items; it has a property known as its current item, so you can find out what it's playing at any given time. And AVPlayerItem, as I mentioned earlier, confers presentation state upon an asset.

It describes how an asset should be presented so it's possible to play an asset with more than one player item.

For example, in one context you may wish to play a particular time segment of an asset with one instance of AVPlayerItem, and in another context play a different AVPlayerItem associated with the same asset, to play a different time range of interest.
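The split between model (asset) and presentation state (player item) can be sketched in C as two structs. The player item holds a pointer to a shared asset plus its own range and current time, so two items over the same asset never disturb each other. Asset, PlayerItem, make_item, and the example URL are hypothetical. Times are integer ticks, and the range fields merely stand in for however a real AVPlayerItem is limited to a time range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model object: describes the resource, holds no playback state. */
typedef struct {
    const char *url;
    int64_t duration; /* ticks */
} Asset;

/* Hypothetical presentation state: which range of the asset to present and
   where playback currently is. The asset is shared, never copied. */
typedef struct {
    const Asset *asset;
    int64_t range_start;
    int64_t range_end;
    int64_t current_time;
} PlayerItem;

PlayerItem make_item(const Asset *asset, int64_t start, int64_t end) {
    assert(0 <= start && start < end && end <= asset->duration);
    PlayerItem item = { asset, start, end, start };
    return item;
}
```

Because each item owns its own current time, moving one item leaves the other item's position unchanged.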

You can initialize an AVPlayerItem with an existing asset that you have, or directly from a URL.

If you initialize it with a URL, the AVPlayerItem will prepare the instance of AVAsset for you, which you can use for inspection.

AVPlayerItem is also the class that you use to control time as playback progresses.

This is what you'd use to seek, for example, or to step.
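Seeking and stepping are both "move the item's current time", so they can share one clamping rule. In the framework these correspond to methods such as seekToTime: and stepByCount: on AVPlayerItem. The C below is a hypothetical model (ItemClock, item_seek, item_step). It assumes a fixed frame duration in integer ticks, for example 20 ticks per frame at a timescale of 600, which is 30 fps. It also seeks exactly, which a real player may not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an item's playback clock, in integer ticks. */
typedef struct {
    int64_t range_start, range_end; /* playable range */
    int64_t frame_duration;         /* e.g. 20 ticks = 1/30 s at timescale 600 */
    int64_t current_time;
} ItemClock;

/* Seek: move to `target`, clamped to the item's playable range. */
void item_seek(ItemClock *c, int64_t target) {
    if (target < c->range_start) target = c->range_start;
    if (target > c->range_end) target = c->range_end;
    c->current_time = target;
}

/* Step by `count` frames (negative values step backward), reusing the
   same clamping as a seek. */
void item_step(ItemClock *c, int count) {
    assert(c->frame_duration > 0);
    item_seek(c, c->current_time + (int64_t)count * c->frame_duration);
}
```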

In addition, AVPlayerItems have one or more AVPlayerItemTracks that correspond to the tracks of the asset you're playing, and those have presentation state as well.

In particular, whether a track is enabled for playback or not.

So, having given you the overview of the player related classes, let's have a look at how you would code up preparation for playback in your app.

One of the challenges I mentioned right at the start of our talk is that it's difficult or it may require a little extra work in order for you to treat assets uniformly because of differences particularly in delivery protocol.

There are two main classes of assets that you need to be aware of for playback.

The first one is a file-based asset.

Essentially an asset that we have random access to in order to read information out of its container.

The second one is a stream-based asset.

We have a lesser degree of control over what we can read out of that asset.

It's essentially being beamed to us by a server.

There's a little bit of a difference in the way that you set up playback of an asset depending on whether it's file based or stream based.

So, let's talk about the workflows for each and then talk about what you would do if you don't know what type you have.

To start with, suppose you have a file-based asset: in other words, something like a video in the Camera Roll that was shot with the camera, an item from the iPod library that you can access via the MediaPlayer framework, or even a file that resides on a remote HTTP server.

Here's the workflow that you would use to playback any one of those file-based assets.

As I mentioned earlier, initialize an instance of AVURLAsset with the asset of interest and then it is your responsibility in preparing that asset for playback to load its tracks.

The player is going to want to know what media in there so go ahead and do the job using the AVAsychronousKeyValueLoading protocol to load the tracks of that asset.

If that succeeds, then you can go on to initialize an AVPlayerItem with that asset and remember our correlary to our basic tenet, that Time Media takes time, that initialization of an object does not guarantee readiness or suitability for any particular purpose, I've just initialized an AVPlayerItem but when initialization completes, it is not yet ready to play.

What you'll want to do once you've created an AVPlayerItem is observe its status key by a key value observing.

By observing the status key, you are informed when the player item becomes ready to play, and you initiate the process by which it becomes ready to play by associating the AVPlayerItem with an AVPlayer.

In this example, I'm initializing the AVPlayer with the AVPlayerItem I created.

That kicks off the process to prepare all of the chutes and ladders to get the thing ready to play, and via key-value observing you'll soon discover that the status of the player item has changed to ready to play.

When that occurs, you can survey the presentation state of the player item, such as which tracks are enabled, and, for example, choose a particular track to be disabled for playback. Other customization of presentation state is also possible, of course. Once you've prepared the item for playback with whatever customization you desire, go ahead and tell the AVPlayer to play. That's essentially the workflow you follow to play a file-based asset.
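To make the point that initialization is not readiness concrete, here is a toy Python model of the ordering just described: observe the item's status, associate the item with a player, wait for the status to change, and only then play. It is not AVFoundation code; `ToyPlayerItem`, `ToyPlayer`, the `Status` enum, and the `drain` "run loop" are hypothetical stand-ins for AVPlayerItem, AVPlayer, their status values, and the later delivery of key-value observing notifications.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, List


class Status(Enum):
    UNKNOWN = 0
    READY_TO_PLAY = 1
    FAILED = 2


# A toy "run loop": work the framework would finish later, off the caller's stack.
RunLoop = Deque[Callable[[], None]]


class ToyPlayerItem:
    """Hypothetical stand-in for AVPlayerItem; not the real class."""

    def __init__(self, playable: bool) -> None:
        self.playable = playable
        self.status = Status.UNKNOWN  # freshly initialized, not yet ready to play
        self._observers: List[Callable[["ToyPlayerItem"], None]] = []

    def observe_status(self, callback: Callable[["ToyPlayerItem"], None]) -> None:
        self._observers.append(callback)

    def set_status(self, status: Status) -> None:
        self.status = status
        for callback in self._observers:
            callback(self)  # plays the role of a key-value observing notification


class ToyPlayer:
    """Hypothetical stand-in for AVPlayer; not the real class."""

    def __init__(self, item: ToyPlayerItem, loop: RunLoop) -> None:
        self.item = item
        self.rate = 0.0
        # Associating the item with a player kicks off preparation; the status
        # change arrives later, not inside this initializer.
        loop.append(self._finish_preparing)

    def _finish_preparing(self) -> None:
        ready = Status.READY_TO_PLAY if self.item.playable else Status.FAILED
        self.item.set_status(ready)

    def play(self) -> None:
        self.rate = 1.0


def drain(loop: RunLoop) -> None:
    while loop:
        loop.popleft()()


# Usage: observe first, then associate the item with a player, then wait.
loop: RunLoop = deque()
item = ToyPlayerItem(playable=True)


def on_status(changed: ToyPlayerItem) -> None:
    if changed.status is Status.READY_TO_PLAY:
        player.play()  # only start playback once the item reports readiness


item.observe_status(on_status)
player = ToyPlayer(item, loop)
before = (item.status, player.rate)  # still UNKNOWN and not playing
drain(loop)
```

Right after the player is created the item is still `UNKNOWN`; playback starts only when the deferred status change is delivered.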

The second example as I mentioned earlier would be for stream-based assets.

As you know, iOS 4 supports the HTTP Live Streaming protocol, and HTTP live streams are essentially a playback-only technology.

It is not possible for you to create an AVURLAsset from an HTTP live stream URL from scratch.

If you fall into this category, and you have an HTTP live stream that you wish to play, go directly to the player-related classes.

Start with an AVPlayerItem and initialize it with the URL for your HTTP live stream. And do not spill water on the hardware; you just weren't aware of the safety tips you were going to receive at this session.

I'm glad you're enjoying them.

Associate the player item with an AVPlayer in order to start the process of making that player item ready to play.

Then a little magic happens.

Once that player item becomes ready to play, the AVPlayerItem will create on your behalf an AVAsset that you can use to inspect the contents of that HTTP live stream, so you can find out, for example, what tracks are present.

If necessary, you can customize the presentation state as well, but presuming that you just want to play it, you can move on, once it's ready to play, to tell the AVPlayer to play.

As a side note, you can take a shortcut for HTTP live streams: simply initialize an instance of AVPlayer with the URL that you wish to play, and the AVPlayer will create the AVPlayerItem on your behalf and kick off the whole chain of events for you.

So, if you know that you're playing HTTP live streams, you can take this shortcut, but here's how we put it all together.

If you don't know in advance the type of resource that you wish to play (it could be a file-based asset, it could be a stream-based one), here's what we recommend.

Essentially, you have to concatenate the two workflows, and we recommend that you start with the file-based workflow.

Try the URL as a file-based asset.

Create an AVURLAsset with that URL and attempt to load its tracks key as described previously.

If that succeeds, move on with the file-based playback scenario.

If it fails, it's possible that the URL refers to a valid HTTP live stream.

So, try it that way: initialize an AVPlayerItem with the URL and move on with the stream-based workflow described earlier.

The two code paths converge, and you can treat them uniformly once the player item becomes ready to play.
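The recommended ordering (file-based first, stream-based as the fallback) can be sketched as a small decision function. This is a toy Python sketch under stated assumptions: `load_tracks` and `prepare_stream_item` are hypothetical callables standing in for "create an AVURLAsset and load its tracks key" and "create an AVPlayerItem from the URL and wait until it is ready to play". Real code would do both asynchronously with completion handlers and key-value observing rather than return values.

```python
from typing import Callable, Optional

# Hypothetical stand-in for one asynchronous preparation step; True on success.
Step = Callable[[str], bool]


def choose_playback_path(url: str, load_tracks: Step,
                         prepare_stream_item: Step) -> Optional[str]:
    # 1. Try the file-based workflow first.
    if load_tracks(url):
        return "file-based"
    # 2. Loading the tracks failed; the URL may refer to an HTTP live stream.
    if prepare_stream_item(url):
        return "stream-based"
    # 3. Neither workflow could prepare the URL for playback.
    return None


# Usage with fake resources (hypothetical URLs).
files = {"https://example.com/clip.mov"}
streams = {"https://example.com/live/index.m3u8"}
```

The stream-based step is only attempted after the file-based one fails, which matches the recommendation to concatenate the two workflows in that order.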

All right, so now you're preparing items for playback and you've told the AVPlayer to play. How does your app stay in sync with time and control time?

First of all, AVPlayerItem, as I mentioned earlier, is the class that provides control over time.

Could I have a mic down for a second, please?

I have a little housekeeping to take care of.

Are we ready?

Now? No. Okay.


[ Coughing ]

[ Laughter ]

Kevin Calhoun: Okay, thank you.

That impromptu performance was rehearsed endlessly for weeks at a time.

[ Laughter ] Until it was perfected in San Francisco, California. I'll skip that.

So, control over time.

seekToTime: is the method that you use to move in time within the time range of an AVPlayerItem.

You should note that seekToTime: is not necessarily precise.

One of the things that you'll note in the design of this framework is that we place a very high value on responsiveness to the end user.

So, seeking can be an extremely expensive operation if you require it to be precise.

To arrive at an exact time, it can require decoding an arbitrarily long sequence of dependent video frames.

You don't necessarily need that work to be done; arriving at a time near the time you want is usually sufficient, and that's the behavior of seekToTime:.

It will give you good responsiveness and good enough results for typical playback scenarios.

However, if you need more precise control over time as you seek around in an AVPlayerItem, you can use the variant method seekToTime:toleranceBefore:toleranceAfter:. These tolerances let you define the time range within which you'll be satisfied to arrive when the seek operation is complete.

You can set these tolerances to zero to arrive at precisely the time that you desire, but you should note as I mentioned just earlier this operation can be expensive and, in fact, it can be detrimental to the responsiveness of your application to the end user so use with caution.
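Why precision costs work can be shown with a toy Python model under loud assumptions: frames are integer indices, only keyframes (a hypothetical sorted list that includes frame 0) decode on their own, and every other frame requires decoding forward from the preceding keyframe. The default infinite tolerances stand in for a plain seekToTime:; zero tolerances stand in for a precise seek. Real decoders and AVFoundation's seek policy are more sophisticated; this only illustrates the trade-off.

```python
import bisect
import math
from typing import List, Tuple


def toy_seek(keyframes: List[int], target: int,
             tol_before: float = math.inf,
             tol_after: float = math.inf) -> Tuple[int, int]:
    """Return (landing_frame, frames_decoded) for a seek in this toy model.

    keyframes must be sorted and include frame 0. The default infinite
    tolerances play the role of a plain seekToTime:.
    """
    lo, hi = target - tol_before, target + tol_after
    # Cheap path: a keyframe inside the acceptable window, nearest to the target.
    nearby = [k for k in keyframes if lo <= k <= hi]
    if nearby:
        return min(nearby, key=lambda k: abs(k - target)), 1
    # Precise path: decode every dependent frame from the preceding keyframe
    # up to and including the target.
    previous_key = keyframes[bisect.bisect_right(keyframes, target) - 1]
    return target, target - previous_key + 1


# Usage: a keyframe every 30 frames.
keyframes = [0, 30, 60, 90]
```

Here `toy_seek(keyframes, 55)` lands on frame 60 after one decode, while `toy_seek(keyframes, 55, 0, 0)` lands exactly on frame 55 but decodes 26 frames.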

Where does the media come from?

The question I know we all ask each other.

You can play file-based assets from the camera roll as I mentioned earlier.

How do you do that?

The framework that you want to become familiar with in order to play video that you can shoot with the camera on the device is the AssetsLibrary.framework.

That framework has facilities that allow you to survey the groups of assets that are available, essentially the camera rolls that have been recorded, and within each group it allows you to enumerate the assets that are present.

You can filter them by type, for example showing just the videos. Once you have a specific instance of ALAsset from the AssetsLibrary.framework that you wish to play, you obtain the URL from that ALAsset by asking it for its default representation. We missed this particular code point on the slide, so I'll tell you what it is.

Use the ALAsset method defaultRepresentation to get its default representation, and ask that representation for its URL.

Once you have that URL in hand, you can initialize an AVURLAsset and proceed with the file-based playback workflow as described earlier.

Similarly, if you wish to play media from the iPod library, the MediaPlayer framework has the facilities to let you query for any particular piece of media of interest.

Essentially you create an instance of MPMediaQuery and resolve it against the MediaLibrary and it will give you one or more MPMediaItems that satisfy your query.

Once you have the MPMediaItem in hand that you wish to play via AVPlayer, you obtain its URL by requesting its MPMediaItemPropertyAssetURL.

That's the URL that you would use to initialize an AVURLAsset and then you can proceed again with the file-based playback workflow.

So, now you have a source of media, you know how to set up playback and get it going, and even how to seek around in time. How do you keep your app in sync with playback as it proceeds?

Well, let's talk about the things that you can observe and respond to while playback occurs.

First of all you can track presentation state.

A prime example is the rate of playback.

As the rate and other properties change, you can use key-value observing to let your app respond to changes to properties of playback, in both AVPlayer and AVPlayerItem.

So key-value observing is your friend.

Register for the observation of these keys and you'll be able to discover not only changes that you, the application, initiate, perhaps in response to user input, but also, and equally important, changes that are initiated underneath you by the framework or by the system.

Well, what kinds of changes can occur in playback not initiated by you the application in response to your user?

For example, if you are playing visual media and the user switches out of your app, you'll observe a change in the rate, because that AVPlayerItem will automatically be paused; you would observe that via key-value observing of the rate key.

Also, if you're playing remote media, you can observe changes in the AVPlayerItem properties loadedTimeRanges and seekableTimeRanges, which tell you what portions of the timeline of the AVPlayerItem are currently available.

Similarly, if you are playing an HTTP live stream that has alternate encodings (higher data rates for greater network bandwidth, lower data rates for conditions that are not quite as promising), you can observe when the stream switches from one alternate encoding to another by observing the tracks key of AVPlayerItem. As a switch occurs, you can see: ah-ha, now this AVPlayerItem is playing this encoding.
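loadedTimeRanges and seekableTimeRanges arrive as arrays of time ranges, so an app typically asks whether a given time falls inside one of them and how much media is buffered ahead. A minimal Python sketch, assuming each range has been reduced to a hypothetical (start, duration) pair in seconds rather than a real CMTimeRange wrapped in an NSValue:

```python
from typing import List, Tuple

# Hypothetical simplification of a CMTimeRange: (start, duration) in seconds.
TimeRange = Tuple[float, float]


def is_time_available(t: float, ranges: List[TimeRange]) -> bool:
    """True if t falls inside one of the reported ranges (end exclusive)."""
    return any(start <= t < start + duration for start, duration in ranges)


def buffered_ahead(t: float, ranges: List[TimeRange]) -> float:
    """Seconds available after t within the range containing t; 0.0 if none."""
    for start, duration in ranges:
        if start <= t < start + duration:
            return start + duration - t
    return 0.0


# Usage: two loaded ranges, e.g. after the user skipped ahead in a remote file.
loaded = [(0.0, 10.0), (30.0, 5.0)]
```

With these ranges, time 5.0 is available with 5.0 seconds buffered ahead, time 20.0 is not available, and time 32.0 has 3.0 seconds ahead; a UI might use values like these to draw a buffering indicator.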

One last thing you'll want to observe on AVPlayer and AVPlayerItem, as I already recommended when setting things up, is their status keys.

Are they ready to play?

That's the first thing you'll need to know before you kick off playback, but it's possible for these objects to arrive at other interesting statuses as well. In particular, suppose you, or your neighbor, or the guy three rows behind you writes one of those applications that requests information synchronously from an AVAsset, the request takes longer than the timeout value, and the app is killed. Media services are then reset for everyone on the system. After we have a moment to hang our heads in shame, the first thing to realize is that every other client of media services on the system is responsible for setting up its media operations all over again.

How do you know that you have to do this?

If someone has caused this calamity to occur, you find out by observing the status key on AVPlayer and AVPlayerItem.

The status can change to a failure status and the error key of either of those classes can report that media services were reset via the error code of the NSError that's available from the error key.

When you see that that has occurred, you know oh, no, someone has done the wrong thing, media services have been reset, I need to create new instances of my playback objects and so forth, put them into the state that I was in before and then I can proceed.
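The recovery decision can be sketched in Python under stated assumptions: `ToyPlayback` is a hypothetical record bundling the playback objects with the state worth restoring, and `MEDIA_SERVICES_WERE_RESET` is a placeholder string, not the real error code (consult AVFoundation's error constants for the actual value). The sketch shows that only a media-services reset triggers a rebuild, and that the rebuilt objects are new instances whose status starts out unknown, so the app must observe them and wait for readiness again.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder for the error code reported via the error key after a reset.
MEDIA_SERVICES_WERE_RESET = "media-services-were-reset"


@dataclass
class ToyPlayback:
    """Hypothetical bundle of playback objects plus the state worth restoring."""
    url: str
    current_time: float
    rate: float
    status: str = "unknown"  # fresh objects are not yet ready to play
    error: Optional[str] = None


def recover(old: ToyPlayback) -> Optional[ToyPlayback]:
    """Called when an observed status changes to 'failed'.

    Returns a freshly built replacement if media services were reset, or None
    if the failure has some other cause and should be reported instead.
    """
    if old.status != "failed" or old.error != MEDIA_SERVICES_WERE_RESET:
        return None
    # The old objects are unusable: build new ones and carry over the state the
    # app was in before the reset (the caller still seeks and plays once ready).
    return ToyPlayback(url=old.url, current_time=old.current_time, rate=old.rate)
```

Keeping the restorable state separate from the objects themselves is what makes the "create new instances and put them back into the state I was in" step straightforward.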

Now, if you are observing changes that you initiate and you are also observing changes that the framework initiates underneath you, there needs to be some way of serializing your registration and unregistration of interest with respect to notifications in flight.

We don't wish for you to have to deal with any possible race conditions if you're trying to disengage from key-value observation while there's a notification about to arrive.

So, in order to avoid that, our recommendation for iOS 4, for your key-value observation of AVPlayer and the other player related classes, is to register for a key-value observation and unregister from it on the main thread and that will guarantee a sensible serialization of registration and unregistration with notifications in flight.

Now, note that we still very highly value responsiveness to the end user.

We're not actually going to perform any of the work associated with these state changes on the main thread, it's only the notifications of those changes that we deliver on that thread and that we recommend that you register and unregister on.

You can also track the readiness for visual display of a visual item.

Perhaps you have an AVPlayer playing audiovisual content, and you want to know when the AVPlayerLayer that you have set up to display the visual output of the player is ready for display in your layer tree.

You can observe the readyForDisplay key on AVPlayerLayer, and when that becomes YES, you know that you can insert the AVPlayerLayer into your layer tree and it has something ready to draw.

This is particularly useful if you want to code a Core Animation transition from the tree in a state in which it lacks the AVPlayerLayer to display the visual output to one that does, and you can do quite a few interesting effects with this.

You can also track the progression of time.

Now, because time progresses incrementally during playback, it's not something that you can observe via key-value observing.

It doesn't work for that model.

So, we have a different model available to you for tracking the progression of time.

AVPlayer offers you the option of creating one or more periodic time observers. You get one by using the AVPlayer method addPeriodicTimeObserverForInterval:queue:usingBlock:. You supply the interval at which you want to be invoked as time progresses, and the block that you supply to this method will be called at that interval. That should allow you, for example, to keep a UI that tracks the current time in sync with playback.

The block will also be called when time jumps, and when playback starts or stops.

So, you'll have full information, full disclosure about the progression of time if it's moving smoothly or if it jumps.
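The "regular ticks plus discontinuities" pattern can be modeled in a few lines of Python. Assumptions, all hypothetical: playback is described as continuous segments of (start, end) media time in integer milliseconds at rate 1.0, and the block runs at each segment's start (playback starting or time jumping), at every multiple of the interval inside the segment, and at the segment's end (playback stopping). The talk does not specify the exact times AVPlayer chooses; integer milliseconds simply keep the arithmetic exact.

```python
from typing import List, Tuple

# Hypothetical input: stretches of continuous forward playback at rate 1.0,
# as (start_ms, end_ms) media times. A new segment means time jumped or
# playback was restarted.
Segment = Tuple[int, int]


def periodic_callback_times(segments: List[Segment], interval_ms: int) -> List[int]:
    """Media times (ms) at which a periodic observer's block runs in this toy model."""
    fired: List[int] = []
    for start, end in segments:
        fired.append(start)  # playback starts, or time jumps, here
        first_tick = (start // interval_ms + 1) * interval_ms
        fired.extend(range(first_tick, end, interval_ms))  # regular ticks inside the segment
        fired.append(end)  # playback stops, or leaves this segment
    return fired
```

For a 1-second interval, playing from 0 to 2.5 s, then jumping to 10 s and playing to 11.2 s, gives `[0, 1000, 2000, 2500, 10000, 11000, 11200]`: a time display is refreshed on each tick and also immediately at every start, stop, and jump.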

Also if you're trying to perform some manual programmatic synchronization of something going on in your app with playback, we offer a boundary time observer.

You create one of these on AVPlayer with addBoundaryTimeObserverForTimes:queue:usingBlock:, giving it a list of times of interest (an array of CMTimes stored in NSValues). When any one of those times is traversed during playback, the block that you supply will be invoked, and you can respond appropriately according to the current time that has been reached at that point.
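What "traversed" means can be sketched in Python, assuming times are hypothetical (value, timescale) pairs. That mirrors how a CMTime expresses value / timescale seconds (the real struct also carries flags and an epoch), and `Fraction` keeps the comparisons exact in the same way a rational time representation avoids rounding. The rule that a boundary counts when previous < boundary <= current is an assumption of this toy model, not documented AVPlayer behavior.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical stand-in for a CMTime: (value, timescale) means value / timescale seconds.
CMTimeLike = Tuple[int, int]


def to_seconds(t: CMTimeLike) -> Fraction:
    value, timescale = t
    return Fraction(value, timescale)


def crossed_boundaries(boundaries: List[CMTimeLike], previous: CMTimeLike,
                       current: CMTimeLike) -> List[CMTimeLike]:
    """Boundary times traversed while playing forward from previous to current.

    A boundary counts if previous < boundary <= current. Moving backwards
    (for example after a seek) crosses nothing in this model.
    """
    lo, hi = to_seconds(previous), to_seconds(current)
    return [b for b in sorted(boundaries, key=to_seconds) if lo < to_seconds(b) <= hi]


# Usage: boundaries at 1 s, 3 s and 2.5 s, expressed with different timescales.
boundaries = [(600, 600), (1800, 600), (5, 2)]
```

Playing from 0 to 2.7 seconds crosses the 1 s and 2.5 s boundaries, in time order, even though they were given with different timescales.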

Now note that if you do a lot of expensive operations in one of these blocks in response to either a periodic or a boundary time observer, we don't guarantee delivery of all of the invocations of the block.

It's up to you, of course, to design the operations your app performs in connection with playback so that you don't swamp the CPU and can coexist with the Time Media operations going on at the same time.

Finally, the last thing that you can track as playback progresses is when an AVPlayerItem reaches its end time and stops.

We offer a good old NSNotification for that purpose, known as AVPlayerItemDidPlayToEndTimeNotification.

You can listen for this, it will fire when the AVPlayerItem has played all the way through to the end.

All right so that's basically the playback classes.

What you need to do in order to initiate playback, how you can get media from the various sources available to you in order to play it and how you can keep in sync with playback as it occurs.

A couple of best practices to cover before we move on.

How do you become a good citizen of the platform now that you are taking control over Time Media?

Well, use the AVAsynchronousKeyValueLoading protocol described earlier; that's number one.

Also, tell the audio subsystem the type of audio processing that you're performing.

To do that, use AVAudioSession in AVFoundation to set the category of audio processing that you are performing.

If you're playing, tell it that your category is playback.

This allows the audio subsystem on the device to arbitrate resources, audio-related resources properly for the various applications that are trying to make use of them.

There's more that you can do with AVAudioSession.

For example, you can use it to become aware of interruptions that may arise during playback or other Time Media operations.

More details about AVAudioSession are available in the Core Audio sessions that I'll point you to in just a few minutes.

Special multi-tasking note.

You are already aware from other sessions that you can create your application and register it to get processing time in the background and even to play audio in the background.

What I need to make clear to you is the specific behavior that occurs when you're playing visual media and the user switches you from the foreground to the background.

When that happens with no intervention necessary on your part, the playback of a visual item will automatically be paused and its display in the layer tree will automatically be disengaged and you need do nothing in order to accomplish this.

That's the standard user interface for what should happen when visual items are playing on the platform and the app is switched to the background.

Now, if you set up your application to get processing time in the background and play audio, you can, if it's appropriate for your content and the workflow of your app, continue playing the audio portion of the AVPlayerItem in the background.

One other note about multitasking, referring back to the earlier functionality of inspection (loading the values of an asset in order to inspect information about it): any loading that you have initiated will continue to progress even if your app is in the background. If the loading completes while your app is in the background, and you've set up your app to get processing time in the background, you will be notified of the completion while you're still in the background, so you can set yourself up on the basis of what an asset contains even while you await return to the foreground.

If your application is not set up to get processing time in the background, then you'll be notified of the completion of any loading that has occurred in the interim when you return to the foreground.

Okay. Let's review what we've covered during these 55 or so minutes so far.

I've given you an overview of the AVFoundation framework in iOS 4; told you the ways in which we've expanded it; and covered the main areas of functionality that it offers.

I have also given you the flavor of the API and what you need to do in your apps to, as I said earlier, code a little forbearance into them, because the processes that you apply to Time Media will take time.

You want your apps to remain responsive and good citizens of the platform.

I've told you in detail how to use the inspection-related classes; AVAsset, AVAssetTrack, and how to use the playback-related classes AVPlayer, et cetera.

Remember that in AVFoundation, because most operations occur asynchronously, together with other things going on in the platform, we're making full use of programming paradigms that permit this level of asynchronicity and cooperation.

In particular, we're extending key-value coding with something we're calling asynchronous key-value loading in AVFoundation so that you can request specifically the information you require and the framework can tailor the work that it does on your behalf specifically to those things that you need, notify you of when the information is available and allow you to proceed on with your tasks all without reducing responsiveness to the end user.

Remember, even simple questions can take time to answer.

We're using other very typical paradigms available in Objective-C 2.0: our classes have declared properties; you call their getters, after checking whether the information is available, to obtain values of interest to you; you use key-value observing to note changes that occur in state; and you use blocks as callbacks.

Lots of information about blocks available at the conference.

I'll give you a couple of other sessions that you can go to to learn more about block-based programming.

In the context of AVFoundation, a block is simply a piece of code that you want invoked when a certain operation is complete or a certain state is reached. The block is invoked at some later time when that occurs, or perhaps inline if that state has already been reached, and it is the code you supply that says what to do as a result of the operation that has completed.
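The real protocol methods are loadValuesAsynchronouslyForKeys:completionHandler: and statusOfValueForKey:error:. The Python below is only a toy model of that contract, with hypothetical names (`ToyLoadableAsset`, `load_values`, `finish`): work starts only for the keys that were requested, the completion block runs once every requested key has either loaded or failed, and, following the remark above that a block may be invoked inline when the state has already been reached, it runs immediately in that case. Because the block also runs after a failure, the caller still has to check each key's status.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Set, Tuple


class KeyStatus(Enum):
    UNKNOWN = 0
    LOADING = 1
    LOADED = 2
    FAILED = 3


DONE = (KeyStatus.LOADED, KeyStatus.FAILED)


class ToyLoadableAsset:
    """Hypothetical model of asynchronous key-value loading; not AVAsset itself."""

    def __init__(self) -> None:
        self.status: Dict[str, KeyStatus] = {}
        self.values: Dict[str, object] = {}
        self._waiting: List[Tuple[Set[str], Callable[[], None]]] = []

    def _all_done(self, keys: Set[str]) -> bool:
        return all(self.status.get(k, KeyStatus.UNKNOWN) in DONE for k in keys)

    def load_values(self, keys: Iterable[str], completion: Callable[[], None]) -> None:
        wanted = set(keys)
        if self._all_done(wanted):
            completion()  # state already reached: invoke the block inline
            return
        for key in wanted:
            if self.status.get(key, KeyStatus.UNKNOWN) is KeyStatus.UNKNOWN:
                self.status[key] = KeyStatus.LOADING  # start work only for what was asked
        self._waiting.append((wanted, completion))

    def finish(self, key: str, value: object = None, ok: bool = True) -> None:
        """Simulates the framework finishing the work for one key."""
        self.status[key] = KeyStatus.LOADED if ok else KeyStatus.FAILED
        if ok:
            self.values[key] = value
        waiting, self._waiting = self._waiting, []
        for wanted, completion in waiting:
            if self._all_done(wanted):
                completion()  # the caller must still check each key's status
            else:
                self._waiting.append((wanted, completion))
```

Swapping the waiting list out before invoking completions means a block that starts another load does not get lost or re-run while the list is being processed.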

So, where else can you go for more information?

Eryk Vershen is your Media Technologies Evangelist, whom you can contact.

You heard him talk about HTTP live streaming and know that he's well informed on these topics.

Documentation for the AVFoundation Framework is available together with the other iPhone documentation for iOS 4.

And, as usual, the Apple Developer Forums are excellent sources of information and contacts.

There are other sessions for you to attend, not just the sessions that follow this one about editing and use of the camera.

Also I mentioned a core audio session.

Let's see: tomorrow, in Mission, at 10:15 AM, Fundamentals of Digital Audio.

There are other audio-related sessions that you may wish to attend to learn more about audio processing on the platform.

There will be a repeat of this session hopefully delivered with expanded lung power on Thursday.

If you have a colleague who wished to attend this session but needed to go to one of the others, you can tell him exactly what will be said there and, in fact, you can let him in on all the jokes as well.

I'll come up with new ones by then.

A couple of sessions about block-based programming are available to you as well to learn more about structuring your application around the use of blocks.

Very powerful technology there.

I recommend that you become familiar with it.

So, to summarize, Time Media takes time.

Initialization of an object does not guarantee suitability or fitness for any particular purpose.

Observe the status of those things that you're interested in and as the status becomes ready, then you can move forward with the operation that you wish to undertake.

So by following these best practices, your apps can stay current and have the full media processing power of the platform available to them.

All of the things you've seen in the demos in the last day and some, including iMovie and other applications you've seen demoed, you can do in your applications as well with these APIs.

Make sure your apps stay responsive by using the asynchronous facilities.

Stay in time by tracking time as it progresses, and above all make sure your app stays alive: don't let the watchdog get you.

And with that, please stay tuned for the remainder of the AVFoundation sessions later this afternoon.

Thank you very much.

[ Applause ]
