Advanced Media for the Web

Session 504 WWDC 2014

With the increasing popularity of media on the web, content providers find themselves confronted by an over-abundance of formats, codecs and technologies competing for their attention. See how to use the latest HTML5 technologies in WebKit and Safari to make it easier to deliver media to your users and explore the performance and user experience tradeoffs you'll need to keep in mind when deciding between building for simplicity and fine-grained control.

Hello, and welcome to Advanced Media for the Web.

My name is Jer Noble and I'm an engineer on the Safari Layout and Rendering Team.

And today we're going to talk about what's new with audio and video in Safari and how media integrates with today's modern web.

So this is the <video> element; it's the heart and soul of web-based media.

On one hand it's a container for some truly advanced technology, but <video> elements are also just another brick in the DOM.

They can participate in layout and rendering.

They can be styled with CSS.

With the <video> element, media can now be integrated into the same responsive, dynamic designs being written for the modern web.

Video now helps tell stories rather than being the story itself.

Modern examples, such as the New York Times' "Snow Fall" article, show how you can weave video into a rich storytelling experience.

And video can add emotion and energy to a page, even to something already as exciting as the new Mac Pro.

And it can still take center stage.

But things weren't always this easy.

Let's take a look back at how the <video> element became a part of this exciting modern web.

So in 1999 this was how you added video to your website.

The QuickTime plug-in. And at the time, the QuickTime plug-in was amazing.

It could decode high bit rate video and it exposed a rich API to JavaScript.

For a while, plug-ins were the only way to add video to your website.

Now fast forward seven years and in 2006 this was how you added video to your website: the QuickTime plug-in, again.

By now the QuickTime plug-in supported the H.264 codec and it delivered even higher quality video, but it was still a plug-in, one which users had to find, download and install.

Well, not you Mac users, of course, but it wasn't something which website authors could depend upon being available across their entire audience.

In 2007, though, this all changed when the <video> tag was introduced.

This was an amazing breakthrough.

No longer did web developers have to depend upon a proprietary plug-in to deliver video in their pages.

Video was now integrated directly into the web layer and, as part of the HTML5 specification, the <video> element provided a consistent experience and API across all browsers and platforms.

Browsers could build on, improve and add video features without waiting for plug-in developers, and this triggered a virtuous cycle of innovation.

In 2009 the <video> tag came to mobile browsers in iPhone OS 3, when support for the <video> element was added to Safari.

Previously, the primary way users of Safari would interact with a video was by clicking on a link, which would open up the YouTube app, but now video was a first class member of mobile browsers as well.

And today's iOS devices are almost as powerful as, if not more powerful than, desktop computers sold in 2009.

We've talked a lot about the <video> element at past WWDCs.

All of those videos are available online or in the WWDC app on your phone.

In 2010 we covered the basics of adding a <video> element to your web page.

In 2011 we talked about how to take that <video> element and add CSS and JavaScript to make your own custom media controllers.

And in 2012 we showed you how to play back multiple media elements synchronized with one another, how to do advanced low-latency effects with the Web Audio API, and how to take your JavaScript-based controls into full screen with the Fullscreen API.

So what will you learn at WWDC 2014?

You'll learn how we've narrowed the differences between Safari on iOS and Safari on OS X and what that means for your web pages.

You'll learn the best way to stream adaptive media on your websites, how to use less power when playing back video and how to coordinate your media's timeline with elements in your page with a timed metadata API.

But before we get started, let's talk a little bit about plug-ins.

Now how good is the <video> element on iOS?

It is so good that whenever I encounter a page in Safari on OS X that insists I need the Flash plug-in to view its content, the first thing I try is switching the user agent to iPad.

Most of the time it works.

What's that all about?

Well, I know that no one here would deliberately write a page that insisted on using a plug-in when HTML5 video was available.

I'm just going to assume you've updated your iPad sites recently, but please update your desktop site as well.

Plug-ins have a time and a place, but as web standards evolve and browsers improve, those times are getting fewer and the places further between, and that's a good thing.

So speaking of browsers improving, let's talk about how we've narrowed the differences between Safari on iOS and OS X.

We've removed some of the distinctions between the platforms by giving you more control over media loading with the preload attribute on iOS and by allowing the <video> element to fully participate in CSS layering.

But first the preload attribute.

The <video> element's preload attribute lets page authors control when and how their media's data is loaded.

A preload value of "none" instructs the browser to preload nothing, not even metadata.

A value of "metadata" asks the browser to only download enough media data to determine the media's width, height, duration et cetera.

And a value of "auto" means begin loading media data sufficient to begin playback.

Now in the early days of iOS, there was a lot of media on the internet which couldn't be played by iOS devices.

So in order to be able to tell users whether the media in the page was playable or not, Safari would download enough media data to check playability.

But in order to keep users' data costs down, it would ignore the preload attribute and behave as if it was set to preload="none".

In 2014, unplayable web media is much less of a problem, so new in iOS 8, Safari will honor two preload values: "metadata", which is the new default, and "none".

Why is this the right thing to do?

Most sites will see no change in behavior, either in the browser or on their server, but even loading just metadata can add up for sites with a lot of <video> elements.

So now the preload value of "none" will be honored.

Now it's still true that on iOS loading beyond metadata will still require user interaction, and we still believe that this restriction is in the user's best interest, but it does get rid of one frustrating distinction between Safari on OS X and iOS.
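As a rough summary of the iOS 8 rules just described, the hypothetical function below maps a preload attribute value to how much data Safari loads before the user starts playback. It models the behavior described in this session and is not WebKit's implementation. The name `iOS8PreloadBeforePlayback` is illustrative.

```typescript
// Hypothetical model (not WebKit source) of how much media data Safari on
// iOS 8 loads for a <video> element before the user starts playback.
type PreloadHint = "none" | "metadata" | "auto";

function iOS8PreloadBeforePlayback(attribute: string | null): PreloadHint {
  if (attribute === "none") {
    // New in iOS 8: "none" is honored, so no data is fetched at all.
    return "none";
  }
  // "metadata" is the new default for a missing or unrecognized value, and
  // "auto" is effectively capped at metadata, because loading beyond
  // metadata still requires user interaction on iOS.
  return "metadata";
}
```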

So why is this important?

<video> elements that don't explicitly specify a preload of "none" will now begin emitting new events, specifically the "loadedmetadata" event.

Now, during development, we came across a certain site, which had the following on their mobile page.

They had a <video> element with the default controls enabled, plus a "loadedmetadata" event handler that would hide those controls.

And it did so to force users to watch their pre-roll ads.

So in iOS 7, when the <video> element was shown, nothing would happen.

And when users hit "play", loading would progress, the "loadedmetadata" event would then fire, and the controls would hide.

In iOS 8, as soon as the video was added to the page, loading would begin, the "loadedmetadata" event would fire, and the controls would hide, leaving users no way of actually playing the video.

Now how could they fix this?

What they shouldn't do is revert to the old behavior by adding preload="none", that just leaves in place the implicit assumption that loadedmetadata means the video has begun playing.

Instead, they should listen for the "play" event and hide the controls when that occurs, letting users play the video.
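Here is a minimal sketch of that fix. It assumes the page hides the native controls by clearing the element's `controls` property. `ControlledMedia` is a hypothetical structural type that lets the function run without a browser, and a real `HTMLVideoElement` should provide both of its members.

```typescript
// Minimal structural type so the sketch can run without a browser.
interface ControlledMedia {
  controls: boolean;
  addEventListener(type: string, listener: () => void): void;
}

// Hide the native controls only once playback has actually started, rather
// than on "loadedmetadata", which iOS 8 can fire before the user taps play.
function hideControlsWhenPlaying(media: ControlledMedia): void {
  media.addEventListener("play", () => {
    media.controls = false;
  });
}
```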

So that's new in loading.

Let's talk about layering.

In previous versions of iOS, the <video> element was implemented as a UIView, which was placed on top of web content.

New in iOS 8, we have integrated the <video> element directly into the render tree as a native part of it, just as it is in Safari on OS X.

And, as a result, the <video> element will now fully respect CSS layering rules.

However, there is a caveat: websites which did not explicitly place their video topmost with the CSS z-index property may see some weird behavior.

They could have their video appear below other layers that it didn't appear below before.

Or other layers appearing transparently on top of the video layer could intercept touch events, leaving the users no way of actually playing the video in that case either.

So please be on the lookout for these breaking changes in your websites.
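One way to guard against both problems, sketched under assumptions, has two parts. First, give the video an explicit `z-index`. That only takes effect on a positioned element, so the sketch assumes the video already has something like `position: relative`. Second, set `pointer-events: none` on purely decorative overlays so they stop intercepting touches. `raiseVideoAboveOverlays` and `StyledElement` are hypothetical names.

```typescript
// Structural stand-in for the parts of an element's inline style used here.
interface StyledElement {
  style: { zIndex: string; pointerEvents: string };
}

// Hypothetical helper. Assumes `video` is already positioned (for example,
// position: relative), because z-index has no effect on static elements.
function raiseVideoAboveOverlays(
  video: StyledElement,
  decorativeOverlays: StyledElement[],
  zIndex = 10,
): void {
  video.style.zIndex = String(zIndex);
  for (const overlay of decorativeOverlays) {
    // Let taps fall through transparent layers to the video beneath. Do not
    // apply this to overlays that contain their own interactive controls.
    overlay.style.pointerEvents = "none";
  }
}
```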

That was platform differences.

Now let's talk about the best way to add adaptive streaming to your websites.

So today's web devices run the gamut from small battery-powered mobile devices, to desktop computers, to big screen TVs.

And that has led to a movement called responsive web design, whose goal is to provide an optimal viewing experience across a wide variety of devices by tailoring a page to respond to different characteristics of the device on which it's running.

Now most responsive web design concerns itself with the size of the viewport in which the page is shown.

But, for video, other properties of the device are as important.

So, yes, what screen size is available on your device, but also what video resolution can the device decode?

What codecs and profiles does the device support?

And how much bandwidth does the device's internet connection provide?

At its most basic, a <video> element points to a single file on a web server.

With only a single file, a page author is left with the unenviable task of picking a single version that will apply to all of their viewers.

So perhaps a desktop device with a fat internet connection should get a high bit rate stream, while a mobile device on wireless or on cellular might need a small, lower bit rate version.

But that same device, plugged in and on Wi-Fi, should get the high bit rate version, too.

None of this is easy with a single file sitting on your server.

Instead, this is a job for HTTP Live Streaming, or HLS.

So HTTP Live Streaming is a mechanism for delivering multiple alternate streams, all described by a single manifest file.

The master playlist, or manifest, describes the characteristics of each substream and the URL where that stream can be accessed.

And the browser picks the appropriate stream based on the characteristics of the current device.
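The sketch below builds a minimal master playlist to make the manifest's structure concrete. It starts with an `#EXTM3U` header. Each stream then gets one `#EXT-X-STREAM-INF` line advertising its `BANDWIDTH` and `RESOLUTION`, followed by that stream's URL. In practice, the Variant Playlist Creator tool mentioned later generates this file. `buildMasterPlaylist`, the `Variant` shape, and the example paths are all hypothetical.

```typescript
// One entry per encoding listed in the master playlist.
interface Variant {
  uri: string; // URL of this stream's own media playlist
  bandwidth: number; // peak bit rate, in bits per second
  width: number;
  height: number;
  codecs?: string; // optional, e.g. "avc1.4d401f,mp4a.40.2"
}

// Sketch: describe each stream's characteristics, followed by its URL.
function buildMasterPlaylist(variants: Variant[]): string {
  const lines = ["#EXTM3U"];
  for (const v of variants) {
    const attrs = [`BANDWIDTH=${v.bandwidth}`, `RESOLUTION=${v.width}x${v.height}`];
    if (v.codecs) {
      // CODECS is a quoted-string attribute, so it needs double quotes.
      attrs.push(`CODECS="${v.codecs}"`);
    }
    lines.push(`#EXT-X-STREAM-INF:${attrs.join(",")}`, v.uri);
  }
  return lines.join("\n") + "\n";
}
```

Each URI points at a media playlist for one encoding, and that playlist in turn lists the stream's transport stream segments.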

Now Safari on OS X and iOS uses the AVFoundation framework to play HLS streams, so you get the same high-quality streaming experience as native apps.

And AVFoundation will seamlessly switch streams when the conditions of the network or the device change.

To show you how easy it is to create an HLS playlist with multiple streams of different characteristics, my colleague, Brent Fulgham, will walk you through the process.


Thank you, Jer.

My name is Brent Fulgham, and I'm also an engineer on the WebKit Layout and Rendering Team.

And today I wanted to show you a few examples of HLS and how it might make your life better.

Now I'm sure that, getting up here, the first thing you thought was, "This guy is a skater."

Right? I mean, I love it.

I film it and I had some great video that I wanted to show off that we filmed in Utah.

It's high-fidelity video, beautiful cinematography, if I do say so myself.

Wonderful, wonderful content.

And I wanted to share this with my friends and family, who couldn't be there that day.

So what I wanted to do was put together a website that would show this content.

Let me return to this.

All right, so now I have a single source element playing a video.

This is the content that I showed you in QuickTime Player just a second ago.

And let's take a look at what that would look like for our viewers.

Great, it looks exactly the same as what I did in QuickTime Player, so I'm done, right?

All my friends can look at this and tell me how great I am?

Well, no, it turns out that a number of people were trying to view this with lower resolution devices: iPhones and iPads and things that don't have the full pixel content of a giant display projection system like this.

And it turns out they didn't even watch it because it just took too long to play.

Going back to the original video, we can kind of see why that is, it's about 150 megabytes for eight seconds of video, and that's not going to make many people want to stick around and wait for that.

So what do I do?

Well, the first thing I would want to do is take this original video and create multiple encodings that are targeted or optimized for different devices.

So if I have iPhones and iPads that I want to support, I want video streams that are more suitable or optimized for that.

And so you could do this using a variety of tools.

We have iMovie; we have Final Cut Pro.

If you're doing a lot of these you might want to look into Compressor, which is a great application for doing this.

We all have QuickTime Player installed on our computers, and so let me just show you what we would do here.

In QuickTime Player, we can export the video in a variety of formats.

So we have 1080p, 720p, and we have a set of presets that are already laid out for different types of devices.

And so in QuickTime Player I would have to go to each of these presets individually and output a 1080p version and output an iPhone 3GS version, and so forth.

Now I'm not going to make you wait around while I export these, since that's boring, but what I will show you is the set of video encodings that I wound up with.

And since I was running through this briefly before we did this, I have stuff here that you don't need to see yet.

All right, so I have these video encodings, I've created a bunch of different versions that support the different types of devices that I want to support.

I've got high resolution for broadband users.

I've got lower resolutions for people on cellular.

I've got a broadband Wi-Fi version.

And so now I'm all set.

Once I've uploaded these to the web server, then I'm pretty much ready to go, except I need to make some changes to my webpage.

So if I go back to my web page example, instead of just having this one <video> element with one <source> element giving me my high-quality ProRes video, I need to add a version that supports, say, my iPad Air.

So I have a Retina iPad.

I have a source at a slightly different location with a different file encoding. In this case I'm serving the broadband media, and I'm using a CSS media query that limits the clients that are going to receive this video to devices that have a 1024 by 768 resolution, like you would have on an iPad, and a device pixel ratio of 2.

And so I say, "Okay, this is my iPad Air." And, while I'm at it, I probably want to have something for the iPhone 5 and 5S and maybe something for a bunch of older devices.

And pretty soon we have a pretty large set of sources for us to serve from this web server.
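The browser walks a `<video>` element's `<source>` children in order and uses the first suitable one, so the list has to run from most specific to least specific. The sketch below models only the media-query part of that selection. It ignores the `type` attribute and load failures, and `pickSource` and `SourceCandidate` are hypothetical. In a browser, the `matches` callback could be `(q) => window.matchMedia(q).matches`.

```typescript
// One candidate file, like a <source> element with an optional media query.
interface SourceCandidate {
  src: string;
  media?: string;
}

// Sketch of first-match-wins selection over candidates ordered from most to
// least specific. `matches` evaluates a CSS media query string.
function pickSource(
  candidates: SourceCandidate[],
  matches: (query: string) => boolean,
): string | undefined {
  const chosen = candidates.find((c) => c.media === undefined || matches(c.media));
  return chosen?.src;
}
```

This approach accounts for screen size and pixel density, but it still knows nothing about the viewer's bandwidth.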

Okay, so if we look at this now and refresh the page, what does it look like?

Well, it looks exactly the same.

I'm getting the same stream that I had before because I'm still connecting to this with a high-quality, well, a loopback network and I'm showing it on a giant screen with lots of pixels.

I'm still getting what I expected.

So I should be done now, right?

I mean I'm able to deliver the right content to all these different people on different devices.

Time to go home and put my feet up and get the congratulatory e-mails from everyone, I assume, right?

Well, it turns out my brother-in-law was camping the weekend I posted this and had really spotty internet connectivity.

He was, I think, on the Edge network or something.

So I asked him what he thought of it, and he said, "Well, I didn't even bother watching it because it took too long to download and it never made any progress."

And I realized, well, we've dealt with the resolution here, but we haven't talked, at all, about network bandwidth, and that plays a role, as well.

Now as a web developer what would we do in this case?

I could write some kind of network sniffing algorithm to try to figure out how much bandwidth is being used and the download rates and this, but it seems like that'd be really hard to do properly and it would be really easy to get wrong and would have to be maintained.

But what about this HLS technology that Jer just finished telling us about?

In theory that should take care of everything.

Well, it seems like a good solution.

I already have all my encoded video here, so I've already done the hard work of creating the different encodings for the different device types. Now all I need to do is generate the HLS master manifest and the other information that HLS will use to display this content.

Now to do this we need to use the dynamic duo of Media File Segmenter and Variant Playlist Creator.

And these are fantastic tools that you can download from our website but, as you might imagine from these names, they have a dizzying array of flags and entry points that you have to provide, and so it's very difficult to remember.

We ended up just creating a shell script to do this for us.

And so let me just put this up here and, okay, and then let me give you a minute to write that down?

And then-oh, well, that's probably not a great idea.

How about if you come by our lab later this week, and we'll be happy to give you a copy of this?

All right, so what does this look like when we run it?

Well, what I do is-I'll run this, make an HLS script.

I'll provide us with-I'll feed it the input of the various files that we want to use, and we process each of the files.

And what that ends up looking like is I have this magic index.m3u8, which is the master manifest file, and I have a series of transport streams that have been generated for each of my encodings.

So in this case I have a broadband high-bandwidth rate version of this, and I've done the same-and the script has done the same thing for all the different options.

So what I need to do is upload all of this to my website, meaning the different transport streams and my m3u8 file, and at that point I'm basically done, except for one change that I need to make to my website.

What I need to do is get rid of all of this stuff, all of it, and replace it with one line, the line that I gave you a sneak peek of at the beginning.

This is index.m3u8, our master manifest file for HLS, and let's just make this say "best".
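In Safari, that single line is all that's required: a `<video>` whose source is `index.m3u8`. A page that must also serve browsers without native HLS can ask `canPlayType()` about the HLS MIME type `application/vnd.apple.mpegurl` first, which is a common pattern. Here is a sketch, in which `attachStream`, `PlayableMedia`, and both URLs are hypothetical:

```typescript
// Structural subset of HTMLMediaElement used below.
interface PlayableMedia {
  src: string;
  // Returns "", "maybe" or "probably".
  canPlayType(type: string): string;
}

// MIME type commonly used for HLS playlists.
const HLS_MIME_TYPE = "application/vnd.apple.mpegurl";

// Sketch: use the HLS master manifest when the browser says it might play
// HLS; otherwise fall back to a single progressive file.
function attachStream(media: PlayableMedia, manifestUrl: string, fallbackUrl: string): boolean {
  const supportsHls = media.canPlayType(HLS_MIME_TYPE) !== "";
  media.src = supportsHls ? manifestUrl : fallbackUrl;
  return supportsHls;
}
```

The fallback file is not adaptive, so viewers on that path lose the bandwidth and device adaptation described here.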

All right, and let's see what that looks like.

I bet you can guess.

It looks exactly the same, but now we're streaming this content in an adaptive fashion, where it will change and adapt to the type of devices that are being used and it'll change and adapt to the network conditions.

So if I were to start playing this eight-second video and leave the room, in theory, it would-the speed would drop, it would degrade to a lower bandwidth version.

And if I were to return to an area where I had high bandwidth it could then pick it back up and return to this beautiful, high-resolution imagery.

So I think that-so I hope that this brief demo and this example of how simple it is on your website will show you why we're so excited about this technology and why we hope that you'll try it for your next projects.

Thank you.

Thanks, Brent.

That was great.

So for more information about how to use HTTP Live Streaming and, specifically, how to encode your videos for the wide variety of iOS devices, take a look at Technical Note TN2224, which specifies all the settings you'd need in Compressor to generate multiple encodings of your video for a variety of Apple devices.

And, also, you can download the Variant Playlist Creator and Media File Segmenter Tools as part of the HTTP Live Streaming toolset from

And we have a live streaming developer page, where you can learn all about HLS in webpages and in native apps.

But new in Safari on OS X is support for a media streaming technology called Media Source Extensions, or MSE.

This is an extension to the HTML5 specification in which a <video> element's source is replaced by a MediaSource object, which requires the page to completely control the loading of media data.

Now MSE is primarily intended for only the largest of video providers, who have large and complicated CDNs and who need to micromanage every aspect of their network stack.

We built support for MSE into Safari, but for most websites we don't actually recommend that you use it.

And let's talk a little bit about why that is.

With great power comes great responsibility, except I remember someone else saying that.

Oh, that's right.

The MSE API will accept raw data, demux it, parse it into samples, decode those samples and queue the samples for display, but that's it.

That's all you get.

For everything else, your website has to do it manually.

The browser will not fetch data for you.

It must be fetched explicitly by your page through XHR.

The browser will not preload media data for you.

You have to do that yourself to make sure playback buffers don't run dry.

And once you've done these two steps, you will have reproduced basic video playback, but then again the <video> element could do that already.
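Even the basic feeding step takes some care. An MSE `SourceBuffer` throws if `appendBuffer()` is called while a previous append is still in progress (`updating` is true), so pages typically queue the segments they fetch and append the next one on `updateend`. The sketch below assumes segments have already been downloaded as `ArrayBuffer`s. `AppendTarget` is a hypothetical structural subset of `SourceBuffer`, and `SegmentAppender` is an illustrative name.

```typescript
// Structural subset of the MSE SourceBuffer interface used below.
interface AppendTarget {
  readonly updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Sketch: serialize appends, because appendBuffer() throws while a previous
// append is still being processed (updating === true).
class SegmentAppender {
  private readonly pending: ArrayBuffer[] = [];
  private readonly target: AppendTarget;

  constructor(target: AppendTarget) {
    this.target = target;
    // When the browser finishes one append, start the next queued one.
    target.addEventListener("updateend", () => this.pump());
  }

  // Queue a segment the page fetched itself (for example, via XHR).
  enqueue(segment: ArrayBuffer): void {
    this.pending.push(segment);
    this.pump();
  }

  get queued(): number {
    return this.pending.length;
  }

  private pump(): void {
    if (this.target.updating) return;
    const next = this.pending.shift();
    if (next !== undefined) this.target.appendBuffer(next);
  }
}
```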

For all of the benefits of streaming media, your page must implement it manually.

So your page must monitor network conditions to make sure that your user's device can keep up with the bit rate that you are serving.

You also have to monitor whether your users are dropping frames if their hardware can't keep up with the media that you are serving.

And when conditions change, you have to switch streams manually, pre-fetching data from a more appropriate stream and starting over.
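Here is a deliberately simple adaptation rule of the kind an MSE player has to supply for itself. It shows what "monitor and switch" involves. The 10% dropped-frame threshold and the 80% bandwidth headroom are arbitrary assumptions, and `chooseRendition` and `Rendition` are hypothetical. The dropped-frame ratio is assumed to come from something like `getVideoPlaybackQuality()` where the browser provides it.

```typescript
// One encoding the page can switch between.
interface Rendition {
  id: string;
  bitrate: number; // bits per second
}

// Hypothetical adaptation rule; the thresholds are illustrative only.
function chooseRendition(
  renditions: Rendition[],
  measuredThroughput: number, // bits per second, from the page's own downloads
  droppedFrameRatio: number, // dropped frames / total frames
  current: Rendition,
): Rendition {
  const sorted = [...renditions].sort((a, b) => a.bitrate - b.bitrate);

  // The device can't keep up: step down one level, whatever the bandwidth.
  if (droppedFrameRatio > 0.1) {
    const lower = sorted.filter((r) => r.bitrate < current.bitrate);
    return lower[lower.length - 1] ?? current;
  }

  // Otherwise take the highest bit rate that fits within 80% of the
  // measured throughput, falling back to the lowest rendition.
  const budget = measuredThroughput * 0.8;
  const fitting = sorted.filter((r) => r.bitrate <= budget);
  return fitting[fitting.length - 1] ?? sorted[0] ?? current;
}
```

A production player would also need to smooth its measurements, avoid oscillating between levels, and account for buffer depth. All of that is part of the re-implementation burden described here.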

And you have to do all of this without detailed information about the current state of the device, whereas HLS can use its detailed view of the device's current state to make better adaptation decisions.

So, for example, HLS knows what other processes might be running on the device and knows whether the device is on a metered cellular connection or on Wi-Fi.

HLS is aware of the current battery conditions of the device, and it knows about the current memory pressure the system is under.

So writing an MSE player involves re-implementing an entire streaming media stack in JavaScript, whereas HLS has all of this data available and yet writing a player for HLS requires a single line of HTML.

And what's more MSE is only available on OS X.

So, to reach iOS users, you're likely to have to set up an HLS stream anyway.

For almost every conceivable situation, HLS is going to be a better choice for streaming media.

Okay, what about cross-browser support?

HLS is supported across all versions of Safari on iOS and OS X.

MSE is only supported on Safari on OS X.

The Android browser and Android Chrome both support HLS, but not MSE.

IE 11 supports MSE, but not HLS.

Google Chrome supports Media Source Extensions, but apparently its developers are investigating implementing HLS in JavaScript on top of MSE.

And Firefox only supports MSE in its nightly builds, but they are also looking at adding support for HLS through an MSE-based implementation.

So, as you can see, the web hasn't really settled on a single streaming media technology yet.

So, if you take nothing else away, for your Safari users, use HLS.

Okay, that was streaming, now let's talk about power efficiency.

At Apple we care deeply about power.

We make devices with simply amazing battery life.

But it's not just about batteries.

We care about the impact our devices have on the environment as a whole, and that's evident in how much performance we can squeeze out of a single watt of power use.

We've done this through a combination of hardware and software engineering, but the last mile is up to you.

It's easy to do this wrong and drain your users' batteries.

And a user with a dead battery is one that's not using your website.

So today we're going to show you how to minimize the amount of power you use when playing back video in Safari.

So, first, we're going to talk about using fullscreen mode and we're going to talk about how sleep cycles affect battery life.

But, first, fullscreen.

It may sound counterintuitive, but going into fullscreen mode can dramatically reduce the amount of power your system uses as a whole.

Apps which are hidden behind a fullscreen browser window can go into a low-power mode called App Nap, and you can learn more about App Nap specifically at-I believe there's a session on Thursday, something about programming, low-power programming and, anyway, look it up.

It's Thursday at 10:30.

But, in addition, when the system determines that it doesn't have to do any compositing to get video on screen, it can use a low-power mode. To explain how that works, though, we first have to talk about pixel formats.

So every web developer should be familiar with RGB.

The web platform is written in RGB values, where every pixel is broken into a red, green and blue component, and each component is given eight bits of depth.

But video is different: video is decoded into a pixel format called YUV, where Y is a luminance (brightness) plane, and U and V are two color (chrominance) planes.

The Y plane encodes brightness, which is derived mostly from the green channel, and the U and V planes encode the blue-difference and red-difference color values, respectively.

And typically the U and V planes are stored at a lower resolution than the Y plane, which is where names like YUV 422 and YUV 411 come from.

Those numbers describe how often the U and V planes are sampled relative to the Y plane; in YUV 422, for example, each pair of neighboring pixels shares one U sample and one V sample.

And we give the Y plane more detail because the human visual system is much better at distinguishing between light and dark than it is at distinguishing between colors.

So if you're a mantis shrimp, then this makes total sense to you-if you're a mantis shrimp who longboards, that is.

So YUV 422 requires only about 16 bits per pixel to encode, whereas RGB with an alpha channel requires 32.

And since there's typically less variance in the U and V planes, they can be compressed much more easily than RGB values.

And this is why video prefers to use YUV over RGB.
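The bits-per-pixel figures follow from the sampling ratios. In J:a:b notation, consider a block J pixels wide and two rows tall. It carries 2J luma samples plus a + b samples in each of the two chroma planes. With 8-bit samples, YUV 422 therefore averages (8 + 8) × 8 / 8 = 16 bits per pixel, against 32 for 8-bit RGBA. The sketch below does that arithmetic, and its function names are illustrative.

```typescript
// Average bits per pixel for Y'CbCr ("YUV") with J:a:b chroma subsampling.
function yuvBitsPerPixel(j: number, a: number, b: number, bitDepth = 8): number {
  // A J-by-2 pixel block holds 2*J luma samples, plus (a + b) samples in
  // each of the two chroma planes.
  const lumaSamples = 2 * j;
  const chromaSamples = 2 * (a + b);
  return ((lumaSamples + chromaSamples) * bitDepth) / (2 * j);
}

// 8-bit red, green, blue and alpha channels.
const RGBA_BITS_PER_PIXEL = 4 * 8;

// Uncompressed size of one frame, in bytes.
function frameBytes(width: number, height: number, bitsPerPixel: number): number {
  return (width * height * bitsPerPixel) / 8;
}
```

For a 1920 × 1080 frame, that works out to 4,147,200 bytes in YUV 422 versus 8,294,400 bytes in RGBA, before any compression.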

The side effect, though, of all these decisions is that, since the web platform was written in RGB and video was in YUV, we have to convert from one to the other when we need to draw on top of the video.

That's how this works.

It's called compositing, where layers are drawn together, top to bottom, in order to present the actual webpage your viewers are going to see.

So typically it works like this: you start with encoded video frames, decode those frames into YUV, convert them into RGB, draw your web content on top of them, and then send them out to the video card to be displayed.

Simple, right?

Now if the system determines that it can display a video frame without having to draw anything on top of it, it can skip all of these format conversion steps and go straight from YUV directly to the video card.

It dramatically reduces the amount of power required to display video.

It does have a few prerequisites, though.

For one you must support the Fullscreen API.

If you have JavaScript custom controls, you should have at least one that uses the requestFullscreen method to bring your controls and your video into fullscreen mode.

Black is the new black.

You should only have a black background visible behind your video.

And no DOM element should be visible on top of your video as well, and this is tricky because elements which have an opacity of zero are still technically visible.

So don't hide your controls with opacity or at least don't only hide them with opacity; use "display:none" as well.
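The sketch below hides controls so that they actually leave the render tree. It uses a hypothetical `setControlsHidden` helper and a structural stand-in for the element's style. If you fade controls out with an opacity transition, set `display` to `none` once the fade finishes (for example, in a `transitionend` handler) rather than immediately, as this simple version does.

```typescript
// Structural stand-in for the style properties touched here.
interface HideableElement {
  style: { opacity: string; display: string };
}

// Hypothetical helper: opacity alone leaves the element "visible" to the
// compositor, so also remove it from rendering with display: none.
function setControlsHidden(controls: HideableElement, hidden: boolean): void {
  controls.style.opacity = hidden ? "0" : "1";
  controls.style.display = hidden ? "none" : "";
}
```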

And everything that's not currently being displayed in fullscreen mode won't ever be visible, so you might as well hide it as well.

And we'll show you a quick little snippet of CSS that will hide all of your non-fullscreen elements when your video is in fullscreen mode.

So, first, the Fullscreen API.

Now we've talked about this at a previous session, so for more information about how the Fullscreen API works, check out-I think it's the 2011 video session.

But, really quickly, if you just call this method from something like a fullscreen button's click handler, it will toggle your content in and out of fullscreen mode.
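A toggle along these lines is typical. It checks the standard names (`requestFullscreen`, `exitFullscreen`, `fullscreenElement`) and falls back to WebKit's prefixed ones (`webkitRequestFullscreen`, `webkitExitFullscreen`, `webkitFullscreenElement`). The structural interfaces and the `toggleFullscreen` name are hypothetical. You would call it with the container that holds your video and custom controls, plus `document`.

```typescript
// Structural subsets of Element and Document covering the standard and the
// WebKit-prefixed Fullscreen API names.
interface FullscreenTarget {
  requestFullscreen?: () => unknown;
  webkitRequestFullscreen?: () => unknown;
}
interface FullscreenDocument {
  fullscreenElement?: unknown;
  webkitFullscreenElement?: unknown;
  exitFullscreen?: () => unknown;
  webkitExitFullscreen?: () => unknown;
}

// Enter fullscreen with the container (video plus controls) or, if
// something is already fullscreen, exit.
function toggleFullscreen(container: FullscreenTarget, doc: FullscreenDocument): void {
  const current = doc.fullscreenElement ?? doc.webkitFullscreenElement;
  if (current) {
    if (doc.exitFullscreen) doc.exitFullscreen();
    else doc.webkitExitFullscreen?.();
  } else if (container.requestFullscreen) {
    container.requestFullscreen();
  } else {
    container.webkitRequestFullscreen?.();
  }
}
```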

To hide everything that's not in fullscreen we're going to give you a little snippet of CSS to use.

So, first, for a fullscreen element, all of its ancestors are given a pseudo-class called ":-webkit-full-screen-ancestor".

So this will select every child of a fullscreen ancestor that's not an ancestor itself and is not the fullscreen element itself, and hide it.

So just add this line of CSS to your websites.

None of the elements outside your fullscreen content will be visible while it's in fullscreen mode, and since they won't be in the render tree, they won't be wasting CPU cycles and memory.
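The rule itself isn't included in this transcript. From the description above, it would look something like the selector in the sketch below, which injects the rule from script. Treat the exact selector as an assumption and verify it against the browsers you target. `installFullscreenHiding` and `StyleHost` are hypothetical names, and normally you would simply put the rule in your stylesheet.

```typescript
// Assumed form of the rule described above: hide every child of a
// fullscreen ancestor that is neither an ancestor itself nor the
// fullscreen element. Uses WebKit-prefixed pseudo-classes.
const HIDE_NON_FULLSCREEN_CSS =
  ":-webkit-full-screen-ancestor > :not(:-webkit-full-screen-ancestor):not(:-webkit-full-screen) {\n" +
  "  display: none !important;\n" +
  "}\n";

// Structural subset of Document used below.
interface StyleHost {
  createElement(tag: "style"): { textContent: string | null };
  head: { appendChild(node: { textContent: string | null }): unknown };
}

// Add the rule to the page in a new <style> element.
function installFullscreenHiding(doc: StyleHost): void {
  const style = doc.createElement("style");
  style.textContent = HIDE_NON_FULLSCREEN_CSS;
  doc.head.appendChild(style);
}
```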

Okay, so that was compositing.

Now let's talk about how video playback affects your sleep.

Something you should be aware of is the way that media playback affects the sleep cycles of your users' systems.

When Safari plays a video it will conditionally block the display from sleeping using a sleep assertion, and it does this to avoid the annoying behavior of your display going to sleep halfway through an episode of "Orange Is the New Black" or whatever.

Safari will only block this sleep from happening under certain conditions, though.

So, the video must have an audio track and a video track.

It has to be playing and it must not be looping.

If any of these conditions are not met, we won't keep the system from sleeping.

However, this has kind of a dramatic failure mode.

So there was a website we came across.

They were trying to do something very cool with <video> elements.

They used a full-page <video> element as the backdrop of their landing page and, in order to do a fancy CSS transition at the end of the video, they didn't use looping.

Instead, they had two <video> elements that they faded between and, even though their video was entirely silent, it still had a silent audio track.

So if you loaded this page, walked away from your computer, and came back a few hours later, your computer would be completely dead because, to Safari, this looks like the user is just watching a playlist of different videos.

So how could they fix this?

Well, for one they could strip the audio track, the silent audio track, out of their media.

They could also burn the fade effect into the video itself, the video media itself, and use the loop property to loop the video over and over again.

Either one of those would let the display sleep again.

But we have also updated our requirements in Safari.

In addition to the <video> element having an audio and video track, not looping, and playing, it must also be visible.

That means it must be in the foreground tab in the visible window and on the current space.

If the <video> element is in a background tab or the window is hidden, it will let the system sleep again.
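The rules above reduce to a single condition. The function below is a hypothetical model, not WebKit code, of when Safari holds a display-sleep assertion according to this session. The test cases mirror the background-video site and its fixes.

```typescript
// Hypothetical model (not WebKit source) of the playback state that, per
// this session, makes Safari keep the display awake.
interface PlaybackState {
  hasAudioTrack: boolean; // a silent audio track still counts
  hasVideoTrack: boolean;
  playing: boolean;
  loop: boolean;
  visible: boolean; // foreground tab, visible window, current space
}

function preventsDisplaySleep(s: PlaybackState): boolean {
  return s.hasAudioTrack && s.hasVideoTrack && s.playing && !s.loop && s.visible;
}
```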

So, with these changes, even if you do the wrong thing, your page will still keep the system from sleeping but only when your page is actually visible.

Okay, that was power efficiency.

Now let's talk about how to use timed metadata to coordinate events in your page.

So what is timed metadata?

Timed metadata is data delivered alongside your video and your audio data where each piece of data has a start time and an end time that's in the media's timeline.

I can hear you asking, "But, Jer, that sounds a lot like text tracks." And that's true.

Text tracks are one kind of timed metadata, but metadata isn't limited to text and other text-like things.

You can include arbitrary binary data, geolocation information, images, or text; you can include anything you'd like in a metadata track and have it be available through the timed metadata API.

Now, timed metadata has been available to native apps through API on iOS and OS X for some time, but new in Safari on iOS and OS X, it's easy to use from JavaScript as well.

It appears in the <video element as a text track, just like the caption tracks we talked about at previous WWDCs.

These tracks will have a kind of "metadata", meaning they won't be displayed by the browser.

Instead they'll be available to script running in your page, and you can use the same TextTrack APIs to watch for incoming metadata, which arrives as text track cues.

A TextTrack contains a list of cues, each with a start time and an end time.

You can add event handlers to these cues that fire as the media timeline enters and exits each one.
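Here is a rough sketch of how a cue's enter and exit moments relate to its start and end times. It assumes a simplified model in which a cue is active while the current time is at or after its start and before its end. TimedCue, activeCueIds, and cueTransitions are hypothetical names. The model also ignores seeking and cues skipped entirely between two time updates, both of which the real HTML algorithm handles.

```typescript
// Simplified model of when TextTrackCue "enter" and "exit" events fire.
// Not WebKit's code: it ignores seeking and cues skipped over between two samples.
interface TimedCue {
  id: string;
  startTime: number; // seconds on the media timeline
  endTime: number;
}

// A cue is active while startTime <= t < endTime.
function activeCueIds(cues: TimedCue[], t: number): Set<string> {
  return new Set(cues.filter(c => c.startTime <= t && t < c.endTime).map(c => c.id));
}

// Cues active now but not before were "entered"; the reverse were "exited".
function cueTransitions(cues: TimedCue[], previousTime: number, currentTime: number) {
  const before = activeCueIds(cues, previousTime);
  const after = activeCueIds(cues, currentTime);
  return {
    entered: Array.from(after).filter(id => !before.has(id)),
    exited: Array.from(before).filter(id => !after.has(id)),
  };
}

const cues: TimedCue[] = [
  { id: "a", startTime: 0, endTime: 5 },
  { id: "b", startTime: 4, endTime: 10 },
  { id: "c", startTime: 10, endTime: 12 },
];

console.log(cueTransitions(cues, 3, 4.5)); // entered: ["b"], exited: []
console.log(cueTransitions(cues, 9, 10.5)); // entered: ["c"], exited: ["b"]
```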

Now, a WebKitDataCue is a subclass of TextTrackCue and, because this interface is experimental, it has a WebKit prefix.

So this is not in the HTML5 spec yet.

Be prepared for this interface to change.

That said, we're pushing to get these proposed changes into the spec soon.

So each cue will have a type property, which allows you to interpret the value property correctly.

So what does a type look like?

Well, the metadata cue type indicates the source the metadata came from.

So metadata can be found in QuickTime user data atoms or in QuickTime metadata atoms; it could have been inserted by iTunes; it could be found in the MP4 metadata box; or, finally, it can be inserted as ID3 frames directly into your media stream.

So these values allow you to interpret the value property, which looks like this.

Each value has a key, and together the key and the type uniquely identify the meaning of the value's data property, which can be any JavaScript type: a string, a number, an array, an ArrayBuffer, and so on.

And the value may optionally have a locale, so you can choose among several cues available at the current time based on the user's system locale.
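The key, type, and locale fields suggest a simple lookup. The sketch below assumes plain objects shaped like the talk's description of WebKitDataCue: a type, plus a value with a key, an optional locale, and data. DataCueLike, pickLocalizedData, and the "example.source" type string are hypothetical and are not the real type identifiers. The code matches locales exactly, so a real page might normalize tags like "en-US" and could pass navigator.languages as the preference list.

```typescript
// Hypothetical shape modeled on the talk's description of WebKitDataCue.
interface DataCueValue {
  key: string;
  locale?: string; // optional, e.g. "en" or "fr"
  data: unknown; // any JavaScript type
}

interface DataCueLike {
  type: string; // where the metadata came from
  value: DataCueValue;
}

// Return the data for (type, key), preferring the first matching locale,
// then a cue with no locale, then any match at all.
function pickLocalizedData(
  cues: DataCueLike[],
  type: string,
  key: string,
  preferredLocales: string[],
): unknown {
  const matches = cues.filter(c => c.type === type && c.value.key === key);
  for (const locale of preferredLocales) {
    const hit = matches.find(c => c.value.locale === locale);
    if (hit) return hit.value.data;
  }
  const neutral = matches.find(c => c.value.locale === undefined);
  if (neutral) return neutral.value.data;
  return matches.length > 0 ? matches[0].value.data : undefined;
}

const titleCues: DataCueLike[] = [
  { type: "example.source", value: { key: "title", locale: "en", data: "Blue Moon" } },
  { type: "example.source", value: { key: "title", locale: "fr", data: "Lune bleue" } },
];

console.log(pickLocalizedData(titleCues, "example.source", "title", ["fr", "en"])); // Lune bleue
console.log(pickLocalizedData(titleCues, "example.source", "title", ["de"])); // Blue Moon (fallback)
```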

So what would you use this for?

An extremely simple example would be displaying the title of a song in a long-playing audio stream.

Another example would be to add entry and exit points at various places in your media stream so you can track whether users, say, watched an entire ad or skipped over it.
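Here is one way the ad-tracking idea could work, as a sketch under some assumptions. It assumes the stream carries a short marker cue at the start, at each quartile, and at the end of the ad, and that your cue enter handlers report each marker's id. The marker ids and the AdViewTracker class are hypothetical. The approach also relies on cues jumped over by a seek not firing enter events, as in the HTML specification's model.

```typescript
// Hypothetical marker ids: assumes one short metadata cue at the start,
// each quartile, and the end of the ad.
const AD_MARKERS = ["ad-start", "ad-25", "ad-50", "ad-75", "ad-end"];

class AdViewTracker {
  private readonly seen = new Set<string>();

  // Call this from each marker cue's "enter" handler.
  markerEntered(id: string): void {
    if (AD_MARKERS.indexOf(id) >= 0) this.seen.add(id);
  }

  // Fraction of the ad's markers the viewer actually played through.
  progress(): number {
    return this.seen.size / AD_MARKERS.length;
  }

  // Assumes a seek past a marker never enters it, so a skip leaves gaps.
  watchedThrough(): boolean {
    return AD_MARKERS.every(id => this.seen.has(id));
  }
}

const tracker = new AdViewTracker();
tracker.markerEntered("ad-start");
tracker.markerEntered("ad-25");
// ...the viewer seeks past the rest of the ad here...
console.log(tracker.progress()); // 0.4
console.log(tracker.watchedThrough()); // false
```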

But because you can package any type of binary data you like in an HLS stream or a media file, the possibilities for this API are functionally endless.

And so Brent has an amazing demo showing what kind of awesome things you can do with timed metadata.


Thank you.

So let's get back to longboarding.

I have some more footage here of a really nice day in Utah, and we thought it would be really neat to take advantage of some of the metadata that can be encoded in these videos.

Now our iDevices can already collect a lot of things, such as geolocation data and, on newer devices, motion data.

There are also a variety of applications that will encode other types of data from other devices and things you may work with.

This can all be encoded together and brought along as part of the media.

And so I thought it'd be really interesting to see what we could do with that.

Now, in this video, some content was embedded in the media stream as ID3 tags, each containing a text entry that is an encoded JSON object.

And I wanted to get a feel for what that looked like, so I put together a page that showed the same longboarding video with the metadata displayed on the side.

And so you can see some of the types of content that would be in here.

Now this content was from a specific use case, so it's going to vary depending on where your media comes from and what kinds of tools are being used to put it together.

But, in this case, we have a speed; we have an ordered list of skaters; we've got a notes field.

So I thought it'd be interesting to see what we could do with that.

Now let me just briefly show you what I did to get this data to display.

Okay, so just like before, we have a video source, which in this case is another .m3u8 (HLS) stream.

I've added an onloadstart handler so that, when the stream starts loading, we can do some setup.

And what we do is register an event listener for the 'addtrack' event so that we know when tracks are added to the stream.

So the video starts playing, the system recognizes the metadata, and it fires this event.

When the track has been added, I want to add another event listener for 'cuechange'.

This is where WebKit fires these 'cuechange' events as the metadata is encountered during playback.

Very important is this line, where we set the track's mode to 'hidden'.

These tracks arrive with a default mode of 'disabled', which means you will not get any events.

We set it to 'hidden' because we don't really want to see this content, and we don't want WebKit to necessarily do anything with it, but we do want to receive events when these cues are encountered so that we can do something with that.
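Here is the setup Brent describes, sketched in TypeScript. To keep it runnable outside a browser, it uses small structural interfaces that stand in for HTMLVideoElement, TextTrackList, and TextTrack. Those interface names and watchMetadata are hypothetical. The event names ('loadstart', 'addtrack', 'cuechange') and the 'hidden' mode are the standard ones. The listening flag is an addition, so that a second loadstart doesn't register duplicate listeners.

```typescript
// Minimal structural stand-ins for the DOM types used, so the sketch runs
// outside a browser. A real page would use HTMLVideoElement and TextTrack.
interface CueListLike {
  length: number;
  [index: number]: unknown;
}
interface MetadataTrackLike {
  kind: string;
  mode: string;
  activeCues: CueListLike | null;
  addEventListener(type: "cuechange", listener: () => void): void;
}
interface TrackListLike {
  addEventListener(
    type: "addtrack",
    listener: (event: { track: MetadataTrackLike | null }) => void,
  ): void;
}
interface VideoLike {
  textTracks: TrackListLike;
  addEventListener(type: "loadstart", listener: () => void): void;
}

// Same order of operations as the demo: loadstart -> addtrack -> cuechange.
function watchMetadata(video: VideoLike, onCues: (cues: CueListLike) => void): void {
  let listening = false; // loadstart can fire again if the source reloads
  video.addEventListener("loadstart", () => {
    if (listening) return;
    listening = true;
    video.textTracks.addEventListener("addtrack", event => {
      const track = event.track;
      if (!track || track.kind !== "metadata") return;
      // Tracks arrive "disabled" and fire no cue events; "hidden" delivers
      // events without the browser rendering anything.
      track.mode = "hidden";
      track.addEventListener("cuechange", () => {
        if (track.activeCues) onCues(track.activeCues);
      });
    });
  });
}
```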

And so, finally, the meat of this is in the 'cuechange' event handler: because I know this data is JSON, I can just take the data cue, which is a WebKitDataCue object like the ones Jer just told us about, retrieve its data portion, and parse it as a JSON object.

Once I did that, obviously, the first thing I did was to take that reconstituted JSON object and immediately re-stringify it.

Why? Because I just wanted to pretty-print it for the screen, and I didn't want to have to write anything to do that.

So JSON.stringify will do that for you and then you end up with something that looks a little bit like this.
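The parse-and-pretty-print step might look like this. The SkatePayload field names are assumptions based on the speed, skater list, and notes fields in the demo, and parsePayload and prettyPrint are hypothetical helpers. If the cue's data arrives as an ArrayBuffer rather than a string, decode it first, for example with new TextDecoder().decode(buffer), assuming the text is UTF-8.

```typescript
// Hypothetical payload shape based on the fields shown in the demo
// (speed, ordered skaters, notes); real metadata will vary by tool.
interface SkatePayload {
  speed?: number;
  skaters?: string[]; // race order, leader first
  notes?: string;
}

// Parse one cue's JSON text; return null instead of throwing on bad data.
function parsePayload(text: string): SkatePayload | null {
  try {
    const parsed: unknown = JSON.parse(text);
    return typeof parsed === "object" && parsed !== null ? (parsed as SkatePayload) : null;
  } catch {
    return null;
  }
}

// Pretty-print with two-space indentation, as in the demo's debug panel.
function prettyPrint(payload: SkatePayload): string {
  return JSON.stringify(payload, null, 2);
}

const payload = parsePayload('{"speed":12.5,"skaters":["Alan","Fred"]}');
if (payload) console.log(prettyPrint(payload));
console.log(parsePayload("not json")); // null
```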

And that's great, we're getting metadata that's firing a few times a second and we have this information we can do something with, but it's not really that compelling an example.

It wasn't that interesting, so I thought, "What else could we do with this?"

Well, we have a speed.

What if we modified it so that I could show a HUD with a speed indicator on it?

That would be kind of cool; we could see how fast people were going. To do that, we need to make a few changes.

First, I'm going to go ahead and get rid of this brief little style sheet I put in here just to kind of get things on the screen, and replace it with a more full-featured style sheet.

And I'm going to modify the 'cuechange' event.

I don't need to re-stringify the JSON object now that I have it.

All I need to do is grab the speed out of the JSON payload and stick it in a <div that I just named "speed", and then I'm going to go ahead and add a <div to hold the HUD for that.

We'll call this "AWEsome-er".
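Updating the HUD is then a small step. This sketch assumes the speed is a plain number (the talk doesn't give units) and writes it into anything with a textContent property, such as the "speed" <div. formatSpeed and updateSpeedHud are hypothetical names.

```typescript
// Anything with a writable textContent, e.g. the "speed" <div in the page.
interface TextSink {
  textContent: string | null;
}

// One decimal place; show a placeholder if the cue had no usable speed.
function formatSpeed(speed: unknown): string {
  return typeof speed === "number" && Number.isFinite(speed) ? speed.toFixed(1) : "--";
}

// Call this from the 'cuechange' handler with the parsed JSON payload.
function updateSpeedHud(speedDiv: TextSink, payload: { speed?: unknown }): void {
  speedDiv.textContent = formatSpeed(payload.speed);
}

const hud: TextSink = { textContent: null };
updateSpeedHud(hud, { speed: 12.34 });
console.log(hud.textContent); // 12.3
```

In a page, that would be something like updateSpeedHud(document.getElementById("speed")!, payload) inside the 'cuechange' handler.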

All right, so what would this look like?

So now we have our little HUD on the top of the screen.

And, if I begin video playback, now we have a little speedometer running.

So now we have a live overlay on top of the video playback.

We've got content that's being written by us and added by us, live on the web page, and this is kind of an example of what you can do with this kind of information.

Well, I thought this was kind of fun, but we could probably do more.

There was other content there; there was a notes field that had information that called out things that they might be doing on screen; and we had information about the ordered list of the skaters.

So I thought, "Well, why don't we make a leaderboard that shows who is in what position, and maybe call out any tricks or other information like that?"

So, in my 'cuechange' event handler, in addition to the speed indicator, I want a method that shows the skaters in order and a method that displays the tricks being done on screen.

I'm not going to go into much detail about how this works except to say that, to show the skaters in order, I basically have a first, second, third, fourth CSS class that I set up in the style sheet.

And I just iterate through that list of ordered skaters and, as the skater names come in, I give the <div with that skater's name the class for first, second, third or fourth position.
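The leaderboard logic reduces to mapping race order onto the first through fourth CSS classes. Here is a minimal sketch. positionClasses is a hypothetical helper, and skaters beyond fourth place simply get no class.

```typescript
// Class names matching the first/second/third/fourth styles in the style sheet.
const POSITION_CLASSES = ["first", "second", "third", "fourth"];

// Map each skater name (in race order) to the CSS class for their position.
function positionClasses(orderedSkaters: string[]): Record<string, string> {
  const result: Record<string, string> = {};
  orderedSkaters.slice(0, POSITION_CLASSES.length).forEach((name, i) => {
    result[name] = POSITION_CLASSES[i];
  });
  return result;
}

console.log(positionClasses(["Fred", "Alan"])); // Fred -> first, Alan -> second
```

Because the skater <div elements are named after the skaters, each entry could then be applied in the page by looking the element up with document.getElementById(name) and setting its className.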

Likewise, I'm going to have some code to show the tricks as they come in, and here I'm just going to add a <div to the page that is styled with the "trick" style.

With a little bit of CSS animation, we should be done.

Then let me go ahead and add some <div elements to hold all the skaters.

I've given the <div elements the names of the skaters, and that makes it easy to set this all up.

Now let's take a look at what that looks like.

So now I've got a HUD and the skaters along the top, and I've freeze-framed a few moments and grabbed some snapshots to show them off.

Let's go ahead and get it started.

There we go and so now we're getting live playback on top of the video with the skaters.

And let me just point out that these guys look really good, but imagine the skill it takes to be skating backwards filming this, and I think you'll agree that a lot of the magic is happening offscreen.

So here, timed metadata events are firing, we're seeing speed changes, we're seeing tricks being called out as they happen, and this is all happening live; I'm not baking this into the video.

I could have done that, but then we wouldn't be able to modify this, change colors and whatever else.

And this is the good part here.

This guy, Alan, he saw something, he just slid away.

Here we are, he's going to notice something.

Danger is spotted.

He's going to bail out.

Poor Fred, ah, he hit the pothole, had to get off.

He's now out of the race.

He's got an X and he falls away.

And our last skater ends, and we get the positions of the skaters.

And so that's just a brief example of what you can do with these kinds of metadata events and a little bit of CSS and JavaScript on top of the live video.

I hope you guys understand why we think this is such an exciting technology, and I can't wait to see what all of you will do with this in the future.

Thank you very much and, back to you, Jer.

Thanks, Brent.

That was some amazing backwards cinematography there.

So let's sum up what you guys have heard today.

Video in Safari on iOS and OS X is now closer in behavior, with support for the preload attribute on iOS and a <video element that fully participates in CSS layering.

You've also seen how to use HLS to add adaptive streaming support to your pages, and you've learned how to make video playback in your pages more power efficient for your users.

And you've also seen how to use timed metadata to coordinate events in your page with your media's timeline.

So, for more information, please contact our Evangelism Team and see the Safari for Developers documentation on

And don't forget about the Apple Developer Forums.

Other sessions you might be interested in: "Harnessing Metadata in Audiovisual Media" later today; "Writing Energy Efficient Code", parts one and two, on Wednesday; and "Designing Responsive Web Experiences" for more information on responsive web design.

And that's it.

Have a great WWDC.

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US