Introducing SiriKit Media Intents

Session 207 WWDC 2019

iOS 13 enhances SiriKit by bringing all new support for audio content playback. See how to provide an excellent, hands-free experience for playing your music, audiobooks, podcasts, radio, and more. Dive into best practices for handling search terms, discover how to provide a complete experience with playback speeds, adding to playlists, and allowing customers to tell you if they like or dislike content.

[ Music ]

[ Applause ]

Hi. I'm Danny Mandel.

And welcome to Introducing SiriKit Media Intents.

We've added media domain support to SiriKit for audio use cases and we're super excited to tell you all about it.

So what are we going to cover in this session?

The first thing we'll do is introduce the new SiriKit Media Intents and talk about their capabilities.

Then we'll talk about what's required for you to handle SiriKit Media requests in your app.

And finally, we'll talk about some best practices you're going to want to follow to provide the best user experience possible when you add SiriKit Media Support to your app.

This year, we're allowing you to control audio in a whole new way with SiriKit Media Intents.

And we think people are going to love using the Siri Media capabilities you build into your apps.

With SiriKit Media Intents, people will be able to do things like play audio, update taste profiles, add to collections, and search.

This means that they'll be able to use the rich natural language processing capabilities of Siri to say things like "Play Khalid on my app," to immediately begin playing Khalid in your app.

Let's take a look at the new SiriKit Media Intents and their capabilities.

There are four SiriKit Media Intents.

The first intent is INPlayMediaIntent.

INPlayMediaIntent allows people to play audio by saying something like "Play Outer Peace in my app."

Now, you might remember that we launched media playback support in iOS 12 with Shortcuts.

And this year, we're adding SiriKit features to INPlayMediaIntent.

The next intent is INAddMediaIntent.

INAddMediaIntent allows people to add items to their playlists and libraries.

An example of this could be "add this song to my road trip playlist in my app."

We have INUpdateMediaAffinityIntent, which allows people to express affinity for media items.

People can do this by saying something as simple as "I like this song."

And finally, we have INSearchForMediaIntent, which lets people search your app for a particular media item.

For example, "Find Billie Eilish in my app."

Let's talk about the supported media types in SiriKit Media.

SiriKit Media supports a number of different audio types, with the first one being music.

And music support is going to let you say things like "Play the song Awesome Song in my app."

In addition to songs, we have support for albums, artists, playlists, genres, and many more.

So you're going to want to check out the documentation for INMediaSearch to get the full list of supported search terms.

And we want you to adopt as many of them as possible to provide the best Siri user experience in your app.

Additionally, we have playback controls, like shuffle, repeat, and playback queues.

And this lets people say things like "Play Khalid shuffle in my app," or, "Play Outer Peace next in my app."

The next supported audio type is Podcasts.

People can begin playing podcasts by saying something like "Put on the Stuff You Should Know podcast for my app."

Additionally, people can also control the playback order and speeds of podcast episodes.

This lets people say something like "Play the newest episode of Stuff You Should Know podcast in my app," or, "Play the Stuff You Should Know podcast in my app at double speed."

Moving on, we have audiobook support.

This lets people say things like "Play the audiobook Becoming in my app."

And as with podcasts, people can start playback at a specific speed when asking to play audiobooks.

And finally, we have radio support.

Radio support allows people to ask for a specific radio station in your Radio Playback app.

For example, "Play 89.1 FM in my app."

And don't worry if your app doesn't fall into one of the previous media types; you can still adopt the SiriKit Media Intents and get their full power.

People will be able to say things like "Play [search term] in my app," and you'll be able to look up that search term in your app and play it.

The only thing that will be missing will be support for strongly parsed media types.

So say you had a nature sounds app and someone said "Play reptile sounds in my app," or "Play mammal sounds in my app." Siri isn't going to know that those are two different types of animal sounds.

So you'd get the string "reptile sounds" or "mammal sounds," and you could look it up and play it.

So not quite as structured as the other types but still supported.

Let's look at how we handle these intents in SiriKit.

So the first thing to know about handling requests with SiriKit Media is that SiriKit Media Intents work just like any other SiriKit domain.

So all the intent-handling happens in your Intents app extension, where you conform to the SiriKit Media Intent handling protocols.

The details of SiriKit request handling have been covered really well in previous WWDC talks, so I'd refer you to those talks and to the developer documentation online for more general details about SiriKit Request Processing.

Now let's look at what happens for a typical request in the SiriKit Media domain.

The request processing begins when someone says "Play cool song in my app."

Siri recognizes that this is a request for your app and launches your Intents extension.

Now, there are three steps in SiriKit Request Processing: resolve, confirm, and handle.

The first step in request processing is the resolve step.

In the media domain, the resolve step is where we take the intent's INMediaSearch object and run a search against our app catalog.

The output of resolve is one or more concrete media item objects to play.

Alternatively, if we didn't find anything that matched or another error occurred, we can return an unsupported result, which will tell Siri to display the appropriate error dialog.

The next step in Request Processing is the confirm step.

And typically, we discourage use of the confirm step in the Media domain.

In looking at usage in our own apps, we find that using confirm actually lowers the likelihood that people will continue on to play media.

So we don't recommend using the confirm step in the Media domain.

The final step in Request Processing is the handle step.

Now, for INPlayMediaIntent, this ends up being really simple, because we're going to return the response code handleInApp, which is going to do a background app launch.

And inside of our background app launch, we're just going to play media like we normally do in our app.

The only tricky part here is testing.

You're going to want to make sure everything plays because there's not going to be any UI on screen.

You're also going to want to make sure that you test in a variety of situations.

For example, in CarPlay or when you're wearing headphones.

So now that we've seen an overview of SiriKit Request Processing, let's take a look at a simplified resolve media items method.

And the first thing to note is that the parameter types are slightly different, but the resolve logic is the same for all four media intents.

Resolve's job is to search the app catalog.

And you're going to want to do it the same way for all four intents.

So we'll initialize a result to the unsupported result.

And this is going to tell Siri to say the appropriate error dialog if we don't find anything in our app catalog.

INMediaSearch is the intent field that contains the details about what the user asks to play.

INMediaSearch represents the universe of all the audio-related queries that Siri supports.

Our job in Resolve is to go from that universe of possibilities to a single item to play.

And in this example, the first thing we're going to do is read the media name off the INMediaSearch.

Then we're going to retrieve a list of items from our app catalog.

And we're going to use the media name property off the media search to compare against the item's name property.

We'll talk more about this later on, but an exact string comparison like this isn't really something you're going to want to do in your shipping app.

But if we had an exact match on the name, we found the item to play, and we're going to create a success result using that item's properties.

Then we'll call the completion handler and move on to handle.

And in this case, as we said before, handle ends up being very short, since all we're going to do is return the handleInApp response code.

And this is going to start the process of background launching our app.
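To make the walkthrough above concrete, here is a minimal Swift sketch of the simplified resolve and handle methods for INPlayMediaIntent. `CatalogItem`, `AppCatalog`, and `PlayMediaIntentHandler` are hypothetical names standing in for your app's own types, and the exact-name comparison mirrors the simplified example rather than what you would ship.

```swift
import Intents

// Hypothetical catalog entry standing in for the app's own data model.
struct CatalogItem {
    let identifier: String
    let name: String
    let artist: String

    // The INMediaItem that Siri speaks about and later hands back to the app.
    var mediaItem: INMediaItem {
        return INMediaItem(identifier: identifier, title: name, type: .song,
                           artwork: nil, artist: artist)
    }
}

// Hypothetical catalog; a real app would query its own store.
final class AppCatalog {
    static let shared = AppCatalog()
    func items() -> [CatalogItem] {
        return [CatalogItem(identifier: "song-1", name: "Cool Song", artist: "Example Artist")]
    }
}

final class PlayMediaIntentHandler: NSObject, INPlayMediaIntentHandling {

    func resolveMediaItems(for intent: INPlayMediaIntent,
                           with completion: @escaping ([INPlayMediaMediaItemResolutionResult]) -> Void) {
        // Start from "unsupported" so Siri speaks the not-found dialog if nothing matches.
        var result = [INPlayMediaMediaItemResolutionResult.unsupported()]

        if let mediaName = intent.mediaSearch?.mediaName,
           let match = AppCatalog.shared.items().first(where: { $0.name == mediaName }) {
            // Exact comparison, as in the simplified example only.
            result = [INPlayMediaMediaItemResolutionResult(
                mediaItemResolutionResult: INMediaItemResolutionResult.success(with: match.mediaItem))]
        }
        completion(result)
    }

    func handle(intent: INPlayMediaIntent,
                completion: @escaping (INPlayMediaIntentResponse) -> Void) {
        // .handleInApp asks the system to launch the app in the background.
        completion(INPlayMediaIntentResponse(code: .handleInApp, userActivity: nil))
    }
}
```

Wrapping the base `INMediaItemResolutionResult` in `INPlayMediaMediaItemResolutionResult(mediaItemResolutionResult:)` is one way to produce the intent-specific result type the completion handler expects.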

Now let's take a look at background app launch.

The method that we implement in our app delegate to support background app launch is application(_:handle:completionHandler:).

Again, this is a pretty short implementation.

We're going to read the first media item to play out of the intent and then we're going to just play it in our app the way we normally do.

And finally, we'll call the completion handler with our success response code.
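A sketch of the background launch side, assuming a hypothetical `Player` singleton in place of your real playback code. The work is factored into a free function, `playFirstMediaItem(of:)` (also a hypothetical name), so it can be exercised without a running app; in a real project the `AppDelegate` would also carry your app's entry-point attribute.

```swift
import UIKit
import Intents

// Hypothetical stand-in for the app's existing playback engine.
final class Player {
    static let shared = Player()
    private(set) var currentIdentifier: String?

    func play(identifier: String) {
        currentIdentifier = identifier
        // A shipping app would start real audio playback here.
    }
}

/// Reads the first resolved media item off the intent and plays it.
func playFirstMediaItem(of intent: INIntent) -> INPlayMediaIntentResponse {
    guard let playIntent = intent as? INPlayMediaIntent,
          let identifier = playIntent.mediaItems?.first?.identifier else {
        return INPlayMediaIntentResponse(code: .failure, userActivity: nil)
    }
    Player.shared.play(identifier: identifier)
    return INPlayMediaIntentResponse(code: .success, userActivity: nil)
}

class AppDelegate: UIResponder, UIApplicationDelegate {
    // Called for the background launch requested by .handleInApp.
    func application(_ application: UIApplication,
                     handle intent: INIntent,
                     completionHandler: @escaping (INIntentResponse) -> Void) {
        completionHandler(playFirstMediaItem(of: intent))
    }
}
```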

So now that we've seen the new intents and how they all fit together, let's hand it over to Ryan Klems, who's going to show us how this all works in a real app.

[ Applause ]

Thanks, Danny.

Adding SiriKit Media Intent handling to your existing media application is easy.

Here we have our music application, and all we need to do to add Siri support is add the Intents extension target and a few methods, and then we'll be up and handling Siri requests in no time.

To add the Intents extension, all we have to do is choose File > New > Target, select Intents Extension, and click Next.

Give it a name and then click finish.

It will go ahead and create our intent handler for us.

We'll want to go ahead and add the Siri capability to our application.

And then we'll come over to our Intents extension target and add the intents that we support. In this example, we'll support INPlayMediaIntent and INAddMediaIntent.

We'll go ahead and select the music type.

We have a few existing files here that we want to add to the extension target to make sure they build in our extension.

And we'll want to make sure to turn on our proper code signing.

Now we'll come over to our intent handler and all we need to do is add support for the INPlayMediaIntent handling protocol.

And then we'll drop in some stubs for our resolve and handle methods.

For this first example, we're just going to return the unsupported result from resolveMediaItems, which will cause Siri to say that it couldn't find the item.

So we'll go ahead and try that out.

This is what that would've looked like.

So Siri says that it couldn't find the item, because we returned the unsupported result from the intent handler.

So what we'll do now is we'll go ahead and hook this up to our existing search implementation.

And in this case, the first thing that we're going to want to do is determine what the user is searching for.

So for this simple example, we'll just search for an artist.

And in the method, we'll go ahead and resolve the media item.

And once we resolve the media item, we move on to our handle method, where we're just going to return the handleInApp response code, which will cause the application to launch in the background.

So in order to do that, we'll switch over to our app delegate and we need to add the handle intent method.

And this will just extract the INMediaItem that we resolved in the previous step and pass it to playback.

So let's see what that looks like.

So you can see, we resolved the INMediaItem, handed it back to the application, and began playback.

So now that we've done that, why don't we go ahead and add support for the add intent.

So to do that, we'll just extend this by adding the INAddMediaIntent handling protocol.

And then we'll add our methods for resolving and handling the add intent.

You'll notice here that the resolve media items method for add looks pretty much identical to the resolve method for play.

Additionally, for add, we'll also have a resolve media destination method.

And this is where we're going to determine whether the user's trying to add to the library or to a playlist.

And in the case of a playlist, you might want to return the playlistNameNotFound unsupported reason if the playlist that the user specified isn't present in their library.

What's also different for add is that there's no reason to go back to the application to begin playback, like we do for INPlayMediaIntent.

So in this case what we would do is we would actually handle everything in the extension itself.

So we have the resolved media item, and we'll just use our application's methods to add it to the library or to the playlist.

And in this case, we're just going to use the media player utilities to add to it.

So let's go ahead and take a look at what that would look like.

So Siri names the item that was added to the playlist, as well as the playlist name, because those were specified in the request.
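The add flow Ryan demonstrated might look roughly like this in Swift. `LibraryStore` is a hypothetical in-memory store; a real extension would write to storage your app also reads (for example, through an app group). `resolveMediaItems(for:with:)` is left out because it matches the play version.

```swift
import Intents

// Hypothetical in-memory store standing in for the app's library and playlists.
final class LibraryStore {
    static let shared = LibraryStore()
    var library: [String] = []
    var playlists: [String: [String]] = ["Road Trip": []]
}

final class AddMediaIntentHandler: NSObject, INAddMediaIntentHandling {

    // Decide between library and playlist; reject playlists the user doesn't have.
    func resolveMediaDestination(for intent: INAddMediaIntent,
                                 with completion: @escaping (INAddMediaMediaDestinationResolutionResult) -> Void) {
        let destination = intent.mediaDestination ?? INMediaDestination.library()
        if destination.mediaDestinationType == .playlist {
            guard let name = destination.playlistName,
                  LibraryStore.shared.playlists[name] != nil else {
                completion(INAddMediaMediaDestinationResolutionResult.unsupported(forReason: .playlistNameNotFound))
                return
            }
        }
        completion(INAddMediaMediaDestinationResolutionResult(
            mediaDestinationResolutionResult: INMediaDestinationResolutionResult.success(with: destination)))
    }

    // Everything happens in the extension: no app launch is needed for add.
    func handle(intent: INAddMediaIntent,
                completion: @escaping (INAddMediaIntentResponse) -> Void) {
        let identifiers = intent.mediaItems?.compactMap { $0.identifier } ?? []
        guard !identifiers.isEmpty else {
            completion(INAddMediaIntentResponse(code: .failure, userActivity: nil))
            return
        }
        if let destination = intent.mediaDestination,
           destination.mediaDestinationType == .playlist,
           let name = destination.playlistName {
            LibraryStore.shared.playlists[name, default: []].append(contentsOf: identifiers)
        } else {
            LibraryStore.shared.library.append(contentsOf: identifiers)
        }
        completion(INAddMediaIntentResponse(code: .success, userActivity: nil))
    }
}
```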

So as you can see, it's relatively easy to add support to your application.

And we really look forward to seeing what you do in your application.

Thank you.

[ Applause ]

Danny, back to you.

Thanks, Ryan.

So what did Ryan show us?

First, he showed us how to add our intents extension to our app.

Then he showed us how to specify our supported intents and supported media types.

And finally, he showed us how to implement resolve and handle for INPlayMediaIntent and INAddMediaIntent so we could immediately begin playing and adding media using Siri.

So let's look at some best practices you're going to want to follow when you adopt SiriKit Media.

We have great news if you've already implemented shortcut support for media playback.

SiriKit Media uses the same code in handle and for background app launch.

While Shortcuts operates on previously donated intents which don't require intent resolution, SiriKit does require a resolve step.

So the two things you're going to need to add are a resolve method in your intent handler and an update to your Intents extension's supported media types in Xcode, so Siri knows what content types your app supports.

The extension's handle method should be the same in both implementations, and the background app launch through the app delegate's handle intent method can be the same, as long as you use the same identifiers for your media items.

So let's see what this looks like.

Here's our Shortcuts implementation from last year.

You'll notice that there's no resolve method, but the rest of it's the same.

So to go from Shortcuts to SiriKit, we just add in our resolve method and we're good to go.

Now let's talk about what you need to do to bring SiriKit Media Support to the Apple Watch.

On watchOS, apps launch in the foreground.

And the way you do that is by returning the INPlayMediaIntentResponseCode continueInApp from the handle method in your Intents extension.

This code is going to do a foreground app launch and forward the intent to your WKExtensionDelegate in your app.

You'll note that the app's handle method looks pretty similar to the one on iOS.

The method signature is slightly different, but you're going to want to read the intent off the NSUserActivity's interaction property.

And then, just like on iOS, you read the media item to play and start playback in your app.
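A minimal watchOS sketch, assuming the Intents extension handler and the WatchKit extension delegate live in their respective targets. `WatchPlayMediaIntentHandler`, `ExtensionDelegate`, and `lastPlayedIdentifier` are hypothetical names; the delegate reads the intent from `NSUserActivity.interaction` as described above.

```swift
import WatchKit
import Intents

// Intents extension: on watchOS, ask for a foreground launch of the app.
final class WatchPlayMediaIntentHandler: NSObject, INPlayMediaIntentHandling {
    func handle(intent: INPlayMediaIntent,
                completion: @escaping (INPlayMediaIntentResponse) -> Void) {
        completion(INPlayMediaIntentResponse(code: .continueInApp, userActivity: nil))
    }
}

// WatchKit extension: the forwarded intent arrives on the user activity.
final class ExtensionDelegate: NSObject, WKExtensionDelegate {
    private(set) var lastPlayedIdentifier: String?

    func handle(_ userActivity: NSUserActivity) {
        guard let intent = userActivity.interaction?.intent as? INPlayMediaIntent,
              let identifier = intent.mediaItems?.first?.identifier else { return }
        lastPlayedIdentifier = identifier
        // Start playback with the app's own player here, as on iOS.
    }
}
```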

One word of caution.

You're going to want to use the on-device cache in your resolve method on watchOS.

Only go over the network if it's absolutely necessary.

So we know that when someone says "Play Awesome Song in my app," the first step in Request Processing is to resolve the media items to play.

And we looked at a previous implementation where we checked the value of the media item's name against the intent's media name property.

And it was an exact match.

So what are some edge cases that we didn't cover in that implementation?

The first place that our previous method won't do the right thing is if we have a mismatch on either case or punctuation.

So in this example, "Play hello in my app," we can see a few cases where exact string comparison will fail.

The exact song title is uppercase HELLO with an exclamation point.

But it's possible that the Siri speech engine could give us lowercase hello.

Or maybe it gives us uppercase HELLO without the exclamation point.

So it's really important that we ignore case and punctuation in our resolve method.
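One way to ignore case and punctuation is to fold both strings to a common form before comparing. This Foundation-only sketch (the function names are hypothetical) lowercases, strips diacritics and punctuation, treats hyphens and slashes as spaces, and collapses runs of whitespace.

```swift
import Foundation

/// Folds a title for comparison: lowercase, no diacritics, no punctuation,
/// hyphens and slashes treated as spaces, whitespace collapsed.
func normalizedTitle(_ title: String) -> String {
    let folded = title.folding(options: .diacriticInsensitive, locale: nil).lowercased()
    let separators = CharacterSet(charactersIn: "-/").union(.whitespacesAndNewlines)
    var cleaned = String.UnicodeScalarView()
    for scalar in folded.unicodeScalars {
        if separators.contains(scalar) {
            cleaned.append(" ")
        } else if !CharacterSet.punctuationCharacters.contains(scalar) {
            cleaned.append(scalar)
        }
    }
    return String(cleaned)
        .split(whereSeparator: { $0.isWhitespace })
        .joined(separator: " ")
}

/// True when the recognized name and the catalog title fold to the same string.
func titlesMatch(_ spoken: String, _ catalogTitle: String) -> Bool {
    return normalizedTitle(spoken) == normalizedTitle(catalogTitle)
}
```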

Similarly, a lot of music entities have things in the title that people either won't say or they're going to say in a way that doesn't exactly match with the item's title.

For instance, a lot of albums come in a deluxe edition variant.

And people aren't going to say this when they ask to play the album.

They aren't going to say "Play the album Outer Peace deluxe edition in my app," they'll say "Play the album Outer Peace in my app."

And soundtracks are another example.

People aren't going to say the full title; they'll say "Play the Rocket Man soundtrack in my app."

And they aren't going to say "Music from the motion picture."

And finally, a lot of hip-hop songs have the "feat." abbreviation in their title.

So people either won't say it or they'll say the word "featuring."

So exact match isn't going to work here either.
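A possible approach for these music cases is to derive a few spoken-style variants of each catalog title and check the recognized name against all of them. The regular expressions are assumptions about common title decorations (bracketed qualifiers such as "(Deluxe Edition)" or "(Music from the Motion Picture)", and "feat."/"ft." credits), not an exhaustive list, and the helper names are hypothetical.

```swift
import Foundation

/// Lowercase, drop punctuation, collapse whitespace, and spell "feat"/"ft" as "featuring".
func normalize(_ text: String) -> String {
    let scalars: String.UnicodeScalarView = text.lowercased().unicodeScalars
        .filter { !CharacterSet.punctuationCharacters.contains($0) }
    return String(scalars)
        .split(whereSeparator: { $0.isWhitespace })
        .map { $0 == "feat" || $0 == "ft" ? "featuring" : String($0) }
        .joined(separator: " ")
}

/// Normalized forms of a catalog title that people are likely to say.
func spokenVariants(ofTitle title: String) -> Set<String> {
    var variants: Set<String> = [normalize(title)]
    // Drop bracketed qualifiers: "(Deluxe Edition)", "[Remastered]", "(feat. X)".
    let withoutBrackets = title.replacingOccurrences(
        of: #"\s*[\(\[][^\)\]]*[\)\]]"#, with: "", options: .regularExpression)
    variants.insert(normalize(withoutBrackets))
    // Drop a trailing, unbracketed "feat. X" / "ft. X" / "featuring X".
    let withoutFeature = withoutBrackets.replacingOccurrences(
        of: #"\s+(feat\.?|ft\.?|featuring)\s.*$"#, with: "",
        options: [.regularExpression, .caseInsensitive])
    variants.insert(normalize(withoutFeature))
    return variants
}

func matches(spokenName: String, catalogTitle: String) -> Bool {
    return spokenVariants(ofTitle: catalogTitle).contains(normalize(spokenName))
}
```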

And podcasts also have some cases where there is a mismatch between what people say and what entity titles are.

So some podcasts have the word "podcast" in their title.

So if someone said "Play the Stuff You Should Know podcast in my app," Siri could parse it as the name "Stuff You Should Know" with a media item type of podcast.

So an exact match isn't going to work here either.

And additionally, some podcasts come in audio or video variants, and that audio or video word appears in the title.

But SiriKit Media implies the audio variety, so people aren't going to say that either.
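For podcasts, here is a sketch of a comparison key that drops a trailing "podcast", "audio", or "video" word. Restricting the removal to the end of the title is a deliberate assumption, so a show such as "Video Game Music Podcast" keeps its leading "video"; `normalizedWords` and `podcastMatchKey` are hypothetical names.

```swift
import Foundation

/// Lowercased words of a title with punctuation removed.
func normalizedWords(_ title: String) -> [String] {
    let scalars: String.UnicodeScalarView = title.lowercased().unicodeScalars
        .filter { !CharacterSet.punctuationCharacters.contains($0) }
    return String(scalars).split(whereSeparator: { $0.isWhitespace }).map { String($0) }
}

/// Comparison key that ignores trailing "podcast", "audio", and "video" words.
func podcastMatchKey(_ title: String) -> String {
    let ignorableSuffixes: Set<String> = ["podcast", "audio", "video"]
    var words = normalizedWords(title)
    // Keep at least one word so a show literally named "Podcast" still has a key.
    while let last = words.last, ignorableSuffixes.contains(last), words.count > 1 {
        words.removeLast()
    }
    return words.joined(separator: " ")
}
```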

Finally, remember that you're working with a speech recognizer, and the speech recognizer can come with word formatting variations.

So if someone asks to play the song 81st, you could get "81 st" or the hyphenated "eighty-first."

Or if someone asks to play the song I Love You Son, you could get "sun" or "son."

Now, Siri is going to do the best job it can to give you the entity titles for the things that it knows about.

But it's better for you to be flexible in your resolve method.
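To tolerate recognizer formatting, one option (swapped in here, not prescribed by the session) is to compare titles with spaces and punctuation removed and then allow a small Levenshtein edit distance. This handles "81 st" versus "81st" and "sun" versus "son", but not digits versus spelled-out numbers such as "eighty-first", which would need a separate lookup. The function names and the tolerance rule are assumptions.

```swift
import Foundation

/// Lowercased title with everything except letters and digits removed.
func compactKey(_ title: String) -> String {
    let scalars: String.UnicodeScalarView = title.lowercased().unicodeScalars
        .filter { CharacterSet.alphanumerics.contains($0) }
    return String(scalars)
}

/// Classic Levenshtein distance over Characters.
func editDistance(_ lhs: String, _ rhs: String) -> Int {
    let a = Array(lhs), b = Array(rhs)
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }
    var previous = Array(0...b.count)
    for i in 1...a.count {
        var current = [i] + Array(repeating: 0, count: b.count)
        for j in 1...b.count {
            let cost = a[i - 1] == b[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,        // deletion
                             current[j - 1] + 1,     // insertion
                             previous[j - 1] + cost) // substitution
        }
        previous = current
    }
    return previous[b.count]
}

/// Exact match on the compact form, else roughly one edit per 10 characters (min 1 for 4+ chars).
func looselyMatches(_ spoken: String, _ catalogTitle: String) -> Bool {
    let s = compactKey(spoken), c = compactKey(catalogTitle)
    if s == c { return true }
    let allowed = c.count >= 4 ? max(1, c.count / 10) : 0
    return editDistance(s, c) <= allowed
}
```

Fuzzy matching trades precision for recall: a one-edit tolerance can also match genuinely different short titles, so you may prefer exact normalized matches first and fall back to the loose comparison only when nothing else is found.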

When you implement your SiriKit Media Support, you control what Siri says by the INMediaItem objects you return from your resolve method.

As you can see here, the user asks to play the song Maybe Sometime by Special Disaster Team in Control Audio.

And Siri said, "Here's Maybe Sometime by Special Disaster Team from Control Audio."

In this case, the returned INMediaItem had a title property of Maybe Sometime and an artist property of Special Disaster Team.

So make sure you always populate title, artist, and type in the returned media items, as these can all influence what Siri says.

And one thing to note, if you return more than one item in your resolve method, Siri is going to speak to the first item in the list.
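A small sketch of building the returned items with title, artist, and type filled in, using the `init(identifier:title:type:artwork:artist:)` initializer introduced in iOS 13. `SongRecord` and `mediaItems(for:)` are hypothetical, and the input is assumed to be sorted best match first, since Siri speaks about the first item.

```swift
import Intents

// Hypothetical search result from the app's catalog.
struct SongRecord {
    let identifier: String
    let title: String
    let artist: String
}

/// Converts catalog records (assumed sorted best match first) into INMediaItems
/// with title, artist, and type populated, since all three can shape Siri's dialog.
func mediaItems(for records: [SongRecord]) -> [INMediaItem] {
    return records.map { record in
        INMediaItem(identifier: record.identifier,
                    title: record.title,
                    type: .song,
                    artwork: nil,
                    artist: record.artist)
    }
}
```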

It's super important that you handle error cases appropriately in SiriKit Media.

When you're interacting with an intelligent assistant like Siri, it can be unclear why something happened.

And handling error cases appropriately is going to allow you to give the user the best idea of what happened when something goes wrong.

So the most common case you're going to run into is when you don't find something in your app catalog that someone's asked for.

And you handle this case by just returning the unsupported resolution result from resolve media items.

But there are a lot of other errors that could happen.

Maybe someone asks to play something that requires cellular data but they have the cellular data switch turned off for your app.

Or maybe they ask to play something that requires a subscription and they don't have one.

The full list is in INPlayMediaMediaItemUnsupportedReason.

And there's generally similar naming for all four intents, so make sure you adopt them for all the intents you support.
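A sketch of mapping your app's own failure cases onto `INPlayMediaMediaItemUnsupportedReason`. The `CatalogLookupError` enum is hypothetical; the reason cases shown are the iOS 13 ones, and "not found" falls back to the plain unsupported result.

```swift
import Intents

// Hypothetical failure cases an app's catalog lookup might report.
enum CatalogLookupError: Error, CaseIterable {
    case notFound
    case needsLogin
    case needsSubscription
    case cellularDataDisabled
    case explicitContentBlocked
    case restricted
}

/// Chooses the resolution result that lets Siri explain what went wrong.
func unsupportedResult(for error: CatalogLookupError) -> INPlayMediaMediaItemResolutionResult {
    switch error {
    case .notFound:
        return INPlayMediaMediaItemResolutionResult.unsupported()
    case .needsLogin:
        return INPlayMediaMediaItemResolutionResult.unsupported(forReason: .loginRequired)
    case .needsSubscription:
        return INPlayMediaMediaItemResolutionResult.unsupported(forReason: .subscriptionRequired)
    case .cellularDataDisabled:
        return INPlayMediaMediaItemResolutionResult.unsupported(forReason: .cellularDataSettings)
    case .explicitContentBlocked:
        return INPlayMediaMediaItemResolutionResult.unsupported(forReason: .explicitContentSettings)
    case .restricted:
        return INPlayMediaMediaItemResolutionResult.unsupported(forReason: .restrictedContent)
    }
}
```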

Now let's talk about the variety of things that people say to Siri and how you can handle them well in your SiriKit Media implementation.

So one of the most popular things that people say to Siri is "Play my app."

They don't tell you exactly what it is that they want to play, and it's your job as a SiriKit Media developer to choose the right thing to do for your app.

Now, this might be something as simple as resuming an existing queue.

If you're not an audiobook or podcasting app, this is probably the most reasonable behavior to implement.

But you can make the behavior as dynamic as you like.

Maybe you want to direct them to a recommended playlist or some hot new trending music.

The choice is yours.

And how do you know that someone said "Play my app"? There will be no search criteria specified in the INMediaSearch object.

One thing that might seem like a good idea is to ask someone what they want to play.

But for the same reason that we don't recommend using confirm, we don't recommend this approach.

Putting dialog prompts in front of people is one of the most common reasons they quit the SiriKit Media experience.
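Detecting "Play my app" can be sketched as a check that the `INMediaSearch` carries no criteria. Exactly which fields to treat as criteria is an assumption here, and `isUnspecifiedRequest` is a hypothetical name.

```swift
import Intents

/// True when the request carried no search criteria, e.g. "Play my app".
func isUnspecifiedRequest(_ search: INMediaSearch?) -> Bool {
    guard let search = search else { return true }
    return search.mediaName == nil
        && search.artistName == nil
        && search.albumName == nil
        && (search.genreNames ?? []).isEmpty
        && (search.moodNames ?? []).isEmpty
        && search.releaseDate == nil
        && search.mediaIdentifier == nil
        && search.mediaType == .unknown
        && search.sortOrder == .unknown
        && search.reference == .unknown
}
```

When this returns true, a resolve method could fall back to the app's default, such as the item at the head of the existing queue, rather than prompting.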

People can ask to initiate playback with additional controls, and some of the supported options are repeat, shuffle, resume, and playback queue location.

And if you support these in your app, make sure you support them in your INPlayMediaIntent implementation as well.
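A sketch of reading the playback controls off `INPlayMediaIntent` into a hypothetical `PlaybackOptions` value that your own player could consume. Missing optional values fall back to assumed defaults (no shuffle, no resume, normal speed, play now).

```swift
import Intents

// Hypothetical options understood by the app's own player.
enum RepeatOption { case off, one, all }
enum QueuePosition { case now, next, later }

struct PlaybackOptions: Equatable {
    var shuffle = false
    var resume = false
    var repeatOption = RepeatOption.off
    var queuePosition = QueuePosition.now
    var speed = 1.0
}

func playbackOptions(from intent: INPlayMediaIntent) -> PlaybackOptions {
    var options = PlaybackOptions()
    options.shuffle = intent.playShuffled ?? false
    options.resume = intent.resumePlayback ?? false
    options.speed = intent.playbackSpeed ?? 1.0

    switch intent.playbackRepeatMode {
    case .one: options.repeatOption = .one
    case .all: options.repeatOption = .all
    default: options.repeatOption = .off    // .none and .unknown
    }

    switch intent.playbackQueueLocation {
    case .next: options.queuePosition = .next
    case .later: options.queuePosition = .later
    default: options.queuePosition = .now   // .now and .unknown
    }
    return options
}
```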

And people can also ask to play content with a variety of search options.

And one of the most useful search options is the sort parameter.

You can say things like "Play the new Stuff You Should Know podcast in my app," and you'll get INMediaSortOrder.newest.

Or you can ask for the best album by an artist, and you'll get INMediaSortOrder.best.

Or you can ask your app for a recommendation, and you'll get INMediaSortOrder.recommended.

Check out the full list in INMediaSortOrder.
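A sketch of honoring the sort order when picking among catalog results. The `Episode` type is hypothetical, and using play count as a stand-in for "best" and "popular" is an assumption; orders the sketch doesn't handle, such as `.recommended`, leave the input order unchanged.

```swift
import Foundation
import Intents

// Hypothetical catalog record for a podcast episode.
struct Episode {
    let title: String
    let releaseDate: Date
    let playCount: Int
}

/// Orders candidate episodes according to the sort order Siri parsed from the request.
func ordered(_ episodes: [Episode], by sortOrder: INMediaSortOrder) -> [Episode] {
    switch sortOrder {
    case .newest: return episodes.sorted { $0.releaseDate > $1.releaseDate }
    case .oldest: return episodes.sorted { $0.releaseDate < $1.releaseDate }
    // Assumption: play count stands in for quality and popularity.
    case .best, .popular: return episodes.sorted { $0.playCount > $1.playCount }
    case .worst, .unpopular: return episodes.sorted { $0.playCount < $1.playCount }
    default: return episodes   // e.g. .recommended would need the app's own recommender
    }
}
```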

And another powerful search option is the reference property, which can be INMediaReference.currentlyPlaying.

This is really useful for INAddMediaIntent and INUpdateMediaAffinityIntent, because people can easily add the currently playing item to a library or a playlist, or tell your app that they love or hate the currently playing item.

And note that if you've populated the external content identifier in MPNowPlayingInfoCenter, that identifier is going to be in INMediaSearch's mediaIdentifier property, so you know exactly which item to find.
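A sketch of both halves: publishing an identifier through `MPNowPlayingInfoCenter` when playback starts, and reading it back when a request refers to the currently playing item. The function names are hypothetical.

```swift
import Intents
import MediaPlayer

/// When playback starts, publish the item's identifier alongside its title.
func publishNowPlaying(identifier: String, title: String) {
    MPNowPlayingInfoCenter.default().nowPlayingInfo = [
        MPMediaItemPropertyTitle: title,
        MPNowPlayingInfoPropertyExternalContentIdentifier: identifier
    ]
}

/// For requests like "add this song to my playlist" or "I like this song":
/// the identifier of the currently playing item, if Siri supplied one.
func identifierForCurrentlyPlaying(_ search: INMediaSearch?) -> String? {
    guard let search = search, search.reference == .currentlyPlaying else { return nil }
    return search.mediaIdentifier
}
```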

Now, telling Siri more about how your customer uses your app is going to help Siri provide a wonderful SiriKit Media experience.

So when you give user vocabulary to Siri, this helps Siri understand the parts of your catalog that are interesting to your customer.

It's important to note it's not the entire contents of your app catalog, it's only the pieces of content that your customer is specifically interested in.

It's also important to note that the vocabulary is ordered, so make sure that you include the most important items at the beginning of the collection.

And depending on the type of media your app supports, you're going to send different types to Siri.

Music apps should use playlist title and music artist name.

Audiobook apps should use audiobook title and audiobook author.

And podcasting apps should use show title.

And for those things that are applicable to everybody that uses your app, check out the global vocabulary support in SiriKit.
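A sketch of donating user vocabulary with `INVocabulary`, assuming hypothetical `PlaylistSummary` records ranked by play count so the most important titles come first. `INVocabulary` is meant to be used from your app rather than from the Intents extension; audiobook and podcast apps would use the `.mediaAudiobookTitle`, `.mediaAudiobookAuthorName`, and `.mediaShowTitle` string types instead.

```swift
import Intents

// Hypothetical summary of a playlist the user owns.
struct PlaylistSummary {
    let title: String
    let playCount: Int
}

/// Most-played playlists first, since vocabulary order signals importance.
func vocabularyOrder(for playlists: [PlaylistSummary]) -> [String] {
    return playlists.sorted { $0.playCount > $1.playCount }.map { $0.title }
}

/// Call from the app (not the Intents extension) when the user's library changes.
func updateSiriVocabulary(playlists: [PlaylistSummary], favoriteArtists: [String]) {
    let vocabulary = INVocabulary.shared()
    vocabulary.setVocabularyStrings(NSOrderedSet(array: vocabularyOrder(for: playlists)),
                                    of: .mediaPlaylistTitle)
    vocabulary.setVocabularyStrings(NSOrderedSet(array: favoriteArtists),
                                    of: .mediaMusicArtistName)
}
```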

So in conclusion, we're launching new SiriKit Media Intents to allow you to use Siri to control your audio apps.

You'll be able to play, add, update taste profiles, and search for media using the new intents.

It's very important that you provide the best experience possible.

And you can do this by embracing search flexibility, because people don't say exactly what they want to play.

You can handle errors appropriately, so people know what happened when something goes wrong.

And you're going to want to make sure that you construct your INMediaItem objects appropriately so Siri can speak the best dialog possible.

And finally, make sure you give Siri as much important contextual information as possible, so Siri can make the best choices on your customer's behalf.

So come see us at the labs and check out the documentation online.

We think you're going to love building apps with the new SiriKit Media Intents and we can't wait to see what you'll build.

Thank you.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US