Delivering an Exceptional Audio Experience 

Session 507 WWDC 2016

iOS, macOS, watchOS and tvOS offer a rich set of tools and APIs for recording, processing, and playing back audio in your apps. Learn how to choose the right API for your app, and the details on implementing each of them in order to deliver an outstanding audio experience.

[ Music ]

All right.

[ Applause ]

Good afternoon, everyone.

So, how many of you want to build an app with really cool audio effects but thought that might be hard?

Or how many of you want to focus more on your application’s overall user experience but ended up spending a little more time on audio?

Well, we’ve been working hard to make it easy for you.

My name is Saleem.

I’m a Craftsman on the Core Audio Team.

I want to welcome you to today’s session on delivering an exceptional audio experience.

So let’s look at an overview of what’s in our stack today.

We’ll start with our AVFoundation framework.

We have a wide variety of high-level APIs that let you simply play and record audio.

For more advanced use cases, we have our AudioToolbox framework.

And you may have heard of AudioUnits.

These are a fundamental building block.

If you have to work with MIDI devices or MIDI data, we have our CoreMIDI framework.

For game development, there’s OpenAL.

And over the last two years, we’ve been adding many new APIs and features as well.

So you can see there are many ways that you can use audio in your application.

So, our goal today is to help guide you to choosing the right API for your application’s needs.

But don’t worry, we also have a few new things to share with you as well.

So, on the agenda today, we’ll first look at some essential setup steps for a few of our platforms.

Then, we’ll dive straight into simple and advanced playback and recording scenarios.

We’ll talk a bit about multichannel audio.

And then later in the presentation, we’ll look at real-time audio: how you can build your own effects, instruments, and generators. And then we’ll wrap up with MIDI.

So, let’s get started.

iOS, watchOS, and tvOS all have really rich audio features and numerous routing capabilities.

So users can make calls, play music, play games, work with various productivity apps.

And they can do all of this mixed in or independently.

So the operating system manages a lot of default audio behaviors in order to provide a consistent user experience.

So let’s look at a diagram showing how audio is a managed service.

So you have your device, and it has a couple of inputs and outputs.

And then there’s the operating system.

It may be hosting many apps, some of which are using audio.

And lastly, there’s your application.

So AVAudioSession is your interface, as a developer, for expressing your application needs to the system.

Let’s go into a bit more detail about that.

Categories express the application’s highest-level needs.

We have modes and category options which help you further customize and specialize your application.

If you’re into some more advanced use cases, such as input selection, you may want to be able to choose the front microphone on your iPhone instead of the bottom.

If you’re working with multichannel audio and multichannel content on tvOS, you may be interested in things like channel count.

If you had a USB audio device connected to your iPhone, you may be interested in things like sample rate.

So when your application is ready and configured to use audio, it informs the system to apply the session.

So this will configure the device’s hardware for your application’s needs and may actually result in interrupting other audio applications on the system, mixing with them, and/or ducking their volume level.

So let’s look at some of the essential steps when working with AVAudioSession.

The first step is to sign up for notifications.

And the three most important notifications are the interruption, route change, and mediaServicesWereReset notifications.

You can sign up for these notifications before you activate your session.

And in a few slides, I’ll show you how you can manage them.
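Signing up for those three notifications might look like this sketch, using Swift 3-era notification names (the handler method names here are hypothetical placeholders):

```swift
import AVFoundation

class AudioController: NSObject {
    func registerForNotifications() {
        let nc = NotificationCenter.default
        let session = AVAudioSession.sharedInstance()

        // The three most important AVAudioSession notifications.
        nc.addObserver(self, selector: #selector(handleInterruption(_:)),
                       name: .AVAudioSessionInterruption, object: session)
        nc.addObserver(self, selector: #selector(handleRouteChange(_:)),
                       name: .AVAudioSessionRouteChange, object: session)
        nc.addObserver(self, selector: #selector(handleMediaServicesReset(_:)),
                       name: .AVAudioSessionMediaServicesWereReset, object: session)
    }

    func handleInterruption(_ notification: Notification) { /* pause/resume players */ }
    func handleRouteChange(_ notification: Notification) { /* react to route changes */ }
    func handleMediaServicesReset(_ notification: Notification) { /* rebuild audio objects */ }
}
```

Note that you can register these observers before activating the session, which matches the order of steps described here.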

Next, based on your application’s high-level needs, you’ll want to set the appropriate category, mode, and options.

So, let’s look at a few examples.

Let’s just say I was building a productivity app.

And in that application, I want to play a simple sound when the user saves their document.

Here, we can see that audio enhances the experience but it’s not necessarily required.

So, in this case, I’d want to use the AmbientCategory.

This category obeys the ringer switch.

It does not play audio in the background, and it’ll always mix in with others.

If I was building a podcast app, I’d want to use the PlaybackCategory with the SpokenAudio mode.

And here, we can see that this application will interrupt other applications on the system.

Now if you want your audio to continue playing in the background, you’ll also have to specify the background audio key in your Info.plist.

And this is essentially a session property as well.

It’s just expressed through a different means.

For your navigation app, let’s look at how you can configure the navigation prompt.

Here, you’d want to use the PlaybackCategory, the DefaultMode.

And there are a few options of interest here.

You’d want to use both the interruptSpokenAudioAndMixWithOthers option as well as duckOthers.

So, if you’re listening to a podcast while navigating and that navigation prompt comes up saying, “Oh, turn left in 500 feet,” it’ll actually interrupt the podcast app.

If you’re listening to music, it’ll duck the music’s volume level and mix in with it.

For this application, you’ll also want to use a background audio key as well.
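Put together, the navigation-prompt configuration just described might be sketched like this, using the setCategory(_:mode:options:) API that takes category, mode, and options in one call (error handling is elided for brevity):

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()

// Playback category, default mode, plus the two options discussed:
// interrupt spoken-audio apps (podcasts), and duck music while mixing.
try session.setCategory(AVAudioSessionCategoryPlayback,
                        mode: AVAudioSessionModeDefault,
                        options: [.interruptSpokenAudioAndMixWithOthers, .duckOthers])

// Activate right before speaking a prompt; deactivate (notifying others)
// when the prompt finishes so interrupted audio can resume.
try session.setActive(true)
// ... play the navigation prompt ...
try session.setActive(false, with: .notifyOthersOnDeactivation)
```

Remember that the background audio key in the Info.plist is still required for this use case, as noted above.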

So, next, let’s look at how we can manage activation of our session.

So what does it mean to go active?

Activating your session informs the system to configure the hardware for your application’s needs.

So let’s say, for example, I had an application whose category was set to PlayAndRecord.

When I active my session, it’ll configure the hardware to use input and output.

Now, what happens if I activate my session while listening to music from the music app?

Here, we can see that the current state of the system is set for playback only.

So, when I activate my session, I inform the system to configure the hardware for both input and output.

And since I’m in a non-mixable application, I’ve interrupted the music app.

So let’s just say my application makes a quick recording.

Once I’m done, I deactivate my session.

And if I choose to notify others that I’ve deactivated my session, we’ll see that the music app would resume playback.

Next, let’s look at how we can handle the notifications we signed up for.

We’ll first look at the interruption notification, and we’ll examine a case where your application does not have playback UI.

The first thing I do is I get the interruptionType.

And if it’s the beginning of an interruption, your session is already inactive.

So your players have been paused, and you’ll use this time to update any internal state that you have.

When you receive the end interruption, you go ahead and activate your session, start your players, and update your internal state.

Now, let’s see how that differs for an application that has playback UI.

So when you receive the begin interruption again, your session is inactive; you update the internal state, as well as your UI this time.

So if you have a Play/Pause button, you’d want to go ahead and set that to “play” at this time.

And now when you receive the end interruption, you should check and see if the shouldResume option was passed in.

If that was passed in, then you can go ahead and activate your session, start playback, and update your internal state and UI.

If it wasn’t passed in, you should wait until the user explicitly resumes playback.
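The interruption-handling logic for an app with playback UI might be sketched as follows (the player and isPlaying members are hypothetical stand-ins for your own state):

```swift
import AVFoundation

func handleInterruption(_ notification: Notification) {
    guard let info = notification.userInfo,
        let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
        let type = AVAudioSessionInterruptionType(rawValue: typeValue) else { return }

    switch type {
    case .began:
        // The session is already inactive and players are paused:
        // update internal state, and set the Play/Pause button to "Play".
        isPlaying = false
        updatePlaybackUI()
    case .ended:
        // Only auto-resume if the system passed the shouldResume option.
        if let optionValue = info[AVAudioSessionInterruptionOptionKey] as? UInt,
            AVAudioSessionInterruptionOptions(rawValue: optionValue).contains(.shouldResume) {
            try? AVAudioSession.sharedInstance().setActive(true)
            player.play()
            isPlaying = true
            updatePlaybackUI()
        }
        // Otherwise, wait for the user to explicitly resume playback.
    }
}
```

Because begin and end interruptions can be unmatched, this handler makes no assumption that an end will follow every begin.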

It’s important to note that you can have unmatched interruptions.

So, not every begin interruption is followed by a matching end.

And an example of this are media-player applications that interrupt each other.

Now, let’s look at how we can handle route changes.

Route changes happen for a number of reasons: the connected devices may have changed, the category may have changed, or you may have selected a different data source or port.

So, the first thing you do is you get the routeChangeReason.

If you receive a reason that the old device is unavailable in your media-playback app, you should go ahead and stop playback at this time.

An example of this is if your user is streaming music to the headsets and they unplug the headsets.

They don’t expect that the music resumes playback through the speakers right away.

For more advanced use cases, if you receive the oldDeviceUnavailable or newDeviceAvailable routeChangeReason, you may want to re-evaluate certain session properties as it applies to your application.
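A route-change handler following this advice might look like the sketch below (again, player is a hypothetical member of your own class):

```swift
import AVFoundation

func handleRouteChange(_ notification: Notification) {
    guard let info = notification.userInfo,
        let reasonValue = info[AVAudioSessionRouteChangeReasonKey] as? UInt,
        let reason = AVAudioSessionRouteChangeReason(rawValue: reasonValue) else { return }

    switch reason {
    case .oldDeviceUnavailable:
        // e.g. the user unplugged their headset: stop playback rather
        // than continuing through the built-in speaker.
        player.pause()
    case .newDeviceAvailable:
        // A new device appeared: re-evaluate session properties
        // (sample rate, channel count) as it applies to your app.
        break
    default:
        break
    }
}
```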

Lastly, let’s look at how we can handle the mediaServicesWereReset notification.

This notification is rare, but it does happen because daemons aren’t guaranteed to run forever.

The important thing to note here is that your AVAudioSession sharedInstance is still valid.

You will need to reset your category mode and other options.

You’ll also need to destroy and recreate your player objects, such as your AVAudioEngine, remote I/Os, and other player objects as well.

And we provide a means for testing this on devices by going to Settings, Developer, Reset Media Services.
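The recovery steps just described might be sketched like this (engine and the setup calls are hypothetical members of your own class):

```swift
import AVFoundation

func handleMediaServicesReset(_ notification: Notification) {
    // The sharedInstance is still valid, but its configuration is gone:
    // re-apply category, mode, and options.
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(AVAudioSessionCategoryPlayback)

    // Destroy and recreate player objects (AVAudioEngine, remote I/O, etc.)
    // and rebuild your processing graph from scratch.
    engine = AVAudioEngine()
    setUpEngineGraph()
    try? session.setActive(true)
}
```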

OK, so that just recaps the four essential steps for working with AVAudioSession.

You sign up for notifications.

You set the appropriate category, mode, and options.

You manage activation of your session.

And you handle the notifications.

So let’s look at some new stuff this year.

New this year, we’re adding two new category options, allowAirPlay and allowBluetoothA2DP, to the PlayAndRecord category.

So, that means that you can now use a microphone while playing to a Bluetooth and AirPlay destination.

So if this is your application’s use case, go ahead and set the category and the options, and then let the user pick the route from either an MPVolumeView or Control Center.

We’re also adding a new property for VoIP apps on our AVAudioSessionPortDescription that’ll determine whether or not the current route has hardware voice processing enabled.

So if your user is connected to a CarPlay system or a Bluetooth HFP headset that has hardware voice processing, you can use this property to disable your software voice processing so you’re not double-processing the audio.

If you’re already using Apple’s built-in voice processing IO unit, you don’t have to worry about this.

And new this year, we also introduced the CallKit framework.

So, to see how you can enhance your VoIP apps with CallKit, we had a session earlier this week.

And if you missed that, you can go ahead and catch it online.

So that’s just an overview of AVAudioSession.

We’ve covered a lot of this stuff in-depth in previous sessions.

So we encourage you to check those out, as well as a programming guide online.

So, moving on.

So you set up AVAudioSession if it’s applicable to your platform.

Now, let’s look at how you can simply play and record audio in your application.

We’ll start with the AVFoundation framework.

There are a number of classes here that can handle the job.

We have our AVAudioPlayer, AVAudioRecorder, and AVPlayer class.

AVAudioPlayer is the simplest way to play audio from a file.

We support a wide variety of formats.

We provide all the basic playback operations.

We also support some more advanced operations, such as setting volume level.

You get metering on a per-channel basis.

You can loop your playback, adjust the playback rate, work with stereo panning.

If you’re on iOS or tvOS, you can work with channel assignments.

If you had multiple files you wanted to play back, you can use multiple AVAudioPlayer objects and synchronize their playback as well.

And new this year, we’re adding a method that lets you fade to a volume level over a specified duration.

So let’s look at a code example of how you can use AVAudioPlayer in your application.

Let’s just say I was working and building a simple productivity app again where I want to play an acknowledgement sound when the user saves their document.

In this case, I have an AVAudioPlayer and a URL to my asset in my class.

Now in my setup function, I go ahead and I create the AVAudioPlayer object with the contents of my URL and I prepare the player for playback.

And then, in my saveDocument function, I may do some work to see whether or not the document was saved successfully.

And if it was, then I simply play my file.

Really easy.
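The productivity-app example just walked through might look like this sketch (the asset name "documentSaved" is a hypothetical placeholder):

```swift
import AVFoundation

class DocumentController {
    var player: AVAudioPlayer?
    // Hypothetical sound asset bundled with the app.
    let soundURL = Bundle.main.url(forResource: "documentSaved", withExtension: "caf")

    func setup() {
        guard let url = soundURL else { return }
        // Create the player with the contents of the URL...
        player = try? AVAudioPlayer(contentsOf: url)
        // ...and preload buffers so playback starts promptly.
        player?.prepareToPlay()
    }

    func saveDocument() {
        let savedSuccessfully = performSave()   // hypothetical save logic
        if savedSuccessfully {
            player?.play()   // acknowledge the save with a simple sound
        }
    }

    private func performSave() -> Bool { return true }
}
```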

Now, let’s look at AVAudioRecorder.

This is the simplest way to record audio to a file.

You can record for a specified duration, or you can record until the user explicitly stops.

You get metering on a per-channel basis, and we support a wide variety of encoded formats.

So, to set up a format, we use the Recorder Settings Dictionary.

And this is a dictionary with keys that let you set various format parameters, such as sample rate and number of channels.

If you’re working with Linear PCM data, you can adjust things like the bit depth and endian-ness.

If you’re working with encoded formats, you can adjust things such as quality and bit rate.

So, let’s look at a code example of how you can use AVAudioRecorder.

So the first thing I do is I create my format settings.

Here, I’m creating an AAC file with a really high bit rate.

And then the next thing I do I go ahead and create my AVAudioRecorder object with a URL to the file location and the format settings I’ve just defined.

And in this example, I have a simple button that I’m using to toggle the state of the recorder.

So when I press the button, if the recorder is recording, I go ahead and stop recording.

Otherwise, I start my recording.

And I can use the recorder’s built-in meters to provide feedback to the UI.
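That recorder setup might be sketched as follows (recordingURL is a hypothetical file location in your app’s container):

```swift
import AVFoundation

// Format settings: an AAC file with a really high bit rate.
let settings: [String: Any] = [
    AVFormatIDKey: kAudioFormatMPEG4AAC,
    AVSampleRateKey: 44_100.0,
    AVNumberOfChannelsKey: 2,
    AVEncoderBitRateKey: 256_000
]

let recorder = try AVAudioRecorder(url: recordingURL, settings: settings)
recorder.isMeteringEnabled = true   // enables per-channel metering

// Toggle the recorder's state from a button press:
func recordButtonPressed() {
    if recorder.isRecording {
        recorder.stop()
    } else {
        recorder.record()
    }
}

// For UI feedback, refresh and read the meters periodically:
func refreshMeter() -> Float {
    recorder.updateMeters()
    return recorder.averagePower(forChannel: 0)
}
```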

Lastly, let’s look at AVPlayer.

AVPlayer works not only with local files but streaming content as well.

You have all the standard controls available.

We also provide built-in user interfaces that you can use directly, such as the AVPlayerView and the AVPlayerViewController.

And AVPlayer also works with video content as well.

And this year, we added a number of new features to AVPlayer.

So if you want to find out what we did, you can check out the Advances in AVFoundation Playback session.

And if you missed that, you can go ahead and catch it online.

OK, so what we’ve seen so far is just some very simple examples of playback and recording.

So now let’s look at some more advanced use cases.

Advanced use cases include playing back not only from files but working with buffers of audio data as well.

You may be interested in doing some audio processing, applying certain effects and mixing together multiple sources.

Or you may be interested in implementing 3D audio.

So, some examples of this are you’re building a classic karaoke app, you want to build a deejay app with really amazing effects, or you want to build a game and really immerse your user in it.

So, for such advanced use cases, we have a class in AVFoundation called AVAudioEngine.

AVAudioEngine is a powerful, feature-rich Objective-C and Swift API.

It’s a real-time audio system, and it simplifies working with real-time audio by providing a non-real-time interface for you.

So this hides a lot of the complexities of dealing with real-time audio, and it makes your code that much simpler.

The Engine manages a graph of nodes, and these nodes let you play and record audio.

You can connect these nodes in various ways to form many different processing chains and perform mixing.

You can capture audio at any point in the processing chain as well.

And we provide a special node that lets you spatialize your audio.

So, let’s look at the fundamental building block: the AVAudioNode.

We have three types of nodes.

We have source nodes, which provide data for rendering.

So these could be your PlayerNode, an InputNode, or a sampler unit.

We have processing nodes that let you process audio data.

So these could be effects such as delays, distortions, and mixers.

And we have the destination node, which is the termination node in your graph, and it’s connected directly to the output hardware.

So let’s look at a sample setup.

Let’s just say I’m building a classic karaoke app.

In this case, I have three source nodes.

I’m using the InputNode to capture the user’s voice.

I’m using a PlayerNode to play my Backing Track.

I’m using another PlayerNode to play other sound effects and feedback cues to the user.

In terms of processing nodes, I may want to apply a specific EQ to the user’s voice.

And then I’m going to use the mixer to mix all three sources into a single output.

And then the single output will then be played through the OutputNode and then out to the output hardware.

I can also capture the user’s voice and do some analysis to see how well they’re performing by installing a TapBlock.

And then based on that, I can conditionally schedule these feedback cues to be played out.
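The karaoke graph described above might be set up like this sketch (node and variable names are hypothetical; error handling is elided):

```swift
import AVFoundation

let engine = AVAudioEngine()
let backingPlayer = AVAudioPlayerNode()   // plays the backing track
let effectsPlayer = AVAudioPlayerNode()   // plays sound effects / feedback cues
let voiceEQ = AVAudioUnitEQ(numberOfBands: 4)

engine.attach(backingPlayer)
engine.attach(effectsPlayer)
engine.attach(voiceEQ)

let input = engine.inputNode              // captures the user's voice
let mixer = engine.mainMixerNode
let voiceFormat = input.outputFormat(forBus: 0)

// Voice -> EQ -> mixer; both players also feed the mixer.
engine.connect(input, to: voiceEQ, format: voiceFormat)
engine.connect(voiceEQ, to: mixer, format: voiceFormat)
engine.connect(backingPlayer, to: mixer, format: nil)
engine.connect(effectsPlayer, to: mixer, format: nil)

// Tap the processed voice for analysis (e.g. scoring the performance).
voiceEQ.installTap(onBus: 0, bufferSize: 4096, format: voiceFormat) { buffer, time in
    // Analyze the PCM buffer here; based on the result, conditionally
    // schedule feedback cues on effectsPlayer.
}

try engine.start()
backingPlayer.play()
```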

So let’s now look at a sample game setup.

The main node of interest here is the EnvironmentNode, which simulates a 3D space and spatializes its connected sources.

In this example, I’m using the InputNode as well as a PlayerNode as my source.

And you can also adjust various 3D mixing properties on your sources as well, such as position, occlusion.

And in terms of the EnvironmentNode, you can also adjust properties there, such as the listenerPosition as well as other reverb parameters.

So this 3D Space can then be mixed in with a Backing Track and then played through the output.

So before we move any further with AVAudioEngine, I want to look at some fundamental core classes that the Engine uses extensively.

I’ll first start with AVAudioFormat.

So, AVAudioFormat describes the data format in an audio file or stream.

So we have our standard format, common formats, as well as compressed formats.

This class also contains an AVAudioChannelLayout which you may use when dealing with multichannel audio.

It’s a modern interface to our AudioStreamBasicDescription structure and our AudioChannelLayout structure.

Now, let’s look at AVAudioBuffer.

This class has two subclasses.

It has the AVAudioPCMBuffer, which is used to hold PCM data.

And it has the AVAudioCompressedBuffer, which is used for holding compressed audio data.

Both of these classes provide a modern interface to our AudioBufferList and our AudioStreamPacketDescription.

Let’s look at AVAudioFile.

This class lets you read and write files in any supported format.

It lets you read data into PCM buffers and write data into a file from PCM buffers.

And in doing so, it transparently handles any encoding and decoding.

And it now supersedes our AudioFile and ExtAudioFile APIs.

Lastly, let’s look at AVAudioConverter.

This class handles audio format conversion.

So, you can convert between one form of PCM data to another.

You can also convert between PCM and compressed audio formats in which it handles the encoding and decoding for you.

And this class supersedes our AudioConverter API.

And new this year, we’ve also added a minimum phase sample rate converter algorithm.

So you can see that all these core classes really work together when interfacing with audio data.

Now, let’s look at how these classes then interact with AVAudioEngine.

So if you look at AVAudioNode, it has both input and output AVAudio formats.

If you look at the PlayerNode, it can provide data to the Engine from an AVAudioFile or an AVAudioPCMBuffer.

When you install a NodeTap, the block provides audio data to you in the form of PCM buffers.

You can do analysis with it, or you can save it to a file using an AVAudioFile.

If you’re working with a compressed stream, you can break it down into compressed buffers, use an AVAudioConverter to convert it to PCM buffers, and then provide it to the Engine through the PlayerNode.

So, new this year, we’re bringing a subset of AVAudioEngine to the Watch.

Along with that, we’re including a subset of AVAudioSession, as well as all the core classes you’ve just seen.

So I’m sure you’d love to see a demo of this.

So we have that for you.

We built a simple game using both SceneKit and AVAudioEngine directly.

And in this game, what I’m doing is I’m launching an asteroid into space.

And at the bottom of the screen, I have a flame.

And I can control the flame using the Watch’s Digital Crown.

And now if the asteroid makes contact with the flame, it plays this really loud explosion sound.

So, let’s see this.

[ Explosions ]

I’m sure this game, like, defies basic laws of physics because it’s playing audio in space.

Right? And that’s not possible.

[ Applause ]

All right, so let me just go over quickly the AVAudioEngine code in this game.

So, in my class, I have my AVAudioEngine.

And I have two PlayerNodes one for playing the explosion sound, and one for playing the launch sound.

I also have URLs to my audio assets.

And in this example, I’m using buffers to provide data to the engine.

So, let’s look at how we set up the engine.

The first thing I do is I go ahead and I attach my PlayerNodes.

So I attach the explosionPlayer and the launchPlayer.

Next, I’m going to use the core classes.

I’m going to create an AVAudioFile from the URL of my assets.

And then, I’m going to create a PCM buffer.

And I’m going to read the data from the file into the PCM buffer.

And I can do this because my audio files are really short.

Next, I’ll go ahead and make the connections between the source nodes and the engine’s main mixer.

So, when the game is about to start, I go ahead and I start my engine and I start my players.

And when I launch an asteroid, I simply schedule the launchBuffer to be played on the launchPlayer.

And when the asteroid makes contact with the flame, I simply schedule the explosionBuffer to be played on the explosionPlayer.
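The watchOS game code walked through above might be sketched like this (URLs and member names are hypothetical; error handling is elided):

```swift
import AVFoundation

let engine = AVAudioEngine()
let explosionPlayer = AVAudioPlayerNode()
let launchPlayer = AVAudioPlayerNode()
var explosionBuffer: AVAudioPCMBuffer!
var launchBuffer: AVAudioPCMBuffer!

func loadBuffer(from url: URL) throws -> AVAudioPCMBuffer {
    // The assets are really short, so read each file entirely into a buffer.
    let file = try AVAudioFile(forReading: url)
    let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                  frameCapacity: AVAudioFrameCount(file.length))!
    try file.read(into: buffer)
    return buffer
}

func setUpEngine(explosionURL: URL, launchURL: URL) throws {
    engine.attach(explosionPlayer)
    engine.attach(launchPlayer)
    explosionBuffer = try loadBuffer(from: explosionURL)
    launchBuffer = try loadBuffer(from: launchURL)
    engine.connect(explosionPlayer, to: engine.mainMixerNode, format: explosionBuffer.format)
    engine.connect(launchPlayer, to: engine.mainMixerNode, format: launchBuffer.format)
}

func gameWillStart() throws {
    try engine.start()
    explosionPlayer.play()
    launchPlayer.play()
}

// Schedule the appropriate buffer on each game event.
func asteroidLaunched() { launchPlayer.scheduleBuffer(launchBuffer, completionHandler: nil) }
func asteroidHitFlame() { explosionPlayer.scheduleBuffer(explosionBuffer, completionHandler: nil) }
```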

So, with a few lines of code, I’m able to build a really rich audio experience for my games on watchOS.

And that was a simple example, so we can’t wait to see what you come up with.

So, before I wrap up with AVAudioEngine, I want to talk about multichannel audio and specifically how it relates to tvOS.

So, last October, we introduced tvOS along with the 4th generation Apple TV.

And so this is the first time we can talk about it at WWDC.

And one of the interesting things about audio on Apple TV is that many users are already connected to multichannel hardware since many home theater systems already support 5.1 or 7.1 surround sound systems.

So, today, I just want to go over how you can render multichannel audio using AVAudioEngine.

So, first, let’s review the setup with AVAudioSession.

I first set my category and other options, and then I activate my session to configure the hardware for my application’s needs.

Now, depending on the rendering format I want to use, I’ll first need to check and see if the current route supports it.

And I can do that by checking if my desired number of channels are less than or equal to the maximum number of output channels.

And if it is, then I can go ahead and set my preferred number of output channels.

I can then query back the actual number of channels from the session and then use that moving forward.

Optionally, I can look at the array of ChannelDescriptions on the current port.

And each ChannelDescription gives me a channelLabel and a channelNumber.

So I can use this information to figure out the exact format and how I can map my content to the connected hardware.
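The session-side negotiation for multichannel output might be sketched as follows (assuming a Playback category; error handling is elided):

```swift
import AVFoundation

let session = AVAudioSession.sharedInstance()
try session.setCategory(AVAudioSessionCategoryPlayback)
try session.setActive(true)

// Check whether the current route supports the desired rendering format.
let desiredChannels = 6   // e.g. 5.1 content
if desiredChannels <= session.maximumOutputNumberOfChannels {
    try session.setPreferredOutputNumberOfChannels(desiredChannels)
}

// Query back the actual channel count and use it moving forward.
let actualChannels = session.outputNumberOfChannels

// Optionally inspect the channel descriptions on the current output port
// to figure out the exact channel mapping of the connected hardware.
if let output = session.currentRoute.outputs.first,
    let channels = output.channels {
    for channel in channels {
        print(channel.channelName, channel.channelLabel, channel.channelNumber)
    }
}
```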

Now, let’s switch gears and look at the AVAudioEngine setup.

There are two use cases here.

The first use case is if you already have multichannel content.

And the second use case is if you have mono content and you want to spatialize it.

And this is typically geared towards games.

So, in the first use case, I have multichannel content and multichannel hardware.

I simply get the hardware format.

I set that as my connection between my Mixer and my OutputNode.

And on the source side, I get the content format and I set that as my connection between my SourceNode and the Mixer.

And here, the Mixer handles the channel mapping for you.

Now, in the second use case, we have a bunch of mono sources.

And we’ll use the EnvironmentNode to spatialize them.

So, like before, we get the hardware format.

But before we set the compatible format, we have to map it to one that the EnvironmentNode supports.

And for a list of supported formats, you can check our documentation online.

So, I set the compatible format.

And now on the source side, like before, I get the content format and I set that as my connection between my player and the EnvironmentNode.

Lastly, I’ll also have to set the multichannel rendering algorithm to SoundField, which is what the EnvironmentNode currently supports.

And at this point, I can start my engine, start playback, and then adjust all the various 3D mixing properties that we support.
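The EnvironmentNode setup for spatializing mono sources might be sketched like this (the 5.1 layout tag is an assumption for illustration; check the documentation for the layouts the EnvironmentNode actually supports on your route):

```swift
import AVFoundation

let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let player = AVAudioPlayerNode()   // a mono source

engine.attach(environment)
engine.attach(player)

// Get the hardware format, then map it to a compatible multichannel
// format that the EnvironmentNode supports (here, 5.1 as an example).
let hardwareFormat = engine.outputNode.outputFormat(forBus: 0)
let layout = AVAudioChannelLayout(layoutTag: kAudioChannelLayoutTag_AudioUnit_5_1)!
let compatibleFormat = AVAudioFormat(standardFormatWithSampleRate: hardwareFormat.sampleRate,
                                     channelLayout: layout)
engine.connect(environment, to: engine.outputNode, format: compatibleFormat)

// Connect the mono source into the environment using its content format.
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: hardwareFormat.sampleRate,
                               channels: 1)
engine.connect(player, to: environment, format: monoFormat)

// SoundField is the multichannel rendering algorithm the
// EnvironmentNode currently supports.
player.renderingAlgorithm = .soundField
player.position = AVAudio3DPoint(x: 2, y: 0, z: -5)
environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)

try engine.start()
player.play()
```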

So, just a recap.

AVAudioEngine is a powerful, feature-rich API.

It simplifies working with real-time audio.

It enables you to work with multichannel audio and 3D audio.

And now, you can build games with really rich audio experiences on your Watch.

And it supersedes our AUGraph and OpenAL APIs.

So we’ve talked a bit about the Engine in previous sessions, so we encourage you to check those out if you can.

And at this point, I’d like to hand it over to my colleague, Doug, to keep it rolling from here.


[ Applause ]

Thank you, Saleem.

So, I’d like to continue our tour through the audio APIs here.

We talked about real-time audio in passing with AVAudioEngine.

Saleem emphasized that, while the audio processing is happening in real-time context, we’re controlling it from non-real-time context.

And that’s the essence of its simplicity.

But there are times when you actually want to do work in that real-time process, or context.

So I’d like to go into that a bit.

So, what is real-time audio?

The use cases where we need to do things in real-time are characterized by low latency.

Possibly the oldest example I’m familiar with on our platforms is with music applications.

For example, you may be synthesizing a sound when the user presses a key on the MIDI keyboard.

And we want to minimize the time from when that MIDI note was struck to when the note plays.

And so we have real-time audio effects like guitar pedals.

We want to minimize the time it takes from when the audio input of the guitar comes into the computer through which we process it, apply delays, distortion, and then send it back out to the amplifier.

So we need low latency there so that the instrument, again, is responsive.

Telephony is also characterized by low latency requirements.

We’ve all been on phone calls with people in other countries and had very long delay times.

It’s no good in telephony.

We do a lot of signal processing.

We need to keep the latency down.

Also, in game engines, we like to keep the latency down.

The user is doing things interacting with joysticks, whatever.

We want to produce those sounds as quickly as possible.

Sometimes, we want to manipulate those sounds as they’re being rendered.

Or maybe we just have an existing game engine.

In all these cases, we have a need to write code that runs in a real-time context.

In this real-time context, the main characteristic of our constraint is that we’re operating under deadlines.

Right? Every some-number of milliseconds, the system is waking us up, asking us to produce some audio for that equally-small slice of time.

And we either accomplish it and produce audio seamlessly.

Or if we fail, if we take too long to produce that audio, we create a gap in the output.

And the user hears that as a glitch.

And this is a very small interval that we have to create our audio in.

Our deadlines are typically as small as 3 milliseconds.

And 20 milliseconds, which is the default on iOS, is still a pretty constrained deadline.

So, in this environment, we have to be really careful about what we do.

We can’t really block.

We can’t allocate memory.

We can’t use mutexes.

We can’t access the file system or sockets.

We can’t log.

We can’t even call a dispatch “async” because it allocates continuations.

And we have to be careful not to interact with the Objective-C and Swift runtimes because they are not entirely real-time safe.

There are cases when they, too, will take mutexes.

So that’s a partial list.

There other things we can’t do.

The primary thing to ask yourself is, “Does this thing I’m doing allocate memory or use mutexes?”

And if the answer is yes, then it’s not real-time safe.

Well, what can we do?

I’ll show you an example of that in a little bit.

But, first, I’d like to just talk about how we manage this problem of packaging real-time audio components.

And we do this with an API set called Audio Units.

So this is a way for us to package and for you, for that matter, as another developer to package your signal processing and modules that can be reused in other applications.

And it also provides an API to manage the transitions and interactions between your non-real-time context and your real-time rendering context.

So, as an app developer, you can host Audio Units.

That means you can let the user choose one, or you can simply hardcode references to system built-in units.

You can also build your own Audio Units.

You can build them as app extensions or plug-ins.

And you can also simply register an Audio Unit privately to your application.

And this is useful, for example, if you’ve got some small piece of signal processing that you want to use in the context of AVAudioEngine.

So, underneath Audio Units, we have an even more fundamental API which we call Audio Components.

So this is a set of APIs in the AudioToolbox framework.

The framework maintains a registry of all of the components on the system.

Every component has a type, subtype, and manufacturer.

These are 4-character codes.

And those serve as the key for discovering them and registering them.

And there are a number of different kinds of Audio Components types.

The two main categories of types are Audio Units and Audio Codecs.

But amongst the Audio Units, we have input/output units, generators, effects, instruments, converters, mixers as well.

And amongst codecs, we have encoders and decoders.

We also have audio file components on macOS.

Getting into the implementation of components, there are a number of different ways that components are implemented.

Some of them you’ll need to know about if you’re writing them.

And others, it’s just for background.

The most highly-recommended way to create a component now if it’s an Audio Unit is to create an Audio Unit application extension.

We introduced this last year with our 10.11 and 9.0 releases.

So those are app extensions.

Before that, Audio Units were packaged in component bundles as were audio codecs, et cetera.

That goes back to Mac OS 10.1 or so.

Interestingly enough, audio components also include inter-app audio nodes on iOS.

Node applications register themselves with a component subtype and manufacturer key.

And host applications discover node applications through the Audio Component Manager.

And finally, you can register as I mentioned before you can register your own components for the use of your own application.

And just for completeness, there are some Apple built-in components.

On iOS, they’re linked into the AudioToolbox.

So those are the flavors of component implementations.

Now I’d like to focus in on just one kind of component here the audio input/output unit.

This is and Audio Unit.

And it’s probably the one component that you’ll use if you don’t use any other.

And the reason is that this is the preferred interface to the system’s basic audio input/output path.

Now, on macOS, that basic path is in the Core Audio framework.

We call it the Audio HAL, and it’s a pretty low-level interface.

It makes its clients deal with interesting stream topologies on multichannel devices, for example.

So, it’s much easier to deal with the Audio HAL interface through an audio input/output unit.

On iOS, you don’t even have access to the Core Audio framework.

It’s not public there.

You have to use an audio input/output unit as your lowest-level way to get audio in and out of the system.

And our preferred interface now for audio input/output units is AUAudioUnit in the AudioToolbox framework.

If you’ve been working with our APIs for a while, you’re familiar with the version 2 Audio Units that are part of the system: AUHAL on macOS and AURemoteIO on iOS, as well as Watch (actually, I’m not sure we have it available there).

But in any case, AUAudioUnit is your new modern interface to this low-level I/O mechanism.

So I’d like to show you what it looks like to use AUAudioUnit to do audio I/O.

So I’ve written a simple program in Swift here that generates a square wave.

And here’s my signal processing.

I mentioned earlier I would show you what kinds of things you can do here.

So this wave generator shows you.

You can basically read memory, write memory, and do math.

And that’s all that’s going on here.

It’s making the simplest of all waveforms, the square wave (at least the simplest from a computational point of view).

So that class is called SquareWaveGenerator.

And let’s see how to play a SquareWaveGenerator from an AUAudioUnit.
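The session's source isn't reproduced here, but a minimal pure-Swift sketch of such a class might look like this (all names and details are illustrative, not the talk's actual code):

```swift
// A sketch of a SquareWaveGenerator: it keeps a running phase and writes
// +amplitude for the first half of each cycle, -amplitude for the second.
class SquareWaveGenerator {
    let frequency: Double       // in Hz
    let sampleRate: Double      // in Hz
    let amplitude: Float
    private var phase = 0.0     // current position in the cycle, 0..<1

    init(sampleRate: Double, frequency: Double, amplitude: Float = 0.25) {
        self.sampleRate = sampleRate
        self.frequency = frequency
        self.amplitude = amplitude
    }

    // Real-time safe: reads memory, writes memory, does math. No allocation.
    func render(_ buffer: UnsafeMutableBufferPointer<Float>) {
        let phaseIncrement = frequency / sampleRate
        for i in 0..<buffer.count {
            buffer[i] = phase < 0.5 ? amplitude : -amplitude
            phase += phaseIncrement
            if phase >= 1.0 { phase -= 1.0 }
        }
    }
}
```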

So the first thing we do is create an audio component description.

And this tells us which component to go look for.

The type is output.

The subtype is something I chose here depending on platform: either RemoteIO or HALOutput.

We’ve got the Apple manufacturer and some unused flags.

Then I can create my AUAudioUnit using my component description.

So I’ll get that unit that I wanted.

And now it’s open and I can start to configure it.
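In code, those two steps might look something like this sketch (the function wrapper, error handling, and platform check are my assumptions, not the session's exact source):

```swift
import AudioToolbox
import AVFoundation

// Describe the component we want: an output unit, made by Apple, with a
// platform-specific subtype (RemoteIO on iOS, the HAL output unit on macOS).
func makeIOUnit() throws -> AUAudioUnit {
    var description = AudioComponentDescription()
    description.componentType = kAudioUnitType_Output
    #if os(iOS)
    description.componentSubType = kAudioUnitSubType_RemoteIO
    #else
    description.componentSubType = kAudioUnitSubType_HALOutput
    #endif
    description.componentManufacturer = kAudioUnitManufacturer_Apple
    description.componentFlags = 0
    description.componentFlagsMask = 0

    // Creating the AUAudioUnit opens the component; configuration comes next.
    return try AUAudioUnit(componentDescription: description)
}
```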

So the first thing I want to do here is find out how many channels of audio are on the system.

There are ways to do this with AVAudioSession on iOS.

But most simply and portably, you can simply query the outputBusses of the input/output unit.

And outputBusses[0] is the output-directed stream.

So I’m going to fetch its format, and that’s my hardware format.

Now this hardware format may be something exotic.

It may be interleaved, for example.

And I don’t know that I want to deal with that.

So I’m just going to create a renderFormat.

That is a standard format with the same sample rate.

And some number of channels.

Just to keep things short and simple, I’m only going to render two channels, regardless of the hardware channel count.

So that’s my renderFormat.

Now, I can tell the I/O unit, “This is the format I want to give you on inputBusses[0].”

So, having done this, the unit will now convert my renderFormat to the hardwareFormat.

And in this case, on my MacBook, it’s going to take this deinterleaved floating point and convert it to interleaved floating point buffers.
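A hedged sketch of that format negotiation, where `audioUnit` stands for the AUAudioUnit created above (in recent SDKs the standard-format initializer is failable, hence the `if let`):

```swift
// Ask output bus 0 for the hardware format, then hand the unit a standard
// (deinterleaved Float32) stereo format at the same sample rate on input
// bus 0; the unit performs the conversion between the two.
let hardwareFormat = audioUnit.outputBusses[0].format

if let renderFormat = AVAudioFormat(
        standardFormatWithSampleRate: hardwareFormat.sampleRate,
        channels: 2) {
    try audioUnit.inputBusses[0].setFormat(renderFormat)
}
```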

OK. So, next, I’m going to create my square wave generators.

If you’re a music and math geek like me, you know what A440 is, and that multiplying it by 1.5 will give you a fifth above it.

So I’m going to render A to my left channel and E to my right channel.

And here’s the code that will run in the real-time context.

There are a lot of parameters here, and I actually only need a couple of them.

I only need the frameCount and the rawBufferList.

The rawBufferList is a difficult, low-level C structure which I can rewrap in Swift using an overlay on the SDK.

And this takes the audio bufferList and makes it look something like a vector or array.

So having converted the rawBufferList to the nice Swift wrapper, I can query its count.

And if I got at least one buffer, then I can render the left channel.

If I got at least two buffers, I can render the right channel.

And that’s all the work I’m doing right here.

Of course, there’s more work inside the wave generators, but that’s all of the real-time context work.
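A sketch of that real-time block, assuming `audioUnit` is the I/O unit from above and that `generatorLeft`/`generatorRight` are the two wave generators, each exposing a hypothetical `render(_:)` method that fills a Float buffer:

```swift
// The outputProvider runs in the real-time render context, so it only
// reads memory, writes memory, and does math: no allocation, no locks.
audioUnit.outputProvider = { actionFlags, timestamp, frameCount, busNumber, rawBufferList in
    // Rewrap the raw C AudioBufferList so it behaves like a collection.
    let buffers = UnsafeMutableAudioBufferListPointer(rawBufferList)
    let frames = Int(frameCount)
    if buffers.count > 0, let data = buffers[0].mData {
        let left = data.assumingMemoryBound(to: Float.self)
        generatorLeft.render(UnsafeMutableBufferPointer(start: left, count: frames))
    }
    if buffers.count > 1, let data = buffers[1].mData {
        let right = data.assumingMemoryBound(to: Float.self)
        generatorRight.render(UnsafeMutableBufferPointer(start: right, count: frames))
    }
    return noErr
}
```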

So, now, I’m all set up.

I’m ready to render.

So I’m going to tell the I/O unit, “Do any allocations you need to do to start rendering.”

Then, I can have it actually start the hardware, run for 3 seconds, and stop.

And that’s the end of this simple program.
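The closing steps, continuing with the same `audioUnit` inside a throwing context, might look like:

```swift
// Allocate render resources, start the hardware, let the square waves
// play for three seconds, then stop.
try audioUnit.allocateRenderResources()
try audioUnit.startHardware()
sleep(3)
audioUnit.stopHardware()
```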

[ Monotone ]

So, that’s AUAudioUnit.

I’d like to turn next briefly to some other kinds of Audio Units.

We have effects which take audio input, produce audio output.

Instruments which take something resembling MIDI as input and also produce audio output.

And generators which produce audio output without anything going in except maybe some parametric control.

If I were to repackage my square wave generator as an Audio Unit, I would make it a generator.

So to host these kinds of Audio Units, you can also use AUAudioUnit.

You can use a separate block to provide input to it.

It’s very similar to the output provider block that you saw on the I/O unit.

You can chain together these render blocks of units to create your own custom topologies.

You can control the units using their parameters.

And also, many units, especially third-party units, have nice user interfaces.

As a hosting application, you can obtain that audio unit’s view, display it in your application, and let the user interact with it.

Now if you’d like to write your own Audio Unit, the way I would start is just building it within the context of an app.

This lets you debug without worrying about inter-process communication issues.

It’s all in one process.

So, you start by subclassing AUAudioUnit.

You register it as a component using this class method of AUAudioUnit.

Then, you can debug it.
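That registration step might look like the following sketch; the subclass, subtype, and manufacturer codes here are made up for illustration:

```swift
import AudioToolbox

// A hypothetical AUAudioUnit subclass, registered in-process so host code
// in this same app can instantiate it by component description, with no
// extension or inter-process machinery involved yet.
class MyGeneratorUnit: AUAudioUnit {
    // override internalRenderBlock, inputBusses, outputBusses, etc.
}

var desc = AudioComponentDescription()
desc.componentType = kAudioUnitType_Generator
desc.componentSubType = 0x73717772           // "sqwr", a made-up subtype
desc.componentManufacturer = 0x44656D6F      // "Demo", a made-up manufacturer

AUAudioUnit.registerSubclass(MyGeneratorUnit.self,
                             as: desc,
                             name: "Demo: SquareWave",
                             version: 1)
```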

And once you’ve done that and if you decide you’d like to distribute it as an Audio Unit extension you can take that same AUAudioUnit subclass.

You might fine-tune and polish it some more.

But then you have to do a small amount of additional work to package this as an Audio Unit extension.

So you’ve got an extension.

You can embed it in an application.

You can sell that application on the App Store.

So I’d like to have my colleague, Torrey, now show you some of the power of Audio Unit extensions.

We’ve had some developers doing some really cool things with it in the last year.

How is everybody doing?

Happy to be at WWDC?

[ Applause ]

Let’s make some noise.

I’m going to start here by launching... well, first of all, I have my instrument here.

This is my iPad Pro.

And I’m going to start by launching Arturia iSEM, a very powerful synthesizer application.

And I have a synth trumpet sound here that I like.

[ Music ]

So I like this sound and I want to put it in a track that I’m working on.

This is going to serve as our Audio Unit plug-in application.

And now I’m going to launch GarageBand, which is going to serve as our Audio Unit host application.

Now, in GarageBand, I have a sick beat I’ve been working on that I’m calling WWDC Demo.

Let’s listen to it.

[ Music ]

We’ll move into what I call “the verse portion” next.

[ Music ]

And next, we’re going to work on this chorus here.

This is supposed to be the climax of the song.

I want some motion.

I want some tension.

And let’s create that by bringing in an Audio Unit.

I’m going to add a new track here.

Adding an instrument, I’ll see Audio Units is an option here.

If I select this, then I can see all of the Audio Units that are hosted here on the system.

Right now, I see Arturia iSEM, because I practiced this at home.

Selecting iSEM, GarageBand is now going to give me an onscreen MIDI controller that I can use here.

It’s complete with the scale transforms and arpeggiator here that I’m going to make use of because I like a lot of motion.

Over here on the left, you can see a Pitch/Mod Wheel.

You can even modify the velocity.

And here is the view that the Audio Unit has provided to me that I can actually tweak.

For now, I’m going to record in a little piece here and see what it sounds like in context.

So [ Music ]

All right, pretty good.

Let’s see what it sounds like in context.

[ Music ]

There we go.

That’s the tension that I want.

Now, let me dig in here a little bit more and show you what I’ve done.

I’m going to edit here.

And I’ll look into this loop a little bit more.

There are two observations that I’d like you to make here.

The first one is that these are MIDI events.

The difference between using inter-app audio and using Audio Units as a plug-in is you’ll actually get MIDI notes here, which is much easier to edit after the fact.

The other observation I’d like you to make here is that you see these individual MIDI notes, but you saw me play one big, fat-fingered chord.

So, it’s because I’ve taken advantage of the arpeggiator that’s built into GarageBand that I’ve got these individual notes.

And I can play around with these if I want to and make them sound a bit more human.

But I’m happy with this recording as it is.

The last thing that I’d actually like to show you here is, first, I’m going to copy this into the adjacent cell.

And I told you earlier that the Audio Unit view that’s provided here is actually interactive.

It’s not just a pretty picture.

So if you were adventurous, you could even try to give a little performance for your friends.

[ Music ]

Turn it up a little bit.

[ Music ]

Let’s wrap it up.

[ Music ]

That concludes my demo.

[ Applause ]

I want to thank you for your time, your attention, and always for making dope apps.

[ Applause ]

Thank you, Torrey.

So, just to recap here, you can see the session we did last year about Audio Unit extensions.

It goes into a lot more detail about the mechanics of the API.

We just wanted to show you here what people have been doing with it because it’s so cool.

So, speaking of MIDI, we saw how GarageBand recorded Torrey’s performance as MIDI.

We have a number of APIs in the system that communicate using MIDI, and it’s not always clear which ones to use when.

So I’d like to try to help clear that up just a little bit.

Now, you might just have a standard MIDI file like, well, an ugly cellphone ringtone.

But MIDI files are very useful in music education.

I can get a MIDI file of a piece I want to learn.

I can see what all the notes are.

So if you have a MIDI file, you can play it back with AVAudioSequencer.

And that will play it back into the context of an AVAudioEngine.
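A hedged sketch of that setup, with a placeholder file URL and the built-in sampler as the instrument (both my assumptions):

```swift
import AVFoundation

// AVAudioSequencer plays a standard MIDI file into an AVAudioEngine,
// here rendered through the system sampler instrument.
func playMIDIFile(at midiURL: URL) throws {
    let engine = AVAudioEngine()
    let sampler = AVAudioUnitSampler()          // renders the notes
    engine.attach(sampler)
    engine.connect(sampler, to: engine.mainMixerNode, format: nil)

    let sequencer = AVAudioSequencer(audioEngine: engine)
    try sequencer.load(from: midiURL, options: [])

    try engine.start()
    sequencer.prepareToPlay()
    try sequencer.start()
}
```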

If you wish to control a software synthesizer as we saw GarageBand doing with iSEM, the best API to do that with is AUAudioUnit.

And if you’d like your AUAudioUnit to play back into your AVAudioEngine, you can use AVAudioUnitMIDIInstrument.
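For example (a sketch, using the system sampler, which is the built-in AVAudioUnitMIDIInstrument subclass; any MIDI instrument node exposes the same note interface):

```swift
import AVFoundation

// Attach a MIDI instrument node to an engine and drive it directly
// with MIDI events.
func playA440() throws {
    let engine = AVAudioEngine()
    let instrument = AVAudioUnitSampler()
    engine.attach(instrument)
    engine.connect(instrument, to: engine.mainMixerNode, format: nil)
    try engine.start()

    instrument.startNote(69, withVelocity: 96, onChannel: 0)   // MIDI 69 = A440
    sleep(1)
    instrument.stopNote(69, onChannel: 0)
}
```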

Now there’s the Core MIDI framework, which people often think does some of these other higher-level things.

But it’s actually a very low-level API that’s basically for communicating with MIDI hardware, for example, an external USB MIDI interface or a Bluetooth MIDI keyboard.

We also supply a MIDI network driver.

You can use that to send raw MIDI messages between an iPad and a MacBook for example.

You can also use the Core MIDI framework to send MIDI between processes in real time.
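As an illustration of how low-level that API is, here is a hedged sketch of sending a single raw note-on with the CoreMIDI C interface (destination index 0 is an assumption; a real app would enumerate destinations):

```swift
import CoreMIDI

// Create a client and an output port, build a packet list containing one
// 3-byte note-on message, and send it to the first MIDI destination.
var client = MIDIClientRef()
MIDIClientCreateWithBlock("DemoClient" as CFString, &client, nil)

var outPort = MIDIPortRef()
MIDIOutputPortCreate(client, "DemoOut" as CFString, &outPort)

var packetList = MIDIPacketList()
let packet = MIDIPacketListInit(&packetList)
let noteOn: [UInt8] = [0x90, 69, 96]        // note-on, A440, velocity 96
MIDIPacketListAdd(&packetList, MemoryLayout<MIDIPacketList>.size,
                  packet, 0, noteOn.count, noteOn)

let destination = MIDIGetDestination(0)
MIDISend(outPort, destination, &packetList)
```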

Now this gets into a gray area sometimes.

People wonder, “Well, should I use Core MIDI to communicate between my sequencer and my app that’s listening to MIDI and synthesizing?”

And I would say that’s probably not the right API for that case.

If you’re using MIDI and audio together, I would use AUAudioUnit.

It’s in the case where you’re doing pure MIDI between two applications, or two entities within an application (maybe one is a static library from another developer).

In those situations, you can use Core MIDI for inter-process or inter-entity real-time MIDI.

So that takes us to the end of our grand tour of the audio APIs.

We started with applications and at the bottom, the CoreAudio framework and drivers.

We looked at AVAudioEngine, and how you use AVAudioSession to get things set up on all of our platforms except macOS.

We saw how you can use AVAudioPlayer and the AVAudioRecorder for simple playback and recording from files.

Or if your files or network streams involve video, you can use AVPlayer.

AVAudioEngine is a very good, high-level interface for building complex processing graphs and will solve a lot of problems.

You usually won’t have to use any of the lower-level APIs.

But if you do, we saw how in AudioToolbox there’s AUAudioUnit that lets you communicate directly with the I/O cycle and third-party, or your own instruments, effects, and generators.

And finally, we took a quick look at the Core MIDI framework.

So that’s the end of my talk here.

You can visit this link for some more information.

We have a number of related sessions here.

Thank you very much.

[ Applause ]
