AVAudioEngine in Practice 

Session 502 WWDC 2014

Dive deeper into the new Objective-C based audio graph system in AV Foundation. Learn how apps can utilize rich audio services from simple audio playback tasks to complex audio rendering chains, including effects and 3D spatial rendering for games.

Good morning, everyone.

My name is Kapil Krishnamurthy and I work in Core Audio.

I’m here today to talk to you about a new API called AVAudioEngine that we are introducing for Mac OS X Yosemite and iOS 8.

As part of today’s talk we’ll first look at an overview of Core Audio and then we’ll dive into AVAudioEngine, look at some of the goals behind the project, features of the new API, the different building blocks you’ll be using and finally we’ll do a section on gaming and 3D audio.

So let’s get started.

For those of you who aren’t familiar with Core Audio, Core Audio provides a number of C APIs as part of several different frameworks on both iOS and Mac OS X.

And you can use these different APIs to implement audio features in your applications.

So using these APIs you will be able to play and record sounds with low latency, convert between different file and data formats, read and write audio files, work with MIDI data and also play sounds that get spatialized.

Several years ago we added some simple Objective-C classes to AVFoundation and they’re called AVAudioPlayer and AVAudioRecorder.

And using these classes you can play sounds from files or record directly to a file.

Now while these classes worked really well for simple use cases, a more advanced user might find themselves a bit limited.

So this year we’re adding a whole new set of APIs to AVFoundation called AVAudioEngine, and my colleague Doug also spoke about a number of AVAudio utility classes in session 501.

So using this new API you will be able to write powerful features with just a fraction of the amount of code that you may have had to previously write.

So let’s get started.

What were the goals behind this project?

One of the biggest goals was to provide a powerful and feature-rich API set.

And we’re able to do that because we’re building on top of our existing Core Audio APIs.

Using this API we want developers to be able to achieve simple as well as complex tasks.

And a simple task could be something like playing a sound and running it through an effect.

A complex task could be something as big as writing an entire audio engine for a game.

We also wanted to simplify real-time audio.

For those of you who are not familiar with real-time audio it can be quite challenging.

You have a number of audio callbacks every second and for each callback you have to provide data in a timely fashion.

You can’t do things like take locks on the I/O thread or call functions that could block indefinitely.

So we make all of this easier for you to work with by giving you a real-time audio system but one that you interact with in a non real-time context.

Features of the new API: this is a full-featured Objective-C API set.

You get a real-time audio system to work with meaning that any changes that you make on any of the blocks take effect immediately.

Using this API you will be able to read and write audio files, play and record audio, connect different audio processing blocks together and then while the engine is running and audio is flowing through this system you can tap the output of each of these processing blocks.

You’ll also be able to implement 3D audio for games.

Now before we actually jump into the engine’s building blocks I thought I would give you two sample use cases to give you a little flavor of what you’ll be able to do using this API.

So the first sample use case is a karaoke application.

You have a backing track that’s playing and the user is singing along with it in real-time.

The output of the microphone is passed through a delay, which is just a musical effect and both of these audio chains are mixed and sent to the output hardware.

This could be a speaker or headphones.

Let’s also say that you tap the output of the microphone and analyze that raw data to see if the user is on pitch, if he’s doing a great job.

And if he is, play some sound effects, so this stream also gets mixed in and played out to the output hardware.

Here’s another use case.

You have a streaming application and you receive data from the remote location.

You can now stuff this data into different buffers and schedule them on a player.

You can run the output of the player through an EQ whose UI you present to the user so that they can tweak the EQ based on their preference.

The output of the EQ then goes to the output hardware.

So these are just two sample use cases.

You’ll be able to do a whole lot more once we talk about AVAudioEngine.

So let’s get started.

The two main objects that we’re going to start with are the engine object and the node object.

And there are three specific types of nodes: the output node, mixer node and the player node.

We have other nodes as well that we will get to but these are the initial building block nodes.

So the engine is an object that maintains a graph of audio nodes.

You create nodes and you attach them to the engine and then you use the engine to make connections between these different audio nodes.

The engine will analyze these connections and determine which ones add up to an active chain.

When you then start the engine, audio flows through all of the active chains.

A powerful feature that the engine has is that it allows you to dynamically reconfigure these nodes.

This means that while the engine is rendering you can add new nodes and then wire them up.

And so essentially you’re adding or removing chains dynamically.

So the typical workflow of the engine is that you create an instance of the engine, create instances of all the nodes you want to work with, attach them to the engine so the engine is now aware of them and then connect them together, start the engine.

This will create an active render thread and audio will flow through all of the active chains.
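The workflow just described could be sketched roughly as follows. This is a minimal sketch, not the code from the slides: the talk describes the Objective-C API, and the sketch assumes the present-day Swift names for the same calls. The helper names `makeBasicGraph` and `startBasicGraph` are made up for illustration.

```swift
import AVFoundation

// Hypothetical helper: builds the smallest chain, player -> main mixer -> output.
func makeBasicGraph() -> (engine: AVAudioEngine, player: AVAudioPlayerNode) {
    let engine = AVAudioEngine()        // 1. create the engine
    let player = AVAudioPlayerNode()    // 2. create the nodes you want to use
    engine.attach(player)               // 3. attach them so the engine knows about them
    // 4. connect them; a nil format keeps the player's current output format.
    //    Asking for mainMixerNode also creates the mixer -> output connection.
    engine.connect(player, to: engine.mainMixerNode, format: nil)
    return (engine, player)
}

// Hypothetical helper: 5. start the engine. This needs working audio output
// hardware, so it can throw on a machine without any.
func startBasicGraph(_ engine: AVAudioEngine) {
    do {
        try engine.start()
    } catch {
        print("Engine failed to start: \(error)")
    }
}
```

Keep a strong reference to the engine for as long as audio should play; in this sketch the caller owns the returned tuple.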

So let’s now talk about a node.

A node is a basic audio block and we have three types of nodes: there are source nodes, which are nodes that generate audio.

And examples of this are the player or the input node.

You have nodes that process audio.

So they take some audio and do something to it and push it out.

And examples are a mixer or an effect.

You also have destination nodes that receive audio and do something with it.

Every one of these nodes has a certain number of input and output buses.

And typically you see that most nodes have a single input and output bus.

But an exception to this is a mixer node that has multiple input busses and a single output bus.

Every bus now has an audio data format associated with it.

So let’s talk about connections.

If you have a connection between a source node and a destination node, that forms an active chain.

You can insert any number of processing nodes between the source node and the destination node.

But as long as you wire every bit of this chain up, it’s an active chain.

As soon as you break one of the connections, all of the nodes that are upstream of the point of disconnection go into an inactive state.

In this case, I’ve broken the connection between the processing node and the destination node, so my processing node and my source node are now in an inactive state.

The same holds true in this example.

So let’s now look at the specific node types.

The first node that we’re going to talk about is the output node.

The engine has an implicit destination node and it’s called the output node.

And the role of the output node is to take the data that it receives and hand it to the output hardware, so this could be the speaker.

You cannot create a standalone instance of the output node.

You have to get it from the instance of the engine that you’ve created.

Let’s move on to the mixer node.

Mixer nodes are processing nodes and they receive data on different input busses which they then mix to a single output, which goes out on the output bus.

When you use a mixer, you get control of the volume of each input bus.

And if you had an application that was playing a number of sounds and you put each of these sounds in on a separate input bus, using this volume control you can essentially blend in the amount of each sound that you want to hear.

So you create a mix.

You also have control over the output volume using a mixer.

So you are controlling the volume of the mix that you’ve created.

If your application has several categories of sound, you can make use of a concept called submixing to create submixers.

So let’s say that you have some UI sounds and you have some music.

And you put all of the UI sounds through one mixer, all of the music through another mixer.

Using the output volumes of each of these mixers, you can essentially control the volume of each of these submixers.

Let’s take that concept a step further and put all of the submixers through a master mixer.

The output volume of the master mixer will essentially control the volume of the entire mix in your application.
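As a rough illustration of submixing, here is a minimal sketch assuming the Swift spelling of the API. The function name `makeSubmixGraph` and the volume values are only illustrative; the engine's main mixer plays the role of the master mixer.

```swift
import AVFoundation

// Hypothetical helper: two submixers (UI sounds, music) feeding the main mixer.
func makeSubmixGraph() -> (engine: AVAudioEngine,
                           uiMixer: AVAudioMixerNode,
                           musicMixer: AVAudioMixerNode) {
    let engine = AVAudioEngine()
    let uiMixer = AVAudioMixerNode()
    let musicMixer = AVAudioMixerNode()
    engine.attach(uiMixer)
    engine.attach(musicMixer)

    // When the destination is a mixer, connect(_:to:format:) uses the
    // mixer's next available input bus, so each submixer gets its own bus.
    engine.connect(uiMixer, to: engine.mainMixerNode, format: nil)
    engine.connect(musicMixer, to: engine.mainMixerNode, format: nil)

    uiMixer.outputVolume = 0.8               // level of the UI submix (illustrative)
    musicMixer.outputVolume = 0.5            // level of the music submix (illustrative)
    engine.mainMixerNode.outputVolume = 1.0  // level of the whole mix
    return (engine, uiMixer, musicMixer)
}
```

Players for UI sounds would then connect to `uiMixer` and music players to `musicMixer`.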

Now the engine has an implicit mixer node.

And when you ask the engine for its mixer node, it creates an instance of a mixer.

It also creates an instance of the output node and connects the two together for you by default.

The difference here between the mixer node and the output node is that you can create additional instances and then attach them to the engine and use them how you please.

Mixers can also have different audio data formats for each input bus.

And the mixer will do the work of efficiently converting the input data formats to the output data format.

So now that we have looked at these initial nodes, let’s talk about how this works in the context of the engine.

So let’s say that I have an app that creates an instance of the engine.

We can now ask the engine for its main mixer node, so it’s going to create an instance of a mixer, create an instance of the output node and connect the two together.

I can now create a player node and attach it to the engine and connect it to the mixer.

So at this point I have a connection chain going all the way from a source to a destination, so I have an active chain.

When I then start up the engine, an active render thread is created and data is pulled by the destination.

So I have an active flow of data here.

The app can now interact with each one of these blocks and any change that it makes on any of the nodes will take effect immediately.

So now that we’ve talked about an active render thread, how do you push your audio data on the render thread?

You use a player to do that.

Let’s look at player nodes.

Player nodes are nodes that can play data from files and from buffers.

And the way that it happens or the way that it’s done is by scheduling events, which simply means play data at a specified time.

That time could be now or sometime in the future.

When you’re scheduling buffers you can schedule multiple buffers, and as each buffer is consumed by the player you get an individual callback, which you can then use as a cue to go ahead and schedule more data.

You can also schedule a single buffer that plays in a looped fashion.

And this is useful in the case when you may have a musical loop or a sound effect that you want to play over and over again.

So, you load the data and then you play the buffer and it’ll continue to play until you stop the player or you interrupt it with another buffer.

We’ll get into that.

You can also schedule a file or a portion of a file called a segment.

So going back to our previous diagram, we had an engine that was in a running state.

So now I can create an instance of a buffer and load my data into it, shown by the red arrow.

Once I do that, I can schedule this buffer on the player.

And when the player is playing, the player will consume the data in the buffer and push it on the render thread.

In a similar manner, I can work with multiple buffers.

So over here I have multiple buffers.

I load data into each one of them and I schedule each one of them to play in sequence on the player.

As each buffer is consumed by the player, I get individual callbacks letting me know that that buffer is done.

I can use that as a cue and schedule more data.

In a similar manner you can work with a file.

And the difference here is you don’t have to actually deal with the audio data yourself.

All you need is a URL to a physical audio file with which you can create an AVAudioFile object and then schedule that directly on the player.

The player will do the work of reading the data from the file and pushing it on the render thread.

So, let’s now look at a code example of how we can achieve this.

I first create an instance of the engine, create an instance of a player and attach the player to the engine, so the engine is now aware of the player.

I’m now going to split my example up and show how you can first work with a file.

So, given a URL to an audio file, I can create the AudioFile object.

The next thing that I do is ask the engine for its mainMixer.

So the engine will create an instance of a mixer node, create an output node and connect the two together.

I can now go ahead and connect the player to the mixer with the file’s processing format.

So I have a connection chain going all the way from a player that’s a source to the output node that’s a destination.

Now I can schedule my file to play atTime:nil, which is as soon as possible.

And in this case I pass a nil for the completion handler.

If I had some work that I needed to be done after the file is consumed by the player, I can pass in a block over here.
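The file case might look roughly like this in Swift. This is a sketch under assumptions: `scheduleFile(at:engine:player:)` is a hypothetical helper, and the URL is whatever audio file you supply.

```swift
import AVFoundation

// Hypothetical helper: opens an audio file, wires the player to the main
// mixer with the file's processing format, and schedules the whole file.
func scheduleFile(at url: URL,
                  engine: AVAudioEngine,
                  player: AVAudioPlayerNode) throws -> AVAudioFile {
    let file = try AVAudioFile(forReading: url)   // throws if the file cannot be opened

    // The player has to be attached before it can be connected.
    if player.engine == nil {
        engine.attach(player)
    }
    engine.connect(player, to: engine.mainMixerNode, format: file.processingFormat)

    // at: nil means "as soon as possible". Pass a closure instead of nil for
    // the completion handler if work has to happen once the file is consumed.
    player.scheduleFile(file, at: nil, completionHandler: nil)
    return file
}
```

Nothing is audible until the engine is started and `play()` is called on the player.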

So in a similar manner I can work with a buffer as well.

Let’s say that I create an AVAudioPCMBuffer object and load some data into it.

The specifics of that part are covered in session 501.

So if you missed that, please refer to that session.

Once I have my buffer object I can go ahead and ask the engine for its mixer and make the connection between the player to the mixer with the buffer’s format.

Now I can go ahead and schedule this buffer atTime:nil, which is as soon as possible.

But note that we have an additional argument when we are working with buffers, the options argument.

We’re going to talk about that right after this.

But for now I’m going to pass nil, and nil for the completion handler as well.

So now that I’ve scheduled my data on the player I can go ahead and start the engine.

This creates an active render thread, and then I call play on the player.

And the player will do the work of reading the data from the file or the buffer and pushing it on the render thread.
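Here is a comparable sketch for the buffer case, including starting the engine and the player. It assumes the Swift API, where "no options" is written as an empty option set `[]` rather than nil. `makeToneBuffer` is a hypothetical helper that synthesizes a sine tone so the example does not need an audio file; session 501 covers loading real data into a buffer.

```swift
import AVFoundation

// Hypothetical helper: a mono buffer filled with a quiet sine tone.
func makeToneBuffer(frequency: Double, seconds: Double,
                    sampleRate: Double = 44_100) -> AVAudioPCMBuffer? {
    guard let format = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: 1),
          seconds > 0 else { return nil }
    let frames = AVAudioFrameCount(seconds * sampleRate)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frames),
          let samples = buffer.floatChannelData?[0] else { return nil }
    buffer.frameLength = frames   // mark the whole capacity as valid audio
    for i in 0..<Int(frames) {
        let phase = 2.0 * Double.pi * frequency * Double(i) / sampleRate
        samples[i] = Float(sin(phase)) * 0.25
    }
    return buffer
}

// Hypothetical helper: schedule the buffer, start the engine, start the player.
// Starting the engine requires audio output hardware and may throw.
func playTone(on engine: AVAudioEngine, player: AVAudioPlayerNode) throws {
    guard let buffer = makeToneBuffer(frequency: 440, seconds: 1) else { return }
    if player.engine == nil { engine.attach(player) }
    engine.connect(player, to: engine.mainMixerNode, format: buffer.format)
    player.scheduleBuffer(buffer, at: nil, options: [], completionHandler: nil)
    try engine.start()   // creates the active render thread
    player.play()
}
```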

So let’s now talk about the different buffer scheduling options.

In all of the examples that I’m going to talk about now, I’m going to specify nil for the atTime argument and that just means that in all of these examples, I’m going to schedule something to play as soon as possible.

So let’s talk about the first option and that’s when you want to schedule a buffer to play as soon as possible.

In that case all you need to do is schedule a buffer with the option set to nil.

You call play on the player and that buffer gets played.

If you have a buffer that’s playing now and you want to append a new buffer, it’s the exact same call.

You schedule the new buffer with the option set to nil and so the new buffer gets appended to the queue of currently playing buffers.

On the other hand if I want to interrupt my currently playing buffer with a new buffer I can schedule the new buffer with the AVAudioPlayerNodeBufferInterrupts option.

So that will interrupt the currently playing buffer and start playing my new buffer right away.

Let’s now look at the different variants with a looping buffer.

So like I said earlier, if I have a buffer that’s to be played in a looped fashion, like a sound effect, for instance, I can load the data in that buffer and schedule that buffer with the AVAudioPlayerNodeBufferLoops option.

When I call play on the player, that buffer starts to play in a looped fashion.

If I want to interrupt a looping buffer it’s the same option as what we’ve seen before.

I have to schedule a new buffer with the AVAudioPlayerNodeBufferInterrupts option.

So essentially it’s the same option for when you want to interrupt a regular buffer or a looping buffer.

The last case is an interesting one.

So if you have a looping buffer but you want to let the current loop finish before you start playing your new data you can schedule your new buffer with the AVAudioPlayerNodeBufferInterruptsAtLoop option.

So this will let the current loop finish and as soon as that loop is done the new buffer starts playing.

Now that was a whole bunch of options, so let’s look at one practical example of how we can use these options.

So let’s say that I have a sound that’s broken up into three parts.

And the example that I’m going to use here is a siren.

So you have the initial buildup of the sound which is the “attack” portion of the siren.

You have the droning portion of the siren, which can be modeled using just a looping buffer.

And this is the “sustain” portion of the sound.

And then you have the dying down of the siren, which is the “release” portion of the sound.

So let’s say that I load up each of these sounds into different buffers.

The way that I can implement this in code is to first schedule the attack buffer with my options set to nil and then schedule the sustain buffer with the AVAudioPlayerNodeBufferLoops option.

So when I call play on the player, what this will do is play the attack portion of the sound and then immediately start playing the sustain portion of the sound and continue to loop that sustain buffer and that goes on until I’m ready to interrupt it.

So after some time has gone by when I’m ready to interrupt that I can schedule my release buffer with the AVAudioPlayerNodeBufferInterruptsAtLoop option.

So this will let the last loop of the sustain buffer finish up and then play the release portion of my sound.
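The siren example could be sketched as below. In Swift the three options named in the talk are spelled `.loops`, `.interrupts` and `.interruptsAtLoop`. The function names are hypothetical, and the sketch assumes the player is already attached to an engine and connected with a format that matches the buffers.

```swift
import AVFoundation

// Hypothetical helper: queue the attack once, then loop the sustain.
// Assumption: `player` is attached and connected with the buffers' format.
func startSiren(on player: AVAudioPlayerNode,
                attack: AVAudioPCMBuffer,
                sustain: AVAudioPCMBuffer) {
    // No options: the buffer is appended to the player's queue.
    player.scheduleBuffer(attack, at: nil, options: [], completionHandler: nil)
    // .loops: the sustain buffer repeats until it is interrupted or the player stops.
    player.scheduleBuffer(sustain, at: nil, options: .loops, completionHandler: nil)
}

// Hypothetical helper: let the current sustain loop finish, then play the release.
func releaseSiren(on player: AVAudioPlayerNode, release: AVAudioPCMBuffer) {
    player.scheduleBuffer(release, at: nil, options: .interruptsAtLoop,
                          completionHandler: nil)
}
```

Using `.interrupts` in `releaseSiren` instead would cut the sustain loop off immediately rather than at the loop boundary.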

Now remember that I said in the beginning that all of my examples involve scheduling events that play as soon as possible.

I can also schedule events to play in the future.

So here’s an example of that.

In this case I’m just going to schedule a buffer to play 10 seconds in the future.

So I create an AVAudioTime object that has a relative sample time 10 seconds in the future, and I use the buffer’s sample rate as my reference point.

I can now schedule the buffer with this AVAudioTime object and call play on the player and my buffer gets played 10 seconds in the future.
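The time calculation is just seconds multiplied by the sample rate. A minimal sketch, with `startTime(secondsFromNow:sampleRate:)` as a hypothetical helper name:

```swift
import AVFoundation

// Hypothetical helper: a sample-based AVAudioTime `seconds` into the future,
// expressed at the given sample rate (normally the buffer's own rate).
func startTime(secondsFromNow seconds: Double, sampleRate: Double) -> AVAudioTime {
    let sampleTime = AVAudioFramePosition(seconds * sampleRate)
    return AVAudioTime(sampleTime: sampleTime, atRate: sampleRate)
}
```

With a buffer in hand, usage would be along the lines of `player.scheduleBuffer(buffer, at: startTime(secondsFromNow: 10, sampleRate: buffer.format.sampleRate), options: [], completionHandler: nil)` followed by `player.play()`.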

So we’ve talked about player nodes and how you can use a player to push your audio data on the render thread.

Well if you wanted to pull data from the render thread how do you do that?

You use a node tap.

And here are some reasons why you may want to do that.

Let’s say you want to capture the output of the microphone and save that data to disk, or if you have a music application and you want to record a live performance or if you have a game and you want to capture the output mix of the game.

You can do all of that using a node tap.

And what that is, is essentially a tap that you install on the output bus of a node.

So the data that’s captured by the tap is returned back to your application via a callback block.

So going back to a familiar diagram, I have two players that are connected to the engine’s main mixer.

And I want to tap the output of the mixer so I can install a tap on the mixer and the tap will start pulling data from the render thread.

The tap will then go ahead and create a buffer object, stuff that data into the buffer and return that back to the application via a callback block.

In code it’s just one function call.

You install a tap on the mixer’s output bus 0 with a buffer size of 4096 frames and the mixer’s output format for that bus.

Within the block I have an AVAudioPCMBuffer that contains that amount of data.

And I can do whatever I need to do with that data.
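A sketch of that one call in Swift follows. The wrapper `installMixerTap` and its `handler` parameter are hypothetical. The 4096-frame size is a request that the implementation is documented as free to change, and the block is not guaranteed to run on the main thread.

```swift
import AVFoundation

// Hypothetical helper: taps output bus 0 of the engine's main mixer.
func installMixerTap(on engine: AVAudioEngine,
                     handler: @escaping (AVAudioPCMBuffer) -> Void) {
    let mixer = engine.mainMixerNode
    let format = mixer.outputFormat(forBus: 0)
    mixer.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        // `buffer` holds audio captured from the render thread; the second
        // argument (ignored here) is the AVAudioTime of the capture.
        handler(buffer)
    }
}

// Hypothetical helper: remove the tap once the data is no longer needed.
func removeMixerTap(from engine: AVAudioEngine) {
    engine.mainMixerNode.removeTap(onBus: 0)
}
```

Only one tap can be installed on a given bus at a time, so remove the existing tap before installing another one.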

Alright, so to quickly summarize, you have an active render thread.

You use player nodes to push your audio data on the render thread and use node taps to pull audio data from the render thread.

Let’s now switch gears and talk about a new node called the input node.

The input node receives data from the input hardware and it’s parallel to the output node.

With the input node you cannot create a standalone instance.

You have to get the instance from the engine.

When you’ve connected the input node in an active chain and the engine is running, data is pulled from the input node.

So let’s go back to a familiar diagram.

I’ve connected the input node to the mixer node and that’s connected to the output node.

So when I start the engine, this is an active chain and data is pulled from the input node.

So, if I’m receiving data from the input node and the engine is running and I want to stop receiving data at a certain point, how do I do that?

It’s very simple.

All you have to do-oh I’m sorry.

I raced ahead.

Let’s look at a code example of how you can connect the input node.

So I get the input node from the engine.

Just make a connection to any other node with the input node’s hardware format and then start the engine.

This creates an active render thread and the input node is pulled for data.

So like I was saying earlier, if you have an input node that’s being pulled and you don’t want to receive data anymore from the input node, what do you do?

Just disconnect the input node.

So the input node will no longer be in an active chain and it won’t be pulled for data.

In order to do that, it’s just one line of code.

Using the engine, you disconnect the node output of the input node.
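Connecting and disconnecting the input node might look like this. It is a sketch with hypothetical helper names; it routes the microphone to the main mixer, which can cause feedback on speakers, and on current systems it also needs microphone permission. The guard for a missing input device is an addition of this sketch, not something the talk mentions.

```swift
import AVFoundation

// Hypothetical helper: connects the input node to the main mixer using the
// input hardware format. Returns false when no usable input device is found.
func connectInput(of engine: AVAudioEngine) -> Bool {
    let input = engine.inputNode                 // obtained from the engine, never created
    let format = input.inputFormat(forBus: 0)    // the hardware format
    guard format.sampleRate > 0, format.channelCount > 0 else { return false }
    engine.connect(input, to: engine.mainMixerNode, format: format)
    return true
}

// Hypothetical helper: takes the input node out of the active chain, so it
// is no longer pulled for data.
func disconnectInput(of engine: AVAudioEngine) {
    engine.disconnectNodeOutput(engine.inputNode)
}
```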

Now if you want to capture data from the input node, you can install a node tap and we’ve talked about that.

But what’s interesting about this particular example is, if I wanted to work with just the input node, say just capture data from the microphone and maybe examine it, analyze it in real time or maybe write it out to file, I can directly install a tap on the input node.

And the tap will do the work of pulling the input node for data, stuffing it in buffers and then returning that back to the application.

Once you have that data you can do whatever you need to do with it.

And let’s now talk about the last type of nodes in this section, effect nodes.

Effect nodes are nodes that process data.

So, depending on the type of effect, they take some amount of data in, they process it and push that data out.

We have two main categories of effects.

You have AVAudioUnitEffects and AVAudioUnitTimeEffects.

So what’s the difference between the two?

AVAudioUnitEffects require the same amount of data on input as the amount of data they’re being asked to provide.

So let’s take the example of a distortion effect.

If a distortion node has to provide 24ms of output, all it needs is 24ms of input that it then processes and pushes out.

As opposed to that, TimeEffects don’t have that constraint.

So let’s say that you have a TimeEffect that’s doing some amount of time stretching.

If it is being asked to provide 24ms of output, it may require 48ms of input.

So that brings me to my second point.

It is for that reason that you cannot connect a TimeEffect directly with the input node.

Because, when you have the input node running in real-time it cannot provide data that it doesn’t have.

As opposed to that, with AVAudioUnitEffects you can connect them anywhere in the chain.

So you can use them with players or you can use them with the input node.

This is the list of effects that we currently have available.

So on the effects side we have the Delay, Distortion, EQ and Reverb.

If you’re a musician you’re probably already familiar with these effects so you can use them in real time or use them in the player.

And on the TimeEffect side, we have the Varispeed and the TimePitch.

And these effects are useful in cases where you want to manipulate the amount of time stretching or maybe the pitch of the source content.

So let’s say that you have a speech file that you’re playing and you want to pitch the voice up to sound like a chipmunk.

Well you can do that using one of the TimeEffects.

So let’s now look at an example of how you can use one of these effects.

In this example I’m going to make use of the EQ.

But note that over here I’ve connected the EQ directly to the output node.

In all of my prior examples I was connecting nodes to the engine’s mixer node.

But I don’t always have to do that.

If I just have one chain of data in my application then I can just directly connect it to the output node, which is what I’ve done here.

So this is a multiband EQ and I specify the number of bands I’m going to use when I create an instance of the EQ.

So over here, I’m going to use two bands so I create an EQ with two bands.

I can then go ahead and get access to each of the bands and set up the different filter parameters.

Connecting the EQ is no different than what we’ve already seen.

I can connect the player to the EQ with the file’s processing format and connect the EQ to the engine’s output node with the same format and that’s it.
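A two-band EQ chain could be sketched like this. The helper name, filter types, frequencies and gains are illustrative choices, not the values from the slide. Note that each band starts out bypassed, so the sketch switches bypass off.

```swift
import AVFoundation

// Hypothetical helper: player -> 2-band EQ -> output node, all in `format`.
func makeEQChain(engine: AVAudioEngine,
                 player: AVAudioPlayerNode,
                 format: AVAudioFormat) -> AVAudioUnitEQ {
    let eq = AVAudioUnitEQ(numberOfBands: 2)   // band count is fixed at creation

    let low = eq.bands[0]
    low.filterType = .lowShelf
    low.frequency = 120        // Hz (illustrative)
    low.gain = 6               // dB (illustrative)
    low.bypass = false         // bands are bypassed by default

    let mid = eq.bands[1]
    mid.filterType = .parametric
    mid.frequency = 2_000      // Hz (illustrative)
    mid.bandwidth = 1.0        // octaves (illustrative)
    mid.gain = -4              // dB (illustrative)
    mid.bypass = false

    if player.engine == nil { engine.attach(player) }
    engine.attach(eq)
    engine.connect(player, to: eq, format: format)
    engine.connect(eq, to: engine.outputNode, format: format)  // no mixer in this chain
    return eq
}
```

In the talk's example `format` would be the file's processing format.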

So with all of this information, let’s look at a demo that makes use of some of the nodes that we’ve talked about.

Okay, so I’m going to explain what I have here.

Over here I have two player nodes and each of these players is going to be fed by separate looping buffers.

Each of the players is connected to a separate effect and each of these effects is connected to a separate input of the engine’s main mixer.

I have control over the output volume of the main mixer and down here I have a transport control, which essentially controls a node tap that I’ve installed on the main mixer.

So when I hit Record, a tap gets installed and I capture that data and save it to a file.

And then when I hit Play I’m just going to play that file back.

So let’s listen to what this sounds like [music].

So here I’m playing the drums.

I can change the volume and the pan of each player, so you can hear that effect.

I’m now going to go ahead and play with the reverb a little bit.

It sounds a little too wet, so I’m going to keep it about here.

Let me start the other player.

[ Music ]

Okay, so what I thought I’d do now is use the node tap to maybe capture a little live performance, and any changes that I make to any of the nodes here should be captured in that performance.

So when we go back and listen to that we should hear that.

So let me do that.

[ Music ]

Okay, so I’m going to stop my recording, stop my delay player.

And let’s go back and listen to the recording.

[ Music ]

[ Silence ]

And that’s a preview of AVAudioEngine in action.

Let’s go back to slides.

Alright, so two of the settings that I was changing with the players were the volume and the pan for each player.

But these are actually settings of the input mixer bus that the player is connected to.

So the way we’ve exposed mixer input bus settings in the audio engine is through a protocol called the AVAudioMixing protocol.

Source nodes conform to this protocol so the player node and the input node do that.

And settings, like volume, you can change by just doing player.volume = 0.5 or player.pan = -1.

When a source node is in an active connection with the mixer and you make changes to the protocol’s different properties, they take effect immediately.

However, if a source node is not connected to a mixer and you make changes to the protocol’s properties, those changes are cached in the source node and then applied when you make a physical connection to a mixer.

So these are the mixing properties that we have available.

Under the common mixing properties we just have volume right now.

Under the stereo mixing properties, we have pan and we have a number of 3D mixing properties that we’re going to look at in the next section.

So in the form of a diagram, let’s say that I have Player 1 connected to Mixer 1 and I go ahead and set Player 1’s pan to -1, hard pan it to the left, and Player 1’s volume to 0.5.

So these mixing settings are now associated with Player 1.

And because Player 1 is connected to Mixer 1 they also get applied on the mixer.

If I were to disconnect Player 1 and connect it to Mixer 2, these mixing settings travel along with Player 1 and get applied to Mixer 2.

So in this sense, we’ve been able to carry settings that belong to the input bus of a mixer along with the source node itself.
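The Player 1 / Mixer 1 / Mixer 2 scenario can be sketched as follows, assuming the Swift property names `volume` and `pan` from the AVAudioMixing protocol. The function name is hypothetical.

```swift
import AVFoundation

// Hypothetical helper: sets mixing properties on a player, then moves the
// player from one mixer to another. The settings stay with the player.
func movePlayerBetweenMixers() -> (engine: AVAudioEngine, player: AVAudioPlayerNode) {
    let engine = AVAudioEngine()
    let player1 = AVAudioPlayerNode()
    let mixer1 = AVAudioMixerNode()
    let mixer2 = AVAudioMixerNode()
    for node in [player1, mixer1, mixer2] as [AVAudioNode] {
        engine.attach(node)
    }
    engine.connect(mixer1, to: engine.mainMixerNode, format: nil)
    engine.connect(mixer2, to: engine.mainMixerNode, format: nil)

    engine.connect(player1, to: mixer1, format: nil)
    player1.pan = -1.0      // hard left
    player1.volume = 0.5    // half volume on the mixer input bus

    // Re-route Player 1 to Mixer 2; pan and volume are applied there too.
    engine.disconnectNodeOutput(player1)
    engine.connect(player1, to: mixer2, format: nil)
    return (engine, player1)
}
```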

Alright, so let’s now move onto the next section on gaming and 3D audio.

So in games, typically you have several types of sounds that you play.

You have short sounds, and we’ve seen AudioServices, which is one of our C APIs, get used for that.

For playing music we see AVAudioPlayer getting used a lot.

And for sounds that need to be spatialized, OpenAL is the API of choice.

Now while each of these APIs work really well for what they were designed for, if your application has to make use of all of them, then one of the biggest tradeoffs is that you have to familiarize yourself with the nomenclature associated with each API.

In addition, with AudioServices you don’t have a latency guarantee of when your sound will play.

With AVAudioPlayer you can’t play sounds that you have in buffers.

And with OpenAL, you can’t play sounds directly from a file or play compressed data.

With our knowledge of AVAudioEngine, if we go back and cover cases one and two, we can easily do so.

For short sounds we can just load them into AVAudioBuffer objects and schedule them on a player.

For music you can just create an AVAudioFile object and schedule that directly on a player.

So how do you play sounds that need to be spatialized?

We’ll look at that now.

I’d like to introduce a new node called the environment node and this is essentially a 3D mixer.

So when you create an instance of the environment node, you have a 3D space and you get a listener that’s implicit to that 3D space.

All of the source nodes that connect to the environment node act as sources in this 3D space.

So the environment has some attributes that you can set directly on the environment node.

And then each of these sources has some attributes and you can set those using the AVAudioMixing protocol’s 3D properties.

Now in terms of data formats, I just wanted to point out that when you’re working with the environment node, all of the sources need to have a mono data format in order for that audio to be spatialized.

If the sources have a stereo data format, then that data is passed through and currently the environment node doesn’t support a data format greater than two channels on input.

So as a diagram, this is what it looks like.

I’ve created an instance of an environment node which means I now have a 3D space and I have an implicit listener.

I now create two player nodes, which are going to act as sources in my 3D space, and using the AVAudioMixing protocol I can set all of the source attributes.

So what makes things sound 3D or virtual 3D?

Well, we have a number of attributes and some belong to the sources, others belong to the environment.

Let’s walk through each of the source attributes first.

So every source has a position in this 3D space.

And right now it’s specified using a right-handed Cartesian coordinate system where positive X is to the right, positive Y is up and positive Z is towards the listener.

Now with respect to the listener, the listener uses some spatial cues to localize the position of the source.

There’s an inter-aural time difference, just a slight time difference for the sound made by the source to get to each one of the listener’s ears.

There’s also an inter-aural level difference.

In addition, your head has the effect of doing some filtering, and you also have some filtering done by the ears themselves.

So we have several rendering algorithms and each one of them models these spatial cues differently.

The thing is that we’ve exposed this as a source property.

So you can pick a rendering algorithm per source and some algorithms may sound better depending on the type of content your source is playing and also they differ in terms of CPU cost.

So you may want to pick a more expensive algorithm for an important source and a cheaper algorithm for a regular source.
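
A minimal sketch of setting these two source attributes in Swift could look like this. The positions are made-up values, and the choice of HRTF for the important source and equal-power panning for the regular one is just one plausible pairing of a costlier and a cheaper algorithm.

```swift
import AVFoundation

let importantSource = AVAudioPlayerNode()
let regularSource = AVAudioPlayerNode()

// Position each source in the 3D space (illustrative coordinates).
// Negative Z places a source in front of a listener at the origin.
importantSource.position = AVAudio3DPoint(x: 2, y: 0, z: -5)
regularSource.position = AVAudio3DPoint(x: -3, y: 1, z: -10)

// The rendering algorithm is a per-source property.
// HRTF models the spatial cues in more detail and costs more CPU.
importantSource.renderingAlgorithm = .HRTF
// Equal-power panning is a cheaper choice for less important sources.
regularSource.renderingAlgorithm = .equalPowerPanning
```

These properties only take effect once the players are connected to an environment node.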

The next two properties, obstruction and occlusion, deal with the filtering of sound if there are obstacles between the source and listener.

So in this case, I have the source, that’s the monster, and the listener, that’s the handsome prince, and there is a column between the source and the listener.

So the direct path of sound is muffled whereas the reflected paths across the walls are clear and this is modeled by obstruction.

On the other hand, if the source and the listener are in different spaces, so right now there’s a wall between the source and the listener, both the direct path of sound and the reflected paths of sound are muffled, and this is modeled by occlusion.
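
In Swift, these two cases might be sketched as follows. Both properties are attenuation values in decibels, where 0 means no filtering; the specific amounts below are assumptions chosen for illustration.

```swift
import AVFoundation

let monster = AVAudioPlayerNode()

// Case 1: a column sits between the source and the listener.
// Only the direct path is muffled, so we apply obstruction.
monster.obstruction = -20   // illustrative value, in dB
monster.occlusion = 0

// Case 2: a wall separates the source and the listener.
// Direct and reflected paths are both muffled, so we apply occlusion.
monster.obstruction = 0
monster.occlusion = -40     // illustrative value, in dB
```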

Let’s now move on to the listener, the environment attributes.

So every environment has an implicit listener and the listener has a position and an orientation.

The position is specified using the same coordinate system.

And for the orientation, you can specify it using either two vectors, a front and an up vector, or three angles: yaw, pitch and roll.
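
Here is a small sketch of both ways of orienting the listener in Swift. The concrete position, vectors and angles are assumed values, and the angles are given in degrees.

```swift
import AVFoundation

let environment = AVAudioEnvironmentNode()

// Place the listener at the origin of the 3D space.
environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)

// Option 1: orientation as a forward vector and an up vector.
// Here the listener faces down the negative Z axis with Y as up.
environment.listenerVectorOrientation = AVAudio3DVectorOrientation(
    forward: AVAudio3DVector(x: 0, y: 0, z: -1),
    up: AVAudio3DVector(x: 0, y: 1, z: 0)
)

// Option 2: orientation as yaw, pitch and roll angles.
// Setting one form replaces whatever the other form set earlier.
environment.listenerAngularOrientation = AVAudio3DAngularOrientation(
    yaw: 90, pitch: 0, roll: 0
)
```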

You also have distance attenuation in the environment, which is just the attenuation of sound as a source moves away from the listener.

So in this graph there are two points of interest.

There’s the reference distance, which is the distance above which we start applying some amount of attenuation.

There’s also the maximum distance, which is the point above which the amount of attenuation being applied is capped.

So all of the exciting stuff happens between the reference distance and the maximum distance.

And in that region we have three curves that you can pick from.

So, in the form of code, this is what it looks like.

All you need to do is get the distance attenuation parameters object from the environment and then you can go ahead and tweak all the settings.
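
The slide code is not part of this transcript, so the following is only a sketch of what that could look like in Swift. The chosen curve and the distance values are assumptions.

```swift
import AVFoundation

let environment = AVAudioEnvironmentNode()

// Get the distance attenuation parameters object from the environment.
let attenuation = environment.distanceAttenuationParameters

// Pick one of the three curves: .exponential, .inverse or .linear.
attenuation.distanceAttenuationModel = .inverse

// Attenuation starts once a source is farther away than this distance.
attenuation.referenceDistance = 5

// Beyond this distance the attenuation no longer increases.
attenuation.maximumDistance = 100

// Controls how quickly the level falls off between the two distances.
attenuation.rolloffFactor = 1.5
```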

Now every environment also has reverberation which is just a simulation of the sound reflections within that space.

The environment node has a built-in reverb and you can pick from a selection of factory presets.

Now once you pick the type of reverb you want to use, you can set a blend amount for each source and that just affects the amount of each source that you’ll hear in the reverb mix.

So for some sources, you may want them to sound completely dry, so you set the blend amount to zero.

For other sources, you may want them to sound more ambient, so you can turn up the blend amount.

We also have a single filter that applies to the output of the reverb.

So let’s say that you pick one of the factory presets and you want it to sound maybe a little brighter.

You can do that using the filter.

In code, this is what it looks like.

I get the ReverbParameters object from the environment.

In this case, I’m enabling it and then I load a factory preset, LargeHall preset.

And using the AVAudioMixing protocol, I set the source’s reverbBlend to 0.2.
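
The steps just described can be sketched in Swift roughly as follows. The variable names are illustrative; the preset and the blend value are the ones mentioned above.

```swift
import AVFoundation

let environment = AVAudioEnvironmentNode()
let player = AVAudioPlayerNode()

// Get the reverb parameters object from the environment.
let reverb = environment.reverbParameters

// Enable the built-in reverb and load the large hall factory preset.
reverb.enable = true
reverb.loadFactoryReverbPreset(.largeHall)

// Per-source blend: 0 is completely dry, higher values sound more ambient.
player.reverbBlend = 0.2
```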

So now we’ve talked about two types of mixers.

You have the 2D mixer and you have the 3D mixer.

And source nodes, that is the player or the input node, talk to these mixers using the AVAudioMixing protocol.

So I just wanted to point out that when a source node is connected to a 2D mixer, then all of the common and the 2D mixing properties take effect.

When a source node is connected to a 3D mixer, then all of the common and the 3D mixing properties take effect.

Let’s look at what that looks like here.

So let’s say that I have Player 1 who is connected to the 2D mixer.

I set the pan to be -1 and volume to be .5.

Note that pan is a 2D mixing property but volume is a common mixing property.

But in this case both of them will take effect, because the mixer node implements both of these properties.

If I disconnect Player 1 from the mixer and connect it to the environment node, the pan property will now be cached.

It doesn’t take effect because it’s a 2D mixing property.

It doesn’t apply to the environment node.

Volume, on the other hand, will continue to take effect, because it’s a common mixing property and it’s implemented by the environment node.
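
To make that concrete, here is a hedged Swift sketch of the Player 1 scenario. The sample rate and the position are assumed values, and the mixer and environment node are left unconnected downstream to keep the example short.

```swift
import AVFoundation

let engine = AVAudioEngine()
let mixer = AVAudioMixerNode()            // the 2D mixer
let environment = AVAudioEnvironmentNode() // the 3D mixer
let player1 = AVAudioPlayerNode()

engine.attach(mixer)
engine.attach(environment)
engine.attach(player1)

// Assumed mono format so the source can later be spatialized.
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)

// Connected to the 2D mixer: pan (2D) and volume (common) both take effect.
engine.connect(player1, to: mixer, format: monoFormat)
player1.pan = -1.0
player1.volume = 0.5

// Move the player over to the environment node.
engine.disconnectNodeOutput(player1)
engine.connect(player1, to: environment, format: monoFormat)

// pan is now only cached on the player; volume still applies.
// 3D properties such as position take effect instead.
player1.position = AVAudio3DPoint(x: -2, y: 0, z: -1)
```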

So with all of that information let’s look at a sample gaming setup.

This is just one of many ways that you can do this and this is just a suggestion.

It really all depends on your application.

But in this case, I have two 3D sources.

So, I’m going to use a player to play some sounds that will be spatialized and also live input.

So let’s say that the user is chatting and then you want to spatialize that in a 3D environment.

I can connect the player node and the input node to the environment node.

And that’s connected out to the engine’s main mixer.

I can now have a second player that I’m going to dedicate to playing music.

So this player is going to play music and I’m going to run it through an EQ and connect that to the main mixer.

Let’s say that I present some UI to the user so that they can tweak the EQ settings, maybe to make the music sound better.

I have a third player now that I’m going to dedicate only to UI sound effects.

So maybe the sounds that are made as I navigate through menus or if my game avatar has picked up a bonus item, etc. So the UI player is connected directly to the engine’s main mixer.

This is what the overall picture looks like.
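
One way to wire up this sample graph in Swift is sketched below. The node names, the two-band EQ and the sample rate are assumptions. Accessing the input node needs an audio input device and microphone permission, and the live input is only spatialized if its format is mono.

```swift
import AVFoundation

let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let effectsPlayer = AVAudioPlayerNode()   // 3D sound effects
let musicPlayer = AVAudioPlayerNode()     // background music
let uiPlayer = AVAudioPlayerNode()        // UI sound effects
let musicEQ = AVAudioUnitEQ(numberOfBands: 2)

let nodes: [AVAudioNode] = [environment, effectsPlayer, musicPlayer, uiPlayer, musicEQ]
for node in nodes {
    engine.attach(node)
}

let mainMixer = engine.mainMixerNode
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)

// 3D sources: a player and the live input both feed the environment node.
engine.connect(effectsPlayer, to: environment, format: monoFormat)
let input = engine.inputNode
engine.connect(input, to: environment, format: input.outputFormat(forBus: 0))
engine.connect(environment, to: mainMixer, format: nil)

// Music: player -> EQ -> main mixer.
engine.connect(musicPlayer, to: musicEQ, format: nil)
engine.connect(musicEQ, to: mainMixer, format: nil)

// UI sounds go straight to the main mixer.
engine.connect(uiPlayer, to: mainMixer, format: nil)
```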

So given all of this information let’s now look at a demo of the environment node.

[ Balls popping ]

So I want to explain what’s happening over here.

In this demo, I am using SceneKit for the graphics and SceneKit also comes with a physics engine.

So this works nicely with AVAudioEngine.

So I basically have two types of sounds that I’m playing. There’s the “fffuh” sound that plays before any ball is launched.

So to do that I use a player node and I have the launch sound effect in a buffer and I schedule that buffer on the player node.

But I make use of the completion handler to know when the player has consumed the buffer.

So, when the player lets me know that it’s done with the buffer, I go ahead and now create a SceneKit node, that’s a ball, and I also create an AVAudioPlayerNode, attach it to the engine and connect that to the environment node.

So I’m tying a player, a dedicated player to each ball.

Now the ball is launched into the world and as it goes about and collides with other surfaces, for every collision that happens SceneKit’s physics engine lets me know that a collision has happened with some other surface.

And I get the point of collision and also the impulse.

So using that, I can go and dig up the player node that’s tied to the SceneKit node.

I can set the position on the player based on where the collision happened, calculate a volume for the collision sound based on the impulse and then just play the sound.

But you can see now how, in this setup, for every ball that’s born into this world, a new player node is also created.

So the number of players is growing and I’m dynamically attaching it to the engine and connecting it to the environment node.
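
The demo source is not shown in this transcript, so here is only a sketch of the audio side of that flow in Swift. The helper names, the maxImpulse scale and the linear impulse-to-volume mapping are all hypothetical, and the SceneKit contact handling that would call these helpers is left out.

```swift
import AVFoundation

let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
engine.attach(environment)
engine.connect(environment, to: engine.mainMixerNode, format: nil)

// Hypothetical mapping from collision impulse to a volume in 0...1.
func collisionVolume(forImpulse impulse: Float, maxImpulse: Float = 10) -> Float {
    return max(0, min(1, impulse / maxImpulse))
}

// Hypothetical helper: play the launch sound, then spawn a ball once
// the player reports that it has consumed the buffer.
func playLaunchSound(on launchPlayer: AVAudioPlayerNode,
                     buffer: AVAudioPCMBuffer,
                     spawnBall: @escaping () -> Void) {
    launchPlayer.scheduleBuffer(buffer) {
        // The handler is not called on the main thread, so hop back to it.
        DispatchQueue.main.async { spawnBall() }
    }
}

// Hypothetical helper: create a dedicated player for a newly born ball
// and hook it up to the environment node on the fly.
func makePlayerForBall(collisionBuffer: AVAudioPCMBuffer) -> AVAudioPlayerNode {
    let player = AVAudioPlayerNode()
    engine.attach(player)
    engine.connect(player, to: environment, format: collisionBuffer.format)
    return player
}

// Hypothetical helper: called for each collision reported by the physics engine.
// The engine must already be running before play() is called.
func playCollision(on player: AVAudioPlayerNode,
                   buffer: AVAudioPCMBuffer,
                   at point: AVAudio3DPoint,
                   impulse: Float) {
    player.position = point
    player.volume = collisionVolume(forImpulse: impulse)
    player.scheduleBuffer(buffer, at: nil, options: .interrupts, completionHandler: nil)
    player.play()
}
```

Keeping the volume mapping in its own function makes it easy to swap in a different curve without touching the playback code.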

So this setup is very flexible.

[ Balls popping ]

Alright, so let’s get back to slides.

That brings us to the end of our talk.

So let’s quickly summarize all the things we’ve seen today.

We started off with talking about an engine and how you can create different nodes, attach them to the engine and then use the engine to make connections between each of these nodes.

We then looked at the different types of nodes: the destination node, which is the output node.

And we talked about two source nodes, the player node and the input node.

The player is the node you use to push audio data onto the render thread.

We looked at two types of mixer nodes, the 2D Mixer and the 3D Mixer and how source nodes talk to these mixers using the AVAudioMixing protocol.

We then looked at effect nodes and two types of effect nodes: AVAudioUnitEffect and AVAudioUnitTimeEffect.

Finally we talked about node taps and that’s how you pull data from the render thread.

So I just wanted to point out that node taps are also a useful debugging tool.

Let’s say that you have a number of connections in your application and things don’t sound the way you expect them to sound.

What you can do is install node taps at different points in your chain on different nodes and just examine the output of each of these nodes.

And using that you can drill down and find where the problem is.

So in that sense node taps are a useful debugging tool.
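
As one possible way to use taps for debugging in Swift, the sketch below logs a level for whatever a node is producing. The helper names, the RMS measurement and the 4096-frame buffer size are assumptions made for illustration.

```swift
import AVFoundation

// Root-mean-square level of the first channel of a float buffer.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float {
    guard let channels = buffer.floatChannelData, buffer.frameLength > 0 else {
        return 0
    }
    let count = Int(buffer.frameLength)
    let samples = UnsafeBufferPointer(start: channels[0], count: count)
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(count)).squareRoot()
}

// Hypothetical helper: install a tap on a node's output and log its level.
func installDebugTap(on node: AVAudioNode, label: String) {
    let format = node.outputFormat(forBus: 0)
    node.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        print("\(label): RMS \(rmsLevel(of: buffer))")
    }
}

// Remove the tap again once the problem has been located.
func removeDebugTap(from node: AVAudioNode) {
    node.removeTap(onBus: 0)
}
```

If one node logs a healthy level and the next node in the chain logs silence, the problem is likely in that second node or in the connection between them.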

So that brings us to the end of our session.

I just want to say that this is the first version of AVAudioEngine and we are very excited about it.

So, we’d love to hear what you think.

Please try it out and give us your feedback.

If you have any further questions at a later point, you can contact Filip, who’s our Graphics and Game Technologies Evangelist.
