What's New in AVAudioEngine

Session 510 WWDC 2019

AVAudioEngine enables the realtime capture, processing, and playback of audio. Learn how to take advantage of enhancements to this powerful API, such as support for voice processing and spatial rendering mode selection, in your own audio app.

[ Music ]

Hello, and welcome to our session about audio API updates.

My name is Peter Vasil, and I am a software engineer in the Core Audio team.

Let's start with what's new in AVAudioEngine.

We have added a couple of enhancements and new APIs to AVAudioEngine.

We now have a voice processing support.

We have added two new nodes, AVAudioSourceNode and AVAudioSinkNode, and we have made some improvements to spatial audio rendering.

Now, let's dive into the details.

AVAudioEngine now has a voice processing mode.

The main use case for the mode is echo cancellation and voice-over IP applications.

What does this mean?

When enabled, extra signal processing is applied on the incoming audio, and any audio that is coming from the device is taken out.

This requires that both input and output nodes are in the voice processing mode.

Therefore, when enabling the mode on either of the I/O nodes, the engine takes care that both I/O nodes exist and that they are switched to the voice processing mode.

Voice processing is only available when rendering to an audio device, not a manual rendering mode.

To enable or disable voice processing, we can use set voice processing enabled on either the input or output node.

Voice processing cannot be enabled dynamically, which means the engine needs to be in a stop state when enabling the mode.

The AV Echo Touch Sample Code project demonstrates how to use voice processing with AVAudioEngine in detail.

Let's look at the new nodes in AVAudioEngine, AVAudioSourceNode and AVAudioSinkNode.

Both nodes wrap a user-defined blog that allows apps to send or receive audio from AVAudioEngine.

When rendering to an audio device, the block operates under real-time constraints, which means that within the block there shouldn't be any blocking calls like memory allocations, call to lib dispatch, or blocking on a mutex.

With AVAudioSourceNode, we pass or render block to the node, which sends audio data to its output.

This makes it very easy to create generated nodes without having to implement a full audio unit and wrap it with AVAudio unit.

The node can be used in both real time and manual rendering mode.

AVAudioSourceNode support linear PCM conversions such as sample rate or bit depth conversions and has one output but no input.

This short code snippet shows how to use AVAudioSourceNode.

As we can see, the block is passed as an initializer argument, and after creating the node, it can be connected just like any other node.

A more detailed example can be found in our Signal Generator Sample Code project.

Let's look at AVAudioSinkNode.

AVAudioSinkNode is a symmetrical counterpart of AVAudioSourceNode.

It wraps a user-defined block that receives the input audio from the node chain that is connected to its input.

AVAudioSinkNode is restricted to the input chain.

In other words, it must be connected downstream of the input node.

It does not support format conversions, and the format within the block has to be the same as the hardware input format.

The node can be useful for voice-over IP application when the input needs to be processed in real time, in which case installing a regular tap would not be sufficient because the tap doesn't operate in a real-time context.

Here is a code snippet demonstrating how to create an AVAudioSinkNode.

It is quite similar to AVAudioSourceNode.

The main steps to note here are to initialize the node with a blog, attach it to the engine, and connect it to a node downstream of the input node.

Now let's see the spatial rendering improvements.

We have introduced an automatic spatial rendering algorithm, and AVAudioPlayerNode now supports spatialization of multichannel audio content.

With the automatic spatial rendering algorithm, the most appropriate spatialization algorithm is selected for the current route.

This means developers don't need to figure out what algorithms are suitable for headphones or different speaker configurations.

This adds near-field and in-head rendering for headphones and virtual surround for built-in speakers is available starting with iOS devices and laptops from 2018 and newer.

Here we see the new API in the AVAudio3DMixing protocol.

The AVAudio3DMixing rendering algorithm enum has a new entry, auto.

Additionally, we can specify the output type with the output type property.

With the property set to auto, the output type can be automatically detected in real-time mode but not in manual rendering mode.

With the ability to spatialize multichannel streams, we support point-source and ambience bed rendering.

We also support channel-based formats and higher-order Ambisonics up to the third order.

We have added two new spatialization properties to the AVAudio3DMixing protocol, source mode and pointSource and head mode.

SpatializeIfMono is legacy behavior.

This is the same as bypass for any multichannel stream, which means a pass through or downmix to the output format.

With pointSource, the audio is sum to mono and render at the location of the player node.

And with ambienceBed, the audio is anchored to the 3D world and is rotatable with the player node's position relative to the listener orientation.

Here we see an example of ambienceBed source mode with automatic rendering algorithm.

After setting the properties, it is important to make sure the format that is used for the connection between player node and the environment node contains the multichannel layout.

Now let's talk about what's new in AVAudioSession.

AVAudioSessionPromptStyle is a hint to apps that play voice prompts in order to modify the style of the played prompt.

For example, if Siri is speaking or a phone call is ongoing, verbal navigation prompts is a confusing user experience.

We also don't want Siri to record a navigation prompt.

Navigation apps, for example, are encouraged to pay attention to prompt style changes and modify the prompts for better user experience.

We have three prompt styles, none, short, and normal.

We can now indicate to disable prompts completely, play shortened prompts, or play the regular prompts.

Let's look at other AVAudioSession enhancements.

The default policy is to mute haptics and system sounds when audio recording is active.

A new property, allow haptics and system sounds during recording, allows system sounds and haptics to play while the session is actively using an audio input.

It can be set using the setter set allow haptics and system sounds during recording.

For more information, please visit the developer website.

Thank you for your attention.

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US