AVCapturePhotoOutput - Beyond the Basics

Session 511 WWDC 2016

Continue your learning from Session 501: Advances in iOS Photography, with some additional details on scene monitoring and resource management in AVFoundation's powerful new AVCapturePhotoOutput API.


Hi and welcome to Session 511, AVCapturePhotoOutput, Beyond the Basics.

This is a chalk talk addendum to Session 501, Advances in iOS Photography.

I'm Brad Ford.

I'm an engineer on the Core Media Capture team at Apple.

In Session 501, we focused on AVFoundation's camera capture APIs, specifically AVCapturePhotoOutput, which is a new interface for taking photos in iOS 10.

This output supports capturing Live Photos, RAW + DNG, wide color content, and preview or thumbnail images.

If you haven't watched Session 501 yet, I recommend pausing here and watching Session 501 first.

You'll get a lot more out of this addendum.

In this session, we'll move beyond the AVCapturePhotoOutput basics and discuss two important topics we didn't have time for in Session 501.

Namely, Scene Monitoring, and Resource Preparation and Reclamation.

Lastly, we'll spend a few minutes on an unrelated but still very important topic, Camera Privacy Policy Changes in iOS 10.

By way of minimal review, the new AVCapturePhotoOutput has an improved interface that addresses some of AVCaptureStillImageOutput's design challenges.

AVCapturePhotoOutput uses a functional programming model.

There are clear delineations between mutable and immutable data.

It uses a separate object to encapsulate per photo settings called AVCapturePhotoSettings.

You pass it when making a photo capture request.

It uses a delegate style interface for tracking the progress of photo capture requests.

This is called the AVCapturePhotoCaptureDelegate protocol.

All callbacks in the delegate protocol return an instance of AVCaptureResolvedPhotoSettings.

This is an immutable object in which all photo settings have been resolved.
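As a quick illustration of the delegate-style interface, here is a minimal Swift sketch using current Swift API spellings rather than the Swift 3 names from 2016. LoggingPhotoDelegate is a hypothetical name; it only logs the resolved settings that each callback receives, to show that the per-request answers (such as whether the flash fires) arrive in an immutable AVCaptureResolvedPhotoSettings.

```swift
import AVFoundation

// Hypothetical delegate that logs what the photo output resolved for each request.
final class LoggingPhotoDelegate: NSObject, AVCapturePhotoCaptureDelegate {
    // Called once the output has resolved settings such as auto flash.
    func photoOutput(_ output: AVCapturePhotoOutput,
                     willBeginCaptureFor resolvedSettings: AVCaptureResolvedPhotoSettings) {
        print("Request \(resolvedSettings.uniqueID): flash will fire = \(resolvedSettings.isFlashEnabled)")
    }

    // Always the last callback for a request, whether it succeeded or failed.
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishCaptureFor resolvedSettings: AVCaptureResolvedPhotoSettings,
                     error: Error?) {
        if let error = error {
            print("Request \(resolvedSettings.uniqueID) failed: \(error)")
        } else {
            print("Request \(resolvedSettings.uniqueID) finished")
        }
    }
}
```

The photo output holds only a weak reference to its delegate, so an app has to keep the delegate alive (for example in a dictionary keyed by the settings' uniqueID) until the didFinishCaptureFor callback arrives.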

AVCapturePhotoOutput also supports Scene Monitoring using a subset of these capture objects I just talked about.

Scene monitoring allows you to present UI that informs the user what scene dependent features are currently active.

In this screenshot of Apple's Camera app, the user is clearly in a low-light situation.

The flash iconography at the bottom of the screen indicates that the user is in auto flash mode, meaning the flash should only be used if the situation requires it.

Apple's Camera app is a client of AVCapturePhotoOutput, and it uses Scene Monitoring to drive the yellow flash-active badge that you see at the top middle of the screen.

The presence of the yellow flash badge shows the user that if they take a picture now, the flash is going to fire.

AVCapturePhotoOutput offers Scene Monitoring for two kinds of scenes.

The first is the flash.

All of Apple's current iPhone models, as well as the 9.7-inch iPad Pro, have a True Tone flash to illuminate dark scenes for the rear-facing iSight camera, and a Retina Flash that turns your Retina display into a True Tone flash, illuminating it at up to three times its normal brightness in order to brighten up selfies in low light.

The second type of supported Scene Monitoring is Still Image Stabilization.

Still Image Stabilization is a multi-image fusion capture that blends differently exposed images to reduce blur in low-light situations.

It might not be totally obvious why Still Image Stabilization is a low-light feature. It's not that your hands shake more in the dark.

It's just that the camera needs to expose longer to gather the same number of photons requiring the shooter to be very, very steady.

Still Image Stabilization counters this problem by capturing multiple images at different exposures and then fusing them together to reduce noise and motion artifacts.

So at first glance, flash worthiness or Still Image Stabilization worthiness would seem like orthogonal features, but they're actually closely related.

And this causes some API ambiguity.

Looking at this graph, we see the applicable light ranges for Flash Capture with and without Still Image Stabilization.

I've shortened Still Image Stabilization to SIS for brevity.

The blue bar represents the light levels at which the photo output will use the flash if you've opted in for SIS.

The green bar represents the applicable light levels for flash if you've opted out of SIS.

Note that with SIS on, the photo output can do without the flash in darker scenes.

This is because SIS lowers the noise in the image to the point where the flash isn't needed.

If your current scene's light level is, say, here, the answer to the question, is this a flash scene is a resounding yes.

But if the light level is here, the answer depends on whether you're interested in using Still Image Stabilization, and the inverse is true also.

So what to do?

The AVCapturePhotoOutput doesn't know what kind of capture you want until you request it.

But if you're using Scene Monitoring, it needs to run continuously.

Is the current scene a SIS scene or a flash scene?

In AVCapturePhotoOutput we've addressed this ambiguity with a specific API for Scene Monitoring called photoSettingsForSceneMonitoring.

And we've provided two key value observable properties that can asynchronously inform you when scene suitability changes with respect to Still Image Stabilization or flash.

You create an AVCapturePhotoSettings instance specifically for Scene Monitoring and specify which features you'd like AVCapturePhotoOutput to consider.

Here I've set the flash mode to auto, indicating that I'm interested in using the flash feature when it's available, and I've also set isAutoStillImageStabilizationEnabled to true.

So SIS should be considered too.

SIS tends to give better image quality results than flash, so if a scene falls into the overlapping range between SIS and flash, the photoOutput reports that it's an SIS scene.

Next, I assign this object to the photoSettingsForSceneMonitoring property.

This property can be set at any time including before you start the AVCaptureSession running.

To be informed of changes to flash and Still Image Stabilization scene worthiness, I add key-value observers for the aforementioned isFlashScene and isStillImageStabilizationScene properties.

And I'm called back as scene worthiness changes for those two properties.
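The setup just described can be sketched in Swift as follows. This is a minimal sketch, assuming current Swift API names; SceneMonitor and its start/stop methods are hypothetical helpers, not part of AVFoundation. It only requests a flash mode and SIS setting the output reports as supported, and it uses key-path KVO on isFlashScene and isStillImageStabilizationScene. Note that later SDKs deprecate the Still Image Stabilization properties in favor of photo quality prioritization.

```swift
import AVFoundation

// Hypothetical helper that monitors flash and SIS scene suitability.
final class SceneMonitor {
    private var observations: [NSKeyValueObservation] = []

    /// Monitors auto flash + auto SIS; onChange gets (isFlashScene, isStillImageStabilizationScene).
    func start(on photoOutput: AVCapturePhotoOutput,
               onChange: @escaping (Bool, Bool) -> Void) {
        let settings = AVCapturePhotoSettings()  // flashMode defaults to .off
        // Only ask for features the output reports as supported.
        if photoOutput.supportedFlashModes.contains(.auto) {
            settings.flashMode = .auto
        }
        settings.isAutoStillImageStabilizationEnabled = photoOutput.isStillImageStabilizationSupported
        // May be set at any time, including before the session starts running.
        photoOutput.photoSettingsForSceneMonitoring = settings

        let report: (AVCapturePhotoOutput) -> Void = { output in
            onChange(output.isFlashScene, output.isStillImageStabilizationScene)
        }
        // .initial delivers the current values right away, then every change.
        observations = [
            photoOutput.observe(\.isFlashScene, options: [.initial, .new]) { output, _ in report(output) },
            photoOutput.observe(\.isStillImageStabilizationScene, options: [.initial, .new]) { output, _ in report(output) },
        ]
    }

    /// Releasing the observation objects invalidates the KVO registrations.
    func stop() {
        observations.removeAll()
    }
}
```

KVO callbacks are not guaranteed to arrive on the main queue, so a UI badge update inside onChange would typically be dispatched to the main queue first.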

Now let's talk about Scene Monitoring defaults.

photoSettingsForSceneMonitoring is a nullable property, and its default value is nil, meaning no scenes are being monitored.

If you query isStillImageStabilizationScene or isFlashScene without first configuring photoSettingsForSceneMonitoring, they will answer false forever and ever.

Once you do configure photoSettingsForSceneMonitoring, you can query or key-value observe the two scene properties and get appropriate answers.

Be aware, though, that if your photoSettingsForSceneMonitoring contain a flash mode of off, isFlashScene will still always report false.

Ditto for isStillImageStabilizationScene if isAutoStillImageStabilizationEnabled is false.

My recommendations for Scene Monitoring are simple.

If your app doesn't display any UI indicating what kind of scene the user is seeing, you don't need to enable Scene Monitoring.

But if you do, monitor what you intend to capture.

For example, if you intend to capture using auto flash but not SIS, then monitor with flash mode set to auto and auto SIS off.

Doing otherwise will likely confuse your user, as your UI might report that it's not a flash scene while the flash actually does fire when taking a picture.
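One way to follow this advice is to derive both the monitoring settings and the capture settings from a single description of what the user intends to shoot. The sketch below assumes current Swift API names; CaptureIntent and makeSettings(for:photoOutput:) are hypothetical app-level names, not AVFoundation API. It builds a fresh settings object on every call, because a settings object must not be reused across capture requests.

```swift
import AVFoundation

// Hypothetical app-level record of what the user has chosen to capture.
struct CaptureIntent {
    var flashMode: AVCaptureDevice.FlashMode
    var usesAutoStillImageStabilization: Bool
}

// Builds a fresh settings object from the intent. Call it once for
// photoSettingsForSceneMonitoring and again for every capture request.
func makeSettings(for intent: CaptureIntent,
                  photoOutput: AVCapturePhotoOutput) -> AVCapturePhotoSettings {
    let settings = AVCapturePhotoSettings()
    // Fall back to .off if the requested flash mode isn't supported.
    settings.flashMode = photoOutput.supportedFlashModes.contains(intent.flashMode)
        ? intent.flashMode : .off
    // Enable auto SIS only if the user wants it and the output supports it.
    settings.isAutoStillImageStabilizationEnabled =
        intent.usesAutoStillImageStabilization && photoOutput.isStillImageStabilizationSupported
    return settings
}
```

For auto flash without SIS, the app would pass CaptureIntent(flashMode: .auto, usesAutoStillImageStabilization: false) both when assigning photoSettingsForSceneMonitoring and when calling capturePhoto(with:delegate:), so the flash badge and the actual capture agree.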

That wraps up Scene Monitoring.

On to our next Beyond the Basics topic, Resource Preparation and Reclamation.

To understand the need for on-demand resource preparation, let's look at AVCaptureSession's normal flow of data.

When you call AVCaptureSession startRunning, data begins flowing from all your AVCapture inputs to the various AVCapture outputs.

Most outputs receive and handle this data in a streaming manner, such as the VideoPreviewLayer, which continuously displays input data to the screen.

Or VideoDataOutput which pushes buffers to your app via delegate callback.

Streaming outputs such as these require a disruptive capture render pipeline rebuild if you change their configuration.

You have to configure them for one type of output before you call startRunning.

AVCapturePhotoOutput is different, since it only receives data from its input on an as-needed basis.

When you request a photo by calling capturePhoto with settings and a delegate, the photo output delivers just one result or set of results.

Unlike the streaming outputs, the photo output has a lot of downtime.

It's perfectly positioned to prepare or reclaim resources on demand without causing a disruptive reconfiguration of the render pipeline.

It has the luxury of preparing while no one's watching.

Resource preparation isn't free, of course.

And AVCapturePhotoOutput's feature set is extensive.

Taking an uncompressed 420 photo in the AVCaptureDevice's native format requires some minimal resources.

Processed output such as BGRA or JPEG requires additional resources, since there's a format conversion involved.

Flash captures require their own set of hardware resources for delivering the pre-flash sequence and the strobe-synchronized result.

Still Image Stabilization requires multiple buffers for fusion.

RAW capture requires very large buffers.

RAW + JPEG requires a combination of resources big and small.

And bracketed capture requires multiple buffers to return multiple images to the client.

And of course, many of these features can be mixed and matched, requiring a superset of resources.

With so many capture features available, it's difficult for the AVCapturePhotoOutput to guess how many resources to prepare upfront.

And both over-preparing and under-preparing are bad.

We liken over-preparing to baking a birthday cake every day of the year, just in case it's your birthday.

It's a lot of effort for us.

A lot of material invested.

A lot of uneaten cake gets thrown away.

Video preview might come up slower each time.

Memory consumption might be needlessly high.

Under-preparing is just as bad, if not worse.

If we're not ready to capture a photo with your requested feature set, we might miss the shot while allocating resources on demand.

Fortunately, we've provided a solution.

AVCapturePhotoOutput allows you to tell it in advance what kinds of captures you're interested in.

You do this by calling setPreparedPhotoSettingsArray, passing an array of AVCapturePhotoSettings with each one representing a different type of capture you'd like it to prepare for.

You can optionally pass a completion handler to be called when preparation is complete.

The photo output also provides a read only preparedPhotoSettingsArray property so you can query the settings array that you last set.

The setPreparedPhotoSettingsArray function can do several things.

It prepares resources for all the types of capture in your array of settings.

Also, it reclaims unneeded resources if there are any.

And by passing an empty array, you can reclaim everything.

It calls you back when all resources are prepared.

And it returns an error if resources couldn't be prepared.

This is all delivered via the completion callback.
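Here is a minimal Swift sketch of preparing and reclaiming, assuming current Swift API names (the 2016 Swift 3 spellings differ). The helper names makePreparedSettings, prepare, and reclaimAll are hypothetical. The sketch prepares for a plain JPEG capture plus, when the output offers a RAW format, a RAW + JPEG capture, and it reclaims everything by passing an empty array.

```swift
import AVFoundation

// One settings object per kind of capture the app intends to perform:
// always a processed JPEG capture, plus RAW + JPEG when a RAW format is given.
func makePreparedSettings(rawFormat: OSType?) -> [AVCapturePhotoSettings] {
    let jpeg: [String: Any] = [AVVideoCodecKey: AVVideoCodecType.jpeg]
    var settingsArray = [AVCapturePhotoSettings(format: jpeg)]
    if let rawFormat = rawFormat {
        settingsArray.append(AVCapturePhotoSettings(rawPixelFormatType: rawFormat,
                                                    processedFormat: jpeg))
    }
    return settingsArray
}

// Prepares the output for those kinds of capture and reports the outcome.
func prepare(_ photoOutput: AVCapturePhotoOutput,
             completion: @escaping (Bool, Error?) -> Void) {
    let settingsArray = makePreparedSettings(
        rawFormat: photoOutput.availableRawPhotoPixelFormatTypes.first)
    photoOutput.setPreparedPhotoSettingsArray(settingsArray) { prepared, error in
        // prepared is false if resources couldn't be prepared (see error)
        // or if a later call superseded this one before it completed.
        completion(prepared, error)
    }
}

// Passing an empty array reclaims everything prepared on demand.
func reclaimAll(_ photoOutput: AVCapturePhotoOutput) {
    photoOutput.setPreparedPhotoSettingsArray([], completionHandler: nil)
}
```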

preparedPhotoSettingsArray's default value is an array containing a single AVCapturePhotoSettings created with the default initializer, which specifies JPEG as the output format and has AutoStillImageStabilization enabled.

preparedPhotoSettingsArray is a sticky property.

It persists across AVCaptureSession startRunning and stopRunning, and across beginConfiguration and commitConfiguration, so you can set it and forget it if you always take the same kinds of captures in your app.

Another nice feature of setPreparedPhotoSettingsArray is that it participates in AVCaptureSession's beginConfiguration/commitConfiguration deferred work semantics.

That is, if you call beginConfiguration, change your session's topology by adding or removing inputs or outputs, set a new preparedPhotoSettingsArray, and then commit the configuration, the preparation won't occur until commitConfiguration is called.

You can atomically change your session configuration and prepare your photo output for the new configuration simultaneously.
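As a sketch of that atomic change, the hypothetical helper below swaps the camera input and re-prepares the photo output inside a single beginConfiguration/commitConfiguration pair, using current Swift API names. It assumes the photo output is already attached to the session and that the caller has created the new AVCaptureDeviceInput; it removes only video inputs, so a microphone input would stay in place.

```swift
import AVFoundation

// Hypothetical helper: swap the camera and re-prepare in one atomic change.
// Returns false (and restores the old camera) if the new input can't be added.
func switchCamera(session: AVCaptureSession,
                  photoOutput: AVCapturePhotoOutput,
                  to newInput: AVCaptureDeviceInput,
                  preparedSettings: [AVCapturePhotoSettings]) -> Bool {
    session.beginConfiguration()
    // Topology changes and preparation are applied together here.
    defer { session.commitConfiguration() }

    // Remove current video inputs only, leaving e.g. a microphone input alone.
    let oldVideoInputs = session.inputs
        .compactMap { $0 as? AVCaptureDeviceInput }
        .filter { $0.device.hasMediaType(.video) }
    for input in oldVideoInputs { session.removeInput(input) }

    guard session.canAddInput(newInput) else {
        for input in oldVideoInputs { session.addInput(input) }
        return false
    }
    session.addInput(newInput)

    // Deferred: no preparation happens until commitConfiguration() runs.
    photoOutput.setPreparedPhotoSettingsArray(preparedSettings) { prepared, error in
        if !prepared { print("Preparation failed: \(String(describing: error))") }
    }
    return true
}
```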

You can prepare before running your AVCaptureSession to ensure that your app is ready to capture photos as soon as video preview starts running.

If you call setPreparedPhotoSettingsArray when the session is stopped, it doesn't call your completion handler back right away.

Instead, the handler is called when preparation completes, which is after you call session startRunning.

If your session is stopped and you prepare with one set of settings and then you change your mind and call it again with another set of settings, your first completion handler fires immediately with prepared set to false.

This is effectively a cancellation of the first preparation.

We have three simple recommendations on how you should use our prepare APIs.

Firstly, prepare.

You can always issue a capture request without preparing first, but if the photo output isn't prepared for precisely the type of capture you want, you might get that first image back slowly.

Second, prepare before calling startRunning on your session.

Knowing the kinds of captures you're interested in lets the session allocate just the right amount of resources for you during startup.

Third, re-prepare only when your UI changes.

You don't need to re-prepare every time you capture a photo, just when you change the types of capture you'll be performing, like when your user toggles RAW Capture or Bracketed Capture on or off in your app.
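The three recommendations might look like this in a small controller. This is a minimal sketch, assuming current Swift API names; PhotoCaptureController, settingsForCurrentUI, prepareAndStart, and setRAWEnabled are hypothetical names. It prepares before startRunning and re-prepares only when a RAW toggle actually changes.

```swift
import AVFoundation

// Hypothetical controller applying the three preparation recommendations.
final class PhotoCaptureController {
    let session: AVCaptureSession
    let photoOutput: AVCapturePhotoOutput
    private(set) var isRAWEnabled = false

    init(session: AVCaptureSession, photoOutput: AVCapturePhotoOutput) {
        self.session = session
        self.photoOutput = photoOutput
    }

    // One settings object per kind of capture the current UI allows.
    func settingsForCurrentUI() -> [AVCapturePhotoSettings] {
        let jpeg: [String: Any] = [AVVideoCodecKey: AVVideoCodecType.jpeg]
        if isRAWEnabled, let raw = photoOutput.availableRawPhotoPixelFormatTypes.first {
            return [AVCapturePhotoSettings(rawPixelFormatType: raw, processedFormat: jpeg)]
        }
        return [AVCapturePhotoSettings(format: jpeg)]
    }

    // Recommendations 1 and 2: prepare, and do it before startRunning().
    func prepareAndStart() {
        photoOutput.setPreparedPhotoSettingsArray(settingsForCurrentUI()) { prepared, error in
            // While stopped, this runs after startRunning() finishes preparing,
            // or immediately with prepared == false if a newer call cancels it.
            if !prepared { print("Not prepared: \(String(describing: error))") }
        }
        session.startRunning()  // blocking; call it from a background serial queue in a real app
    }

    // Recommendation 3: re-prepare only when the kind of capture changes.
    func setRAWEnabled(_ enabled: Bool) {
        guard enabled != isRAWEnabled else { return }
        isRAWEnabled = enabled
        photoOutput.setPreparedPhotoSettingsArray(settingsForCurrentUI(), completionHandler: nil)
    }
}
```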

Not all AVCapturePhotoOutput features qualify for on-demand resource preparation.

The first of these is isHighResolutionCaptureEnabled.

Some camera formats allow you to capture a high resolution still image that is bigger than the format's sustainable streaming resolution.

For instance, the front camera's photo format on iPhone 6s and 6s Plus supports 5 megapixel stills but can only stream at 1280 by 960.

When the camera is configured with this format, it can either deliver 1280 by 960 stills or 5 megapixel stills depending on whether your photo settings specify high resolution capture.

But the camera must be configured for 5-megapixel stills up front, so AVCapturePhotoOutput requires you to opt in to the feature before you start running by setting isHighResolutionCaptureEnabled to true.

Once you've opted in, you can take stills with or without high res capture enabled without causing an expensive graph rebuild.
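In code, the session-level opt-in and the per-photo choice are two separate properties. The sketch below assumes current Swift API names; enableHighResolutionStills and makeStillSettings are hypothetical helpers. Newer SDKs deprecate these two properties in favor of maxPhotoDimensions, but they illustrate the opt-in pattern described here.

```swift
import AVFoundation

// Session-level opt-in: call while configuring, before startRunning(),
// so the camera is set up for high-resolution stills from the start.
func enableHighResolutionStills(on photoOutput: AVCapturePhotoOutput) {
    photoOutput.isHighResolutionCaptureEnabled = true
}

// Per-photo choice: request high resolution only when the output has opted in;
// a capture request asking for it otherwise is rejected by the photo output.
func makeStillSettings(highResolution: Bool,
                       for photoOutput: AVCapturePhotoOutput) -> AVCapturePhotoSettings {
    let settings = AVCapturePhotoSettings()
    settings.isHighResolutionPhotoEnabled =
        highResolution && photoOutput.isHighResolutionCaptureEnabled
    return settings
}
```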

Similarly, Live Photo capture involves delivering a movie asset as well as a still image.

The movie contains samples from the past, starting 1.5 seconds before your capture request.

So the capture render pipeline must be configured upfront to do this special kind of capture.

Lastly, Live Photos can be intelligently and automatically trimmed at capture time if large, purposeful motion is detected, such as dropping your arm to put the device in your pocket.

If you wish to capture full-duration, untrimmed Live Photos, you must opt out of auto-trimming by setting isLivePhotoAutoTrimmingEnabled to false before calling startRunning on your AVCaptureSession.
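Both Live Photo options are set on the output before the session runs, while each Live Photo request also needs a destination for its movie. Here is a minimal Swift sketch, assuming current Swift API names; configureLivePhotos and makeLivePhotoSettings are hypothetical helpers, and writing the movie to the temporary directory is an illustrative choice.

```swift
import AVFoundation
import Foundation

// Session-level Live Photo options: set these before startRunning(),
// since they change how the capture render pipeline is built.
func configureLivePhotos(on photoOutput: AVCapturePhotoOutput, fullDuration: Bool) {
    // Support depends on the attached camera and the session preset.
    photoOutput.isLivePhotoCaptureEnabled = photoOutput.isLivePhotoCaptureSupported
    guard photoOutput.isLivePhotoCaptureEnabled else { return }
    // Auto-trimming is on by default; turn it off for untrimmed Live Photos.
    photoOutput.isLivePhotoAutoTrimmingEnabled = !fullDuration
}

// Per photo: a Live Photo request needs a file URL for the movie half.
func makeLivePhotoSettings() -> AVCapturePhotoSettings {
    let settings = AVCapturePhotoSettings()
    settings.livePhotoMovieFileURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
        .appendingPathExtension("mov")
    return settings
}
```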

Our last topic of the day is Camera Privacy Policy Changes in iOS 10.

Let's review Apple's Privacy Policy with respect to media.

Photos and videos on a user's iOS device are personal, private and sensitive data.

Use of the camera or microphone is a privileged allowance that must be granted explicitly by the user.

So beginning in iOS 7, users were notified the first time an app used the camera or microphone and given an opportunity to disallow it.

This is a very good thing.

Transparency and trust are well worth the one-time annoyance of tapping okay.

In iOS 10, we're requiring apps to go one step further in transparency by informing the user why they want to access sensitive data.

Sometimes your UI makes it obvious, but sometimes not.

Your reason string should remove all ambiguity.

For instance, here AVCam is telling the user it wants to use the camera to take photos and video.

That's a pretty explicit statement about what it will use the camera for.

Likewise, apps linked against iOS 10 must provide a reason string for using the microphone.

And lastly, the Photos Library.

You should be clear in your reason string with respect to the Photos Library.

Are you using it for reading or writing or both?

In the latest version of Xcode you'll find a list of possible privacy description keys, not just for camera, mic and photos, but for access to all sensitive data.

In order to use any of these services, you must provide a reason string.

If you don't, your app will not be granted access to the desired service.

The three specific keys you should be concerned about for capture are NSCameraUsageDescription, NSMicrophoneUsageDescription, and NSPhotoLibraryUsageDescription.
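A capture app can check at runtime that its reason strings are present before asking for access. The Swift sketch below is illustrative only; captureUsageDescriptionKeys, missingUsageDescriptions, and requestCameraAccess are hypothetical names. The key strings are the three listed above, and AVCaptureDevice.authorizationStatus(for:) and requestAccess(for:completionHandler:) are the standard AVFoundation authorization calls.

```swift
import AVFoundation

// The three Info.plist keys called out for capture apps.
let captureUsageDescriptionKeys = [
    "NSCameraUsageDescription",
    "NSMicrophoneUsageDescription",
    "NSPhotoLibraryUsageDescription",
]

// Returns the keys that have no non-empty reason string in the bundle.
func missingUsageDescriptions(in bundle: Bundle = .main) -> [String] {
    captureUsageDescriptionKeys.filter { key in
        let reason = bundle.object(forInfoDictionaryKey: key) as? String
        return reason?.isEmpty ?? true
    }
}

// Requests camera access only when a camera reason string is present.
func requestCameraAccess(completion: @escaping (Bool) -> Void) {
    guard !missingUsageDescriptions().contains("NSCameraUsageDescription") else {
        completion(false)  // no reason string: access can't be granted
        return
    }
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        // Shows the system prompt with your reason string; the handler may run on any queue.
        AVCaptureDevice.requestAccess(for: .video, completionHandler: completion)
    default:
        completion(false)  // .denied or .restricted
    }
}
```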

Let's summarize what we've learned.

AVCapturePhotoOutput allows fine control of scene monitoring behavior.

It also allows on-demand resource allocation and reclamation.

And Capture clients must provide a reason for camera, mic, and photos use as of iOS 10.

For more information, visit the URL for Session 501, Advances in iOS Photography.

And if you're still at the show, we invite you to visit all three of these related sessions that have to do with photography, RAW, and Wide Color.

Thanks for watching and happy photo capture.

Enjoy the rest of the show.
