Introducing ARKit: Augmented Reality for iOS 

Session 602 WWDC 2017

ARKit provides a cutting-edge platform for developing augmented reality (AR) apps for iPhone and iPad. Get introduced to the ARKit framework and learn about harnessing its powerful capabilities for positional tracking and scene understanding. Tap into its seamless integration with SceneKit and SpriteKit, and understand how to take direct control over rendering with Metal 2.

Good afternoon.

[ Applause ]

Welcome to our session introducing ARKit.

My name is Mike.

I’m an engineer on the ARKit team.

And today I’m thrilled to talk to you about the concepts as well as the code that go into creating your very own augmented reality experience on iOS.

[ Cheering and Applause ]

Thank you.

I know many of you are eager to get started with augmented reality.

Let’s show you just how easy it is using ARKit.

But first, what is augmented reality?

Augmented reality is creating the illusion that virtual objects are placed in a physical world.

It’s using your iPhone or your iPad as a lens into a virtual world based on what your camera sees.

Let’s take a look at some examples.

We gave a group of developers early access to ARKit.

And here’s what they made.

This is a sneak peek at some things you might see in the near future.

Within, a company focused on immersive storytelling, tells the story of Goldilocks using AR.

Transforming a bedroom into a virtual storybook, they allow you to progress a story by reciting the text, but even more importantly, they allow you to explore the scene from any angle.

This level of interactivity really helps bring your virtual scene to life.

Next, IKEA used ARKit to let you redesign your living room.

[ Applause ]

By being able to place virtual content next to physical objects, you open up a world of possibilities to your users.

And last, games.

Pokemon Go, an app that you’ve probably already heard of, used ARKit to take catching Pokemon to the next level.

By being able to anchor your virtual content in the real world, you really allow for a more immersive experience than previously possible.

But it doesn’t stop there.

There are a multitude of ways that you can use augmented reality to enhance your user experience.

So let’s see what goes into that.

There’s a large amount of domain knowledge that goes into creating augmented reality.

Everything from computer vision, to sensor fusion, to talking to hardware in order to get camera calibrations and camera intrinsics.

We wanted to make this all easier for you.

So today we’re introducing ARKit.

[ Applause ]

ARKit is a mobile AR platform for developing augmented reality apps on iOS.

It is a high level API providing a simple interface to a powerful set of features.

But more importantly, it’s rolling out with support for hundreds of millions of existing iOS devices.

In order to get the full set of features for ARKit, you’re going to want an A9 and up.

This is most iOS 11 devices, including the iPhone 6S.

Now let’s talk about the features.

So what does ARKit provide?

ARKit can be broken up into three distinct layers, the first of which is tracking.

Tracking is the core functionality of ARKit.

It is the ability to track your device in real time.

With world tracking we provide you the ability to get your device’s relative position in the physical environment.

We use visual inertial odometry, which is using camera images, as well as motion data from your device in order to get a precise view of where your device is located as well as how it is oriented.

But also, more importantly, there’s no external setup required, no pre-existing knowledge about your environment, and no additional sensors beyond what you already have on your device.

Next, building upon tracking we provide scene understanding.

Scene understanding is the ability to determine attributes or properties about the environment around your device.

It’s providing things like plane detection.

Plane detection is the ability to determine surfaces or planes in the physical environment.

These are things like the floor or maybe a table.

In order to place your virtual objects, we provide hit testing functionality.

So this is getting an intersection with the real world topology so that you can place your virtual object in the physical world.

And last, scene understanding provides light estimation.

So light estimation is used to render or correctly light your virtual geometry to match that of the physical world.

Using all of these together we can seamlessly integrate virtual content into your physical environment.

And so the last layer of ARKit is rendering.

For rendering we provide easy integration into any renderer.

We provide a constant stream of camera images, tracking information as well as scene understanding that can be inputted into any renderer.

For those of you using SceneKit or SpriteKit, we provide custom AR views, which implement most of the rendering for you.

So it’s really easy to get started.

And for those of you doing custom rendering, we provide a Metal template through Xcode, which gets you started integrating ARKit into your custom renderer.

And one more thing, Unity and Unreal will be supporting the full set of features from ARKit.

[ Applause ]

So, are you guys ready?

Let’s get started.

How do I use ARKit in my application?

ARKit is a framework that handles all of the processing that goes into creating an augmented reality experience.

With the renderer of my choice, I can simply use ARKit to do the processing.

And it will provide everything that I need to render my augmented reality scene.

In addition to processing, ARKit also handles the capturing that is done in order to do augmented reality.

So using AVFoundation and Core Motion under the hood, we capture images as well as get motion data from your device in order to do tracking and provide those camera images to your renderer.

So now how do I use ARKit?

ARKit is a session-based API.

The first thing you need to do to get started is simply create an ARSession.

ARSession is the object that controls all of the processing that goes into creating your augmented reality app.

But first I need to determine what kind of tracking I want to do for my augmented reality app.

So, to determine this we’re going to create an ARSessionConfiguration.

ARSessionConfiguration and its subclasses determine what tracking you want to run on your session.

By enabling and disabling properties, you can get different kinds of scene understanding and have your ARSession do different processing.

In order to run my session, I simply call the Run method on ARSession providing the configuration I want to run.

And with that, processing immediately starts.

And we also set up the capturing underneath.

So under the hood you’ll see there’s an AVCaptureSession and a CMMotionManager that get created for you.

We use these to get image data as well as the motion data that’s going to be used for tracking.

Once processing is done, ARSession will output ARFrames.

So an ARFrame is a snapshot in time, including all of the state of your session, everything needed to render your augmented reality scene.

In order to access an ARFrame, you can simply poll the currentFrame property from your ARSession.

Or, you can set yourself as the delegate to receive updates when new ARFrames are available.
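Putting those pieces together, a minimal session setup might be sketched like this. This uses the class names as presented in this session; names may differ slightly in the shipping SDK:

```swift
import UIKit
import ARKit

class ViewController: UIViewController, ARSessionDelegate {
    let session = ARSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        session.delegate = self

        // Choose the kind of tracking to run, then start processing.
        // Capturing (camera and motion data) starts immediately.
        let configuration = ARWorldTrackingSessionConfiguration()
        session.run(configuration)
    }

    // Option 1: poll the latest frame whenever your renderer needs it.
    func render() {
        if let frame = session.currentFrame {
            // frame holds the camera image, tracking info, and scene understanding.
            _ = frame
        }
    }

    // Option 2: be notified as new frames become available.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Render using the new frame.
    }
}
```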

So let’s take a closer look at ARSessionConfiguration.

ARSessionConfiguration determines what kind of tracking you want to run on your session.

So it provides different configuration classes.

The base class, ARSessionConfiguration, provides three degrees of freedom tracking, which is just the orientation of your device.

Its subclass, ARWorldTrackingSessionConfiguration, provides six degrees of freedom tracking.

So this is using our core functionality world tracking in order to get not only your device’s orientation, but also a relative position of your device.

With this we also get information about the scene.

So we provide scene understanding like feature points as well as physical positions in your world.

In order to enable and disable features, you simply set properties on your session configuration classes.

And session configurations also provide availability.

So if you want to check if world tracking is supported on your device, you simply need to call the class property isSupported on ARWorldTrackingSessionConfiguration.

With this you can then use your ARWorldTrackingSessionConfiguration or fall back to the base class, which will only provide you with three degrees of freedom.

It’s important to note here that the base class doesn’t have any scene understanding, so functionality like hit testing won’t be available on those devices.

We’re also going to provide a UIRequiredDeviceCapabilities entry that you set in your app so that your app only appears in the App Store on devices that support world tracking.
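The availability check and fallback just described might look like this sketch (class names as used in this session):

```swift
import ARKit

// Pick the richest configuration the current device supports.
let configuration: ARSessionConfiguration
if ARWorldTrackingSessionConfiguration.isSupported {
    // Six degrees of freedom: orientation plus relative position,
    // with scene understanding available.
    configuration = ARWorldTrackingSessionConfiguration()
} else {
    // Orientation-only (three degrees of freedom) fallback;
    // no hit testing or plane detection here.
    configuration = ARSessionConfiguration()
}
session.run(configuration)
```

If you don’t want the fallback at all, the App Store restriction is declarative: adding the `arkit` value to the `UIRequiredDeviceCapabilities` array in your Info.plist keeps the app off devices without world tracking.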

Next, let’s look at ARSession.

ARSession, again, is the class that manages all of the processing for your augmented reality app.

In addition to calling Run with a configuration, you can also call Pause.

So Pause allows you to temporarily stop all processing happening on your session.

So if your view is no longer visible, you may want to pause to stop using the CPU; no tracking will occur during this pause.

In order to resume tracking after a pause, you can simply call Run again with the stored configuration on your session.

And last, you can call Run multiple times in order to transition between different configurations.

So say I wanted to enable plane detection, I can change my configuration to enable plane detection, call Run again on my session.

My session will automatically transition seamlessly between one configuration and another without dropping any camera images.

So with the Run command we also provide resetting of tracking.

So there’s Run options that you can provide on the Run command in order to reset tracking.

It’ll reinitialize all of the tracking that’s going on.

And your camera position will start out again at (0, 0, 0).

So this is useful for your application if you want to reset it to some starting point.
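A sketch of those run options, assuming the session is already running with a stored configuration:

```swift
// Re-running with a reset reinitializes tracking; the camera's
// transform starts over at the origin (0, 0, 0).
session.run(configuration, options: [.resetTracking])

// Optionally also discard anchors left over from the previous run,
// since they were relative to the old origin.
session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
```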

So how do I make use of ARSession’s processing?

There are session updates available by setting yourself as the delegate.

So in order to get the last frame that was processed, I could implement session(_:didUpdate:).

And this will give me the latest frame.

For error handling, you can also implement methods like session(_:didFailWithError:).

So this is in the case of a fatal error.

Maybe you’re running on a device that doesn’t support world tracking.

You’ll get an error like this.

And your session will be paused.
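The two delegate methods just mentioned, as a minimal sketch:

```swift
import ARKit

extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Called with the latest processed frame: camera image,
        // tracking information, and scene understanding.
    }

    func session(_ session: ARSession, didFailWithError error: Error) {
        // A fatal error occurred, for example world tracking isn't
        // supported on this device. The session is now paused.
    }
}
```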

The other way to make use of ARSession’s processing is to poll the currentFrame property.

So now, what does an ARFrame contain?

Each ARFrame contains everything you need to render your augmented reality scene.

The first thing it provides is a camera image.

So this is what you’re going to use to render the background of your scene.

Next, it provides tracking information, or my device’s orientation as well as location and even tracking state.

And last, it provides scene understanding.

So, information about the scene like feature points, physical locations in space as well as light estimation, or a light estimate.

So, physical locations in space, the way that ARKit represents these is by using ARAnchors.

An ARAnchor is a real-world position and orientation in space.

ARAnchors can be added and removed from your scene.

And they’re used to represent virtual content anchored to your physical environment.

So, if you want to add a custom anchor, you can do that by adding it to your session.

It’ll persist through the lifetime of your session.

Additionally, if you’re running things like plane detection, ARAnchors will be added automatically to your session.

So, in order to respond to this, you can get them as a full list in your current ARFrame.

So that’ll have all of the anchors that your session is currently tracking.

Or you can respond to delegate methods like add, update, and remove, which will notify you if anchors were added, updated, or removed from your session.
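Both ways of working with anchors might be sketched like this; the helper and the view-controller extension are illustrative scaffolding:

```swift
import ARKit

// Adding a custom anchor at a real-world position and orientation.
// It persists for the lifetime of the session.
func addAnchor(at transform: matrix_float4x4, to session: ARSession) {
    session.add(anchor: ARAnchor(transform: transform))
}

// The full list of anchors the session is currently tracking:
// session.currentFrame?.anchors

// Or respond to changes through the session delegate:
extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        // e.g. plane detection just found new planes.
    }
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        // An anchor's transform or properties changed.
    }
    func session(_ session: ARSession, didRemove anchors: [ARAnchor]) {
        // Anchors were removed from the session.
    }
}
```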

So that concludes the four main classes that you’re going to use to create an augmented reality experience.

Now let’s talk about tracking in particular.

So, tracking is the ability to determine a physical location in space in real time.

This isn’t easy.

But it’s essential for augmented reality to find your device’s position.

And not just any position, but the position and orientation of your device, in order to render things correctly.

So let’s take a look at an example.

Here I’ve placed a virtual chair and a virtual table in a physical environment.

You’ll notice that if I pan around or reorient my device, they’ll stay fixed in space.

But more importantly, as I walk around the scene they also stay fixed in space.

So this is because we’re constantly updating the projection transform, or the projection matrix, that we’re using to render this virtual content so that it appears correct from any perspective.

So now how do we do this?

ARKit provides world tracking.

This is our technology that uses visual inertial odometry.

It’s your camera images.

It’s the motion of your device.

And it provides to you a rotation as well as a position or relative position, of your device.

But more importantly, it provides real world scale.

So all your virtual content is actually going to be rendered to scale in your physical scene.

It also means that motion of your device correlates to physical distance traveled measured in meters.

And all the positions given by tracking are relative to the starting position of your session.

One more part of how world tracking works: we provide 3-D feature points.

So, here’s a representation of how World Tracking works.

It works by detecting features, which are unique pieces of information, in a camera image.

So you’ll see the axes represent my device’s position and orientation.

It’s creating a path as I move about my world.

But you also see all these dots up here.

These represent 3-D feature points that I’ve detected in my scene.

I’ve been able to triangulate them by moving about the scene, and by matching these features, you’ll see that I draw a line whenever I match an existing feature that I’ve seen before.

And using all of this information and our motion data, we’re able to precisely provide a device orientation and location.

So that might look hard.

Let’s look at the code on how we run World Tracking.

First thing you need to do is simply create an ARSession.

Because again, it’s going to manage all of the processing that’s going to happen for World Tracking.

Next, you’ll set yourself as the delegate of the session so that you can receive updates on when new frames are available.

By creating a World Tracking session configuration you’re saying, “I want to use World Tracking.

I want my session to run this processing.”

Then by simply calling Run, immediately processing will happen.

Capturing will begin.

So, under the hood, our session creates an AVCaptureSession as well as a CMMotionManager in order to get image and motion data.

We use the images to detect features in the scene.

And we use the motion data at a higher rate in order to integrate it over time to get your device’s motion.

Using these together we’re able to use sensor fusion in order to provide a precise pose.

So these are returned in ARFrames.

Each ARFrame is going to include an ARCamera.

So an ARCamera is the object that represents a virtual camera.

You can use it to drive your own virtual camera: it represents your device’s orientation as well as its location.

So it provides a transform.

The transform is a matrix, a simd float4x4, which provides the orientation, or the rotation, as well as the translation of your physical device from the starting point of the session.

In addition to this we provide a tracking state, which informs you on how you can use the transform.

And last, we provide camera intrinsics.

It’s really important that we get the camera intrinsics each frame, because they match those of the physical camera on your device.

This is information like focal length and principal point, which are used to derive a projection matrix.

The projection matrix is also a convenience method on ARCamera.

So you can easily use that to render your virtual geometry.
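Consuming an ARCamera in a custom renderer might be sketched like this; the view matrix as the inverse of the camera transform is the standard construction, not an ARKit-specific API:

```swift
import ARKit

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let camera = frame.camera

    // Only trust the transform once tracking is running normally.
    guard case .normal = camera.trackingState else { return }

    // transform: rotation and translation of the device relative to the
    // session's starting point. Its inverse serves as a view matrix.
    let viewMatrix = camera.transform.inverse

    // Projection matrix derived from the per-frame camera intrinsics
    // (focal length, principal point).
    let projectionMatrix = camera.projectionMatrix

    // Hand both matrices to your renderer so virtual geometry
    // lines up with the physical scene.
    _ = (viewMatrix, projectionMatrix)
}
```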

So with that, that is tracking that ARKit provides.

Let’s go ahead and look at a demo using World Tracking and create your first ARKit application.

[ Applause ]

So, the first thing that you notice when you open new Xcode 9 is that there’s a new template available for creating augmented reality apps.

So let’s go ahead and select that.

I’m going to create an augmented reality app.

Hit Next. After giving my project a name like MyARApp, I can choose the language, where I have the option of Swift or Objective-C, as well as the content technology.

So the content technology is what you’re going to use to render your augmented reality scene.

You have the option between SceneKit, SpriteKit as well as Metal.

I’m going to use SceneKit for this example.

So after hitting Next and creating my workspace, it looks something like this.

Here I have a view controller that I’ve created.

You’ll see that it has an ARSCNView.

So this ARSCNView is a custom AR subclass that implements all the rendering or most of the rendering for me.

So it’ll handle updating my virtual camera based on the ARFrames that get returned to it.

My ARSCNView, or sceneView, has a session as a property.

You’ll see that on my sceneView I set a scene, which is going to be a ship that’s translated a little bit in front of the world origin along the z-axis.

And then the most important part: I’m accessing the session and calling run with a world tracking session configuration.

So this will run World Tracking.

And automatically the view will handle updating my virtual camera for me.

So let’s go ahead and give that a try.

Maybe I’m going to change our standard ship to use arship.

So let’s run this on the device.

So after installing, the first thing that you’ll notice is that it’s going to ask for camera permission.

This is required to use tracking as well as to render the backdrop of your scene.

Next, as you’ll see, I get a camera feed.

And right in front of me there’s a spaceship.

You’ll see as I change the orientation of my device, it stays fixed in space.

But more importantly, as I move about the spaceship, you’ll see that it actually is anchored in the physical world.

So this is using both my device’s orientation as well as a relative position to update a virtual camera and look at the spaceship.

[ Applause ]

Thank you.

[ Applause ]

So, if that’s not interesting enough for you, maybe we want to add something to the scene every time we tap the screen.

Let’s try that out.

Let’s try adding something to this example.

So as I said, I want to add geometry to the scene every time I tap the screen.

First thing I need to do to do that is add a tap gesture recognizer.

So after adding that to my scene view, every time I tap the screen, the handleTap method will get called.

So let’s implement that.

So, if I want to create some geometry, let’s say I’m going to create a plane or an image plane.

So the first thing I do here is create an SCNPlane with a width and height.

But then, the tricky part, I’m actually going to set the contents or the material, to be a snapshot of my view.

So what do you think this is going to be?

Well, this is actually going to take a snapshot, or a rendering, of my view, including the backdrop camera image as well as the virtual geometry that I’ve placed in front of it.

I’m setting my lighting model to constant so that the light estimate provided by ARKit doesn’t get applied to this camera image because it’s already going to match the environment.

Next, I need to add this to the scene.

So in order to do that, I’m going to create a plane node.

So, after creating an SCNNode that encapsulates this geometry, I add it to the scene.

So already here, every time I tap the screen, it’s going to add an image plane to my scene.

But the problem is it’s always going to be at (0, 0, 0).

So how do I make this more interesting?

Well, we have a currentFrame provided to us, which contains an ARCamera.

I can use the camera’s transform to update the plane node’s transform, so that the plane node is placed where my camera is currently located in space.

To do that, I’m going to first get the current frame from my SceneView session.

Next, I’m going to update the plane node’s transform in order to use the transform of my camera.

So here you’ll notice the first thing I do is actually create a translation matrix.

Because I don’t want to put the image plane right where the camera’s located and obstruct my view, I want to place it in front of the camera.

So for this I’m going to use the negative z-axis as a translation.

You’ll also see that in order to get some scale, everything is in meters.

So I’m going to use 0.1 to represent 10 centimeters in front of my camera.

By multiplying this together with my camera’s transform and applying this to my plane node, this will be an image plane located 10 centimeters in front of the camera.
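Assembled from the steps above, the whole tap handler might look roughly like this. The plane dimensions are an arbitrary scale chosen for illustration; `sceneView` is the ARSCNView from the template:

```swift
import UIKit
import ARKit

@objc func handleTap(_ gestureRecognizer: UITapGestureRecognizer) {
    guard let currentFrame = sceneView.session.currentFrame else { return }

    // An image plane textured with a snapshot of the current view.
    let imagePlane = SCNPlane(width: sceneView.bounds.width / 6000,
                              height: sceneView.bounds.height / 6000)
    imagePlane.firstMaterial?.diffuse.contents = sceneView.snapshot()
    // Constant lighting: the snapshot already matches the environment,
    // so the light estimate should not be applied to it.
    imagePlane.firstMaterial?.lightingModel = .constant

    let planeNode = SCNNode(geometry: imagePlane)
    sceneView.scene.rootNode.addChildNode(planeNode)

    // Place the plane 10 cm in front of the camera: translate along the
    // camera's -z axis, then apply the camera's transform.
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -0.1
    planeNode.simdTransform = matrix_multiply(currentFrame.camera.transform,
                                              translation)
}
```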

So let’s try this out and see what it looks like.

So, as you see here again, I have the camera scene running.

And I have my spaceship floating in space.

Now, if I tap the screen maybe here, here and here, you’ll see that it leaves a snapshot or an image floating in space where I took it.

[ Applause ]

This shows just one of the possibilities that you can use ARKit for.

And it really makes for a cool experience.

Thank you.

And that’s using ARKit.

[ Applause ]

So, now that you’ve seen a demo using ARKit’s tracking, let’s talk about getting the best quality from your tracking results.

First thing to note is that tracking relies on uninterrupted sensor data.

This just means if camera images are no longer being provided to your session, tracking will stop.

We’ll be unable to track.

Next, tracking works best in well-textured environments.

This means we need enough visual complexity in order to find features from your camera images.

So if I’m facing a white wall or if there’s not enough light in the room, I will be unable to find features.

And tracking will be limited.

Next, tracking also works best in static scenes.

So if too much of what my camera sees is moving, visual data won’t correspond to motion data, which may result in drift, which is also a limited tracking state.

So to help with these, ARCamera provides a tracking state property.

Tracking state has three possible values: Not Available, Normal, and Limited.

When you first start your session, it begins in Not Available.

This just means that your camera’s transform has not yet been populated and is the identity matrix.

Soon after, once we find our first tracking pose, the state will change from Not Available to Normal.

This signifies that you can now use your camera’s transform.

If at any later point after this tracking becomes limited, the tracking state will change from Normal to Limited, and also provide a reason.

So, the reason in this case, because I’m facing a white wall or there’s not enough light, is Insufficient Features.

It’s helpful to notify your users when this happens.

So, to do that, we’re providing a session delegate method that you can implement: cameraDidChangeTrackingState.

So when this happens, you can get the tracking state, if it’s limited, as well as the reason.

And from this you’ll notify your users.

Because they’re the only ones that can actually fix the tracking situation by either turning the lights up or not facing a white wall.
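That delegate method might be sketched like this; `showStatus` and `hideStatus` stand in for whatever UI your app uses to inform the user:

```swift
import ARKit

func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
    switch camera.trackingState {
    case .notAvailable:
        // The camera transform is still the identity matrix.
        showStatus("Tracking not available yet")
    case .limited(let reason):
        // reason might be insufficient features or excessive motion;
        // only the user can fix this, so tell them.
        showStatus("Tracking limited: \(reason)")
    case .normal:
        hideStatus()
    }
}
```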

The other part is if sensor data becomes unavailable.

We handle this with session interruptions.

If your camera input is unavailable, the main reasons being your app gets backgrounded or maybe you’re multitasking on an iPad, camera images won’t be provided to your session.

In this case tracking will become unavailable, or stopped, and your session will be interrupted.

So, to deal with this, we also provide delegate methods to make it really easy.

Here it’s a good idea to present an overlay or maybe blur your screen to signify to the user that your experience is currently paused and no tracking is occurring.

During an interruption, it’s also important to note that because no tracking is happening, the relative position of your device won’t be available.

So if you had anchors or physical locations in the scene, they may no longer be aligned if there was movement during this interruption.

So for this, you may want to optionally restart your experience when you come back from an interruption.
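The interruption delegate methods described above, sketched with the suggested responses as comments:

```swift
import ARKit

func sessionWasInterrupted(_ session: ARSession) {
    // Camera input stopped (backgrounded, multitasking on iPad).
    // Overlay or blur the view: no tracking is happening right now.
}

func sessionInterruptionEnded(_ session: ARSession) {
    // The device may have moved during the interruption, so anchors
    // may no longer be aligned. Optionally restart the experience:
    // session.run(configuration, options: [.resetTracking,
    //                                      .removeExistingAnchors])
}
```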

And so that’s tracking.

Let’s go ahead and hand it over to Stefan to talk about scene understanding.

Thank you.

[ Applause ]

Thank you, Mike.

Good afternoon everyone.

My name is Stefan Misslinger.

I’m an engineer on the ARKit team.

And next we’re going to talk about scene understanding.

So the goal of scene understanding is to find out more about our environment in order to place virtual objects into this environment.

This includes information like the 3-D topology of our environment as well as the lighting situation in order to realistically place an object there.

Let’s look at an example of this table here.

If you want to place an object, a virtual object, onto this table, the first thing we need to know is that there is a surface on which we can place something.

And this is done by using plane detection.

Second, we need to figure out a 3-D coordinate on which we place our virtual object.

In order to find this we are using hit-testing.

This involves sending a ray from our device and intersecting it with the real world in order to find this coordinate.

And third, in order to place this object in a realistic way we need a light estimation to match the lighting of our environment.

Let’s have a look at each one of those three things starting with plane detection.

So, plane detection provides you with horizontal planes with respect to gravity.

This includes planes like the ground plane as well as any parallel planes like tables.

ARKit does this by aggregating information over multiple frames so it runs in the background.

And as the user moves their device around the scene, it learns more about this plane.

This also allows us to retrieve an aligned extent of this plane, which means that we’re fitting a rectangle around all detected parts of this plane and align it with the major extent.

So this gives you an idea of the major orientation of a physical plane.

Furthermore, if there are multiple virtual planes detected for the same physical plane, ARKit will handle merging those together.

The combined plane will grow to the extent of both planes, and the newer plane will be removed from the session.

Let’s have a look at how it’s used in code.

The first thing you want to do is create an ARWorldTrackingSessionConfiguration.

And plane detection is a property you can set on an ARWorldTrackingSessionConfiguration.

So, to enable plane detection, you simply set the planeDetection property to Horizontal.

After that, you pass the configuration back to the ARSession by calling the Run method.

And it will start detecting planes in your environment.

If you want to turn off plane detection, we simply set the plane detection property to None.

And then call the Run method on ARSession again.

Any previously detected planes in the session will remain.

That means they will still be present in your ARFrame’s anchors.
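A sketch of enabling and later disabling plane detection; planeDetection is an option set, so the empty set plays the role of None here:

```swift
import ARKit

// Enable horizontal plane detection on a world tracking session.
let configuration = ARWorldTrackingSessionConfiguration()
configuration.planeDetection = .horizontal
session.run(configuration)   // starts detecting planes in the environment

// Later: turn detection off. Already-detected plane anchors remain
// in the session and in each ARFrame's anchors.
configuration.planeDetection = []
session.run(configuration)
```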

So whenever a new plane has been detected, they will be surfaced to you as ARPlaneAnchors.

An ARPlaneAnchor is a subclass of an ARAnchor, which means it represents a real-world position and orientation.

Whenever a new anchor is being detected you will receive a delegate call session didAdd anchor.

And you can use that, for example, to visualize your plane.

The extent of the plane will be surfaced to you as the extent property, which is relative to the center property.

So as the user moves the device around the scene, we’ll learn more about this plane and can update its extent.

When this happens you will receive a delegate call, session didUpdate frame or didUpdate anchor.

And you can use that to update your visualization.

Notice how the center property actually moved because the plane grew more into one direction than another.

Whenever an anchor is being removed from the session, you will receive a delegate call, session didRemove anchor.

This can happen if ARKit merges planes together and removes one of them as a result.

In that case, you can update your visualization accordingly.
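The full plane-anchor lifecycle might be handled like this sketch; the `visualize`, `updateVisualization`, and `removeVisualization` helpers are hypothetical stand-ins for your own rendering code:

```swift
import ARKit

extension ViewController: ARSessionDelegate {
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let planeAnchor as ARPlaneAnchor in anchors {
            // center is relative to the anchor's transform; extent spans
            // the plane's width (x) and length (z), in meters.
            visualize(planeAnchor)
        }
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let planeAnchor as ARPlaneAnchor in anchors {
            // The plane grew as the user moved around, so the extent
            // (and possibly the center) changed.
            updateVisualization(for: planeAnchor)
        }
    }

    func session(_ session: ARSession, didRemove anchors: [ARAnchor]) {
        for case let planeAnchor as ARPlaneAnchor in anchors {
            // e.g. two planes covering the same physical surface
            // were merged and this one was removed.
            removeVisualization(for: planeAnchor)
        }
    }
}
```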

So now that we have an idea of where there are planes in our environment, let’s have a look at how to actually place something into this.

And for this we provide hit-testing.

So hit-testing involves sending or intersecting a ray originating from your device with the real world and finding the intersection point.

ARKit uses all the scene information available, which includes any detected planes as well as the 3-D feature points that ARWorldTracking is using to figure out its position.

ARKit will then intersect our ray with all information that is available and return all intersection points as an array which is sorted by distance.

So the first entry in this array will be the closest intersection to the camera.

And there are different ways on how you can perform this intersection.

And you can define this by providing a hit-test type.

So there are four ways on how to do this.

Let’s have a look.

If you are running plane detection and ARKit has detected a plane in our environment, we can make use of that.

And here you have the choice of using the extent of the plane or ignoring it.

So if you want your user to be able to move an object just on a plane, you can take the extent into account, which will mean that if a ray intersects within its extent, it will provide you with an intersection.

If the ray hits outside of this, it will not give you an intersection.

In the case of, for example, moving furniture around, or when you have only detected a small part of the ground plane, we can choose to ignore this extent and treat an existing plane as an infinite plane.

In that case you will always receive an intersection.

So even if you’ve only detected a patch of the real world, you can let your users move an object along this plane.

If you’re not running plane detection or we have not detected any planes yet, we can also estimate a plane based on the 3-D feature points that we have available.

In that case, ARKit will look for coplanar points in our environment and fit a plane into that.

And after that it will return you with the intersection of this plane.

In case you want to place something on a very small surface, which does not form a plane, or you have a very irregular environment, you can also choose to intersect with the feature points directly.

This means that we will find an intersection along our ray, which is closest to an existing feature point, and return this as the result.

Let’s have a look at how this is done in code.

So the first thing we need to do is define our ray.

It originates at our device.

You provide this as a CGPoint, which is represented in normalized image-space coordinates.

This means the top left of our image is (0, 0), whereas the bottom right is (1, 1).

So if we want to send a ray, or find an intersection, at the center of our screen, we would define a CGPoint with 0.5 for x and y.

If you’re using SceneKit or SpriteKit, we’re providing a custom overload that lets you simply pass a CGPoint in view coordinates.

So you can use the result of a UI tap or touch gesture as input to define this ray.

So let’s pass this point onto the hit-test method and define the hit-test types that we want to use.

In this case we’re using existing planes, which means it will intersect with any existing planes that ARKit has already detected, as well as estimated horizontal planes.

So this can be used as a fallback case in case there are no planes detected yet.

After that, ARKit will return an array of results.

And you can access the first result, which will be the closest intersection to your camera.

The intersection point is contained in the worldTransform property of our hit-test result.

And we can create a new ARAnchor based on this result and pass it back to the session because we want to keep track of it.

So if we take this code and apply it to the scene here, where we point our phone at a table, it would return the intersection point on this table at the center of the screen.

And we can place a virtual cup at this location.
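Put together, the hit-test flow described above might look something like this in Swift. This is just a sketch; the `session` reference and the function name are assumptions, and the types match the ARKit hit-test API from this session:

```swift
import ARKit

/// Hit-test at the center of the screen and anchor an object there (a sketch).
func placeAnchorAtScreenCenter(in session: ARSession) {
    // The center of the screen in normalized image coordinates:
    // (0, 0) is top left, (1, 1) is bottom right.
    let center = CGPoint(x: 0.5, y: 0.5)

    // Intersect with existing detected planes, falling back to
    // estimated horizontal planes if none have been detected yet.
    let types: ARHitTestResult.ResultType = [.existingPlaneUsingExtent,
                                             .estimatedHorizontalPlane]

    guard let frame = session.currentFrame,
          let result = frame.hitTest(center, types: types).first else { return }

    // The first result is the intersection closest to the camera;
    // its 3-D position lives in the worldTransform property.
    let anchor = ARAnchor(transform: result.worldTransform)
    session.add(anchor: anchor)
}
```

The session keeps track of the anchor from then on, so the virtual cup stays put as the device moves.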

By default, your rendering engine will assume that your background image is perfectly lit.

So your augmentation looks like it really belongs there.

However, if you’re in a darker environment, then your camera image is darker, which means your augmentation will look out of place and appear to glow.

In order to fix this, we need to adjust the relative brightness of our virtual object.

And for this, we are providing light estimation.

So light estimation operates on our camera image.

It uses the exposure information to determine the relative brightness of the scene.

For a well-lit image, this defaults to 1000 lumens.

For a brighter environment, you will get a higher value.

For a darker environment, a lower value.

You can also assign this value directly to an SCNLight as its intensity property.

Hence, if you’re using physically-based lighting, it will automatically take advantage of this.

Light estimation is enabled by default.

And you can configure this by setting the isLightEstimationEnabled property on your session configuration.

The results of light estimation are provided to you in the lightEstimate property on the ARFrame, as its ambientIntensity value.
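In code, reading the light estimate from a frame and applying it to a SceneKit light might look like this. A sketch only; `ambientLight` is an assumed SCNLight already placed in your scene:

```swift
import ARKit
import SceneKit

/// Apply ARKit's per-frame light estimate to a SceneKit light (a sketch).
func updateLighting(from session: ARSession, ambientLight: SCNLight) {
    // The light estimate is derived from the camera image's exposure;
    // it is nil if light estimation is disabled or not yet available.
    guard let estimate = session.currentFrame?.lightEstimate else { return }

    // 1000 corresponds to a well-lit scene; darker environments give
    // lower values, so the virtual object stops appearing to glow.
    ambientLight.intensity = estimate.ambientIntensity
}
```

With physically-based materials, scaling the light intensity this way keeps the augmentation's brightness consistent with the camera image.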

So with that, let’s dive into a demo and look how we’re using scene understanding with ARKit.

[ Applause ]

So the application that I’m going to show you is the ARKit sample application, which means you can also download it from our developer website.

It’s used to place objects into our environment.

And it’s using scene understanding in order to do that.

So, let’s bring it right up here.

And if I move it around here, what you see in front of me is our focus square.

And we’re placing this by doing a hit-test in the center of our scene and placing the object at the intersection point.

So if I move this along our table, you see that it basically slides along this table.

It’s also using plane detection in parallel.

And we can visualize this to see what’s going on.

So let’s bring up our Debug menu here and activate the second option here, which is Debug Visualizations.

Let’s close it.

And what you see here is the plane that it has detected.

To give you a better idea, let’s restart this and see how it finds new planes.

So if I’m moving it around here, you see it has detected a new plane.

Let’s quickly point it at another part of this table, and it has found another plane.

And if I’m moving this along this table, it eventually merges both of them together.

And it figured out that there’s just one plane there.

[ Applause ]

So next, let’s place some actual objects here.

My daughter asked to bring some flowers to the presentation.

And I don’t want to disappoint her.

So, let’s make this more romantic here and place a nice vase.

In that case, we again hit-test against the center of our screen and find the intersection point to place the object.

One important aspect here is that this vase actually appears in real-world scale.

And this is possible due to two things.

One is that world tracking provides us with the pose at real-world scale.

And the second thing is that our 3-D model is actually modeled at real-world scale.

So if you’re creating content for augmented reality, it’s really important to take this into account: this vase should not appear as tall as a building, or too small.

So let’s go ahead and place a more interactive object, which is my chameleon friend here.

[ Applause ]

Thank you. And one nice thing is that you always know the position of the user when you’re running world tracking.

So you can have your virtual content interact with the user in the real world.

[ Applause ]

So, if I move over here, it might eventually turn to me, if he’s not scared.

Yeah, there we go.

[ Applause ]

And if I get even closer he might react in even different ways.

Let’s see.

It’s a bit... oh!

There we go.

Another thing that chameleons can do is change their color.

And if I tap him, he adjusts the color.

So let’s give it a green.

And one nice feature that we put in here is I can move him along the table, and he will adapt to the background color of the table in order to blend in nicely.

[ Applause ]

So this is our sample application.

You can download it from the website and put in your own contents and play around with it, basically.

So next, we’re going to have a look at rendering with ARKit.

Rendering brings tracking and scene understanding together with your content.

And in order to render with ARKit, you need to process all the information that we provide you in an ARFrame.

For those of you using SceneKit and SpriteKit, we have already created customized views that take care of rendering ARFrames for you.

If you’re using Metal, and want to create your own rendering engine or integrate ARKit into your existing rendering engine, we’re providing a template that gives you an idea of how to do this and provides a good starting point.

Let’s have a look at each one of those, starting with SceneKit.

For SceneKit we’re providing an ARSCNView, which is a subclass of an SCNView.

It contains an ARSession that it uses to update its rendering.

So this includes drawing the camera image in the background, taking into account the rotation of the device as well as any [inaudible] changes.

Next, it updates an SCNCamera based on the tracking transforms that we provide in an ARCamera.

So your scene stays intact and ARKit simply controls an SCNCamera by moving it around the scene the way you move around your device in the real world.

If you’re using light estimation, we automatically place a light into your scene, so objects with physically-based lighting enabled can automatically take advantage of light estimation.

And one thing that ARSCNView does is map SCNNodes to ARAnchors, so you don’t actually need to interface with ARAnchors directly, but can continue to use SCNNodes.

This means whenever a new ARAnchor is being added to the session, ARSCNView will create a node for you.

And every time we update the ARAnchor, like its transform, we update the node’s transform automatically.

And this is handled through the ARSCNView delegate.

So every time we add a new anchor to the session, ARSCNView will create a new SCNNode for you.

If you want to provide your own nodes, you can implement renderer(_:nodeFor:) and return your custom node there.

After this, the SCNNode will be added to the scene graph.

And you will receive another delegate call, renderer(_:didAdd:for:).

The same holds true for whenever a node is being updated.

So in that case, the SCNNode’s transform will be automatically updated with the ARAnchor’s transform, and you will receive two callbacks when this happens.

One before we update its transform, and another one after we update the transform.

Whenever an ARAnchor is being removed from the session, we automatically remove the corresponding SCNNode from the scene graph and provide you with the callback renderer(_:didRemove:for:).
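The delegate flow just described could be implemented like this. A sketch; the sphere geometry is just an assumed placeholder for your own content:

```swift
import ARKit
import SceneKit

/// Visualizes each ARAnchor with a small sphere via the ARSCNView delegate (a sketch).
class AnchorVisualizer: NSObject, ARSCNViewDelegate {

    // Provide a custom node for each new anchor;
    // returning nil gives you an empty SCNNode instead.
    func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
        return SCNNode(geometry: SCNSphere(radius: 0.05))
    }

    // Called after the node has been added to the scene graph.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        // Attach child nodes or start animations here.
    }

    // Called after the node's transform was updated from the anchor's transform.
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        // React to refined anchor positions here.
    }

    // Called after the node was removed along with its anchor.
    func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
        // Clean up any associated resources here.
    }
}
```

You would assign an instance of this class as the `delegate` of your ARSCNView.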

So this is SceneKit with ARKit.

Next, let’s have a look at SpriteKit.

For SpriteKit we’re providing an ARSKView, which is a subclass of SKView.

It contains an ARSession, which it uses to update its rendering.

This includes drawing the camera image in the background, and in this case, mapping SKNodes to ARAnchors.

So it provides a very similar set of delegate methods to SceneKit, which you can use.

One major difference is that SpriteKit is a 2-D rendering engine.

So that means we cannot simply update a camera that is being moved around.

So what ARKit does here is project our ARAnchor’s positions into the SpriteKit view.

And then render the sprites as billboards at these projected locations.

This means that the Sprites will always be facing the camera.
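The SpriteKit counterpart of the node-for-anchor callback could look like this. A sketch; the label content is just a placeholder:

```swift
import ARKit
import SpriteKit

/// Returns a 2-D sprite for each anchor; ARSKView renders it as a
/// billboard at the anchor's projected screen position (a sketch).
class SpriteAnchorHandler: NSObject, ARSKViewDelegate {

    func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
        // Any SKNode works here; it will always face the camera.
        let label = SKLabelNode(text: "AR")
        label.fontSize = 40
        return label
    }
}
```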

If you want to learn more about this, there’s a session from the SpriteKit team, “Going Beyond 2D in SpriteKit”, which will focus on how to integrate ARKit with SpriteKit.

And next, let’s have a look at custom rendering with ARKit using Metal.

There are four things that you need to do in order to render with ARKit.

The first is draw the camera image in the background.

You usually create a texture for this and draw it in a background.

The next thing is to update our virtual camera based on our ARCamera.

This involves setting the view matrix as well as the projection matrix.

Third item is to update the lighting situation or the light in your scene based on our light estimate.

And finally, if you have placed geometry based on scene understanding, then you would use the ARAnchors in order to set the transforms correctly.

All this information is contained in an ARFrame.

And you have two ways of how to access this ARFrame.

One is by polling the currentFrame property on ARSession.

So, if you have your own render loop, you could use this property to access the current frame.

And then you should also take advantage of the timestamp property on ARFrame in order to avoid rendering the same frame multiple times.

An alternative is to use our session delegate, which provides you with session(_:didUpdate:) every time a new frame has been calculated.

In that case, you can just simply take it and then update your rendering.

By default, this is called on the main queue, but you can also provide your own dispatch queue, which we will use to call this method.
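Both access patterns can be sketched as follows. The render call is an assumed placeholder for your own engine:

```swift
import ARKit

/// Sketch of the two ways to consume ARFrames: polling and delegation.
class FrameConsumer: NSObject, ARSessionDelegate {
    // Remember the last frame we rendered so polling never draws
    // the same frame twice.
    private var lastRenderedTimestamp: TimeInterval = 0

    // Option 1: polling from your own render loop.
    func renderLoopTick(session: ARSession) {
        guard let frame = session.currentFrame,
              frame.timestamp > lastRenderedTimestamp else { return }
        lastRenderedTimestamp = frame.timestamp
        // updateRendering(with: frame)  // your engine's update (placeholder)
    }

    // Option 2: push-based updates via the session delegate.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // updateRendering(with: frame)  // your engine's update (placeholder)
    }
}
```

In practice you would pick one of the two patterns, not both.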

So let’s look into what Update Rendering contains.

So the first thing is to draw the camera image in the background.

And you can access the capturedImage property on an ARFrame, which is a CVPixelBuffer.

You can generate a Metal texture based on this pixel buffer and then draw it on a quad in the background.

Note that this is a pixel buffer that is vended to us through AVFoundation, so you should not hold on to too many of those frames for too long; otherwise you will stop receiving updates.
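A minimal sketch of turning the captured image into a Metal texture, using only the luma plane of the YCbCr buffer for simplicity. The texture cache is assumed to be created once at startup with CVMetalTextureCacheCreate:

```swift
import ARKit
import Metal
import CoreVideo

/// Creates a Metal texture from the luma plane of a frame's captured image (a sketch).
func lumaTexture(for frame: ARFrame, cache: CVMetalTextureCache) -> MTLTexture? {
    // capturedImage is vended by AVFoundation; do not retain many of
    // these buffers, or capture updates will stall.
    let pixelBuffer = frame.capturedImage

    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)

    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache,
                                              pixelBuffer, nil, .r8Unorm,
                                              width, height, 0, &cvTexture)
    return cvTexture.flatMap { CVMetalTextureGetTexture($0) }
}
```

A full-color background would additionally sample the chroma plane (plane 1) and convert YCbCr to RGB in the fragment shader, as the Metal template does.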

The next item is to update our virtual camera based on our ARCamera.

For this we have to determine the view matrix as well as the projection matrix.

The view matrix is simply the inverse of our camera transform.

And in order to generate the projection matrix, we are offering a convenience method on ARCamera.
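A sketch of deriving both matrices from an ARCamera; the orientation and clipping-plane values here are assumptions you would replace with your own:

```swift
import ARKit
import UIKit
import simd

/// Derives view and projection matrices from an ARCamera (a sketch).
func cameraMatrices(for camera: ARCamera,
                    viewportSize: CGSize) -> (view: simd_float4x4,
                                              projection: simd_float4x4) {
    // The view matrix is simply the inverse of the camera's world transform.
    let viewMatrix = simd_inverse(camera.transform)

    // ARKit's convenience method builds a projection matrix that matches
    // the camera image for the given orientation and viewport.
    let projectionMatrix = camera.projectionMatrix(for: .landscapeRight,
                                                   viewportSize: viewportSize,
                                                   zNear: 0.001,
                                                   zFar: 1000)
    return (viewMatrix, projectionMatrix)
}
```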

The third step would be to update the lighting.

So for this, simply access the lightEstimate property and use its ambientIntensity to update your lighting model.

And the final step would be to iterate over the anchors and their 3-D transforms in order to update the transforms of your geometry.

So any anchor that you have added manually to the session, or any anchor that has been detected by plane detection, will be part of these frame anchors.

There are a few things to note when rendering based on a camera image.

We want to have a look at those.

So one thing is that the captured image that is contained in an ARFrame is always provided in the same orientation.

However, if you rotate your physical device, it might not line up with your user interface orientation.

And a transform needs to be applied in order to render this correctly.

Another thing is that the aspect ratio of the camera image might not necessarily line up with your device.

And this means that we have to take this into account in order to properly render our camera image on the screen.

To fix this or to make this easier for you, we’re providing you with helper methods.

So there’s one method on ARFrame, which is displayTransform.

It transforms from normalized frame space into view space.

You simply provide it with your viewport size as well as your interface orientation, and you get back the corresponding transform.

In our Metal example, we are using the inverse of this transform to adjust the texture coordinates of our camera background.

And to go with this, there is a projection matrix variant that takes into account the user interface orientation as well as the viewport size.

So you pass those along with the clipping-plane limits, and you can use this projection matrix to correctly draw your virtual content on top of the camera image.
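As a sketch, using the display transform the way our Metal example does, to adjust the texture coordinates of the camera-background quad; the portrait orientation is an assumption:

```swift
import ARKit
import UIKit

/// Returns the transform to apply to the background quad's texture
/// coordinates so the camera image lines up with the view (a sketch).
func backgroundTextureTransform(for frame: ARFrame,
                                viewBounds: CGRect) -> CGAffineTransform {
    // displayTransform maps normalized image coordinates into view
    // coordinates for the given orientation and viewport size...
    let transform = frame.displayTransform(for: .portrait,
                                           viewportSize: viewBounds.size)
    // ...and its inverse adjusts the texture coordinates instead.
    return transform.inverted()
}
```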

So this is ARKit.

To summarize, ARKit is a high level API designed for creating augmented reality applications on iOS.

We provide you with World Tracking, which gives you the relative position of your device to a starting point.

In order to place objects into the real world, we provide you with Scene Understanding.

Scene Understanding provides you with Plane Detection as well as the ability to hit-test the real world in order to find 3-D coordinates and place objects there.

And in order to improve the realism of our augmented content, we’re providing you with a light estimate based on the camera image.

We provide custom integration into SceneKit and SpriteKit as well as a template for Metal if you want to get started integrating ARKit into your own rendering engine.

You can find more information on the website of our talk here.

And there are a couple of related sessions: one from the SceneKit team, which will also look at how to use dynamic shadows with ARKit and SceneKit, as well as a session from the SpriteKit team, which will focus on using ARKit with SpriteKit.

So, we’re really excited to bring this out into your hands.

And we are looking forward to seeing the first applications that you’re going to build with it.

So please go ahead and download the sample code, the sample application from our website.

Put your own content into it and show it around.

And be happy.

Thank you.

[ Applause ]
