What's New in Apple File Systems

Session 710 WWDC 2019

Learn about what's new in file system technology, including changes to file system layout and imaging technologies. If you are affected by the new Read Only System Volume, this is a session you will not want to miss.

[ Music ]

[ Applause ]

Good afternoon, ladies and gentlemen.

Welcome to the Filesystem session.

In this session, we're going to cover several topics related to filesystems.

First, we're going to have a look at the role a filesystem plays in protecting system software on macOS.

We're going to describe how volume replication works in APFS.

And finally, we are going to talk about a new and very exciting feature coming to devices running iOS and iPadOS: access to files on external media.

But before we start, let's recap what happened to APFS recently.

APFS has been the default filesystem on iOS and tvOS since 10.3 and on macOS since High Sierra.

One of the features which APFS introduced is a built-in volume manager.

The concept of a volume is not new to APFS.

Volumes existed in HFS Plus.

An HFS Plus volume maps one-to-one to a disk partition.

And it occupies a contiguous range of blocks on disk.

Because of this one-to-one mapping to a partition, HFS Plus volumes cannot be added easily.

In order to add a new volume to a partitioned disk, you first need to shrink the existing volume.

And then you can add a new volume.

APFS volumes are a lot more flexible because they share the space of their disk partition with their siblings.

And that flexibility allows higher levels of the system to implement features which would not otherwise be possible to implement with the old-style volumes.

One of the features which we implemented is protecting system software from malicious or accidental updates.

You may remember back in the macOS El Capitan days, we introduced the concept of system integrity protection.

It was used to control access to a part of the directory hierarchy.

Some directories would be write protected, while changes would be allowed in the other parts of the filesystems.

This year, we're taking this one step further.

We're making the entire root filesystem read-only.

[ Cheering and Applause ]

Obviously, a system where you cannot install new software or where the user cannot save data is not particularly useful.

In order to understand how we can reconcile those two conflicting goals, read-only and writeable at the same time, let's look at what's happening when we upgrade from macOS Mojave to macOS Catalina.

A typical container on macOS Mojave will have one main volume and a few service volumes.

For example, VM.

The main volume is used to store both the user's data and system software.

When we start the upgrade to macOS Catalina, we first change the role of the main volume and mark it as a data volume.

We can then prune the parts of the directory hierarchy which contain only system software.

Once this is done, we can move to the next stage where we create a new empty volume, which will be used only to store system software.

We'll populate that volume with the system content.

And once this process is done, we can declare victory on the protection front.

We are read-only.

We're protected.

It's good, but it's not enough.

We still need to somehow tie the new system content with the user content.

And for that, we introduce the concept of a volume group.

A volume group is one data volume and one system volume.

And they are treated as a single entity.

The UI presents them as a single disk.

They share encryption state.

If your volumes are encrypted, then the same password can be used to unlock both volumes.

Almost everything looks like a single unified entity.

There is one thing which is missing.

We still need to somehow provide an illusion of a single directory hierarchy.

Traditionally, it was done by mounting filesystems on top of directories in the root filesystem.

With the number of crossing points which we need to introduce, and the number of volumes which are required in the filesystem, that approach becomes rather expensive.

So to deal with this, we introduced a new concept called Firmlink.

The Firmlink is a new filesystem object.

It's similar to a symlink.

But unlike symlinks, Firmlinks support backward and forward path traversal that is consistent in its representation.

And that consistency is rather important.

If you ever had to deal with an application which absolutely insists it has to live in a particular directory, such as /Applications, you know that you have to be able to walk both from the top of the filesystem, from the root down to the leaves, and backwards, and get the same path.

We can do that with Firmlinks.

The Firmlinks are a traversal point from a directory on the system volume to a directory on the data volume.

They only have one-to-one mapping.

One source.

One target.

You cannot use a Firmlink to cross a boundary of a volume group.

The Firmlinks are rather transparent to the user and the application.

They are created by the installer at installation time.

And they are not supposed to be noticed.

Once we have this new tool, we can use it to stitch the volumes together.

The installer will create entries on the system volume and point them to the corresponding directories on the data volume.

And once it's done, we have a unified directory hierarchy.

We can reboot, mount root as read-only, and enjoy the protection which it gives us.

Life is good.

Everything's protected.

Everything's running.

But you still have to remember that the volumes are split during the install process.

There is no way to avoid it.

It will happen.

In the developer preview, we elected to leave the root filesystem writeable to make it easier for you to test your application.

If you want to mimic the behavior which will be implemented in the future, you can create a special file in the root directory, and on reboot your volumes will be mounted read-only.

That will change in the next build or the next seed.

In the release build, you will still be able to mount the root filesystem as read-write if you disable system integrity protection.

But this is not a persistent change.

On reboot, it will revert to the read-only state.

You can remount it read-write again.

And again, it will revert to the same state on reboot.
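
One way to observe the mount state described here is to ask the operating system whether the filesystem behind a path is mounted read-only. The following is a minimal sketch, not something shown in the session: it assumes a POSIX system and uses Python's `os.statvfs`, whose `f_flag` field carries the `ST_RDONLY` bit. The function name is illustrative.

```python
import os


def is_mounted_read_only(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only."""
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)


if __name__ == "__main__":
    # Report the state of the root filesystem.
    state = "read-only" if is_mounted_read_only("/") else "read-write"
    print(f"/ is mounted {state}")
```

An installer or updater could run a check like this before attempting to write and fall back to a writeable location when the answer is read-only.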

As you can imagine, this is a rather big change in the way macOS ships and installs.

And it could catch some applications off guard.

If, for example, your application uses a complex layout on the filesystem or comes with an installer package, make sure it works on the new read-only root partition.

If you're writing a backup utility which cares about inode numbers, filesystem IDs, and the like, make sure that you test it, because the assumptions you may have had previously may no longer be true.
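
As an illustration of the kind of assumption worth testing, the sketch below walks a directory tree and records the device ID and inode number of every entry. It assumes that a hierarchy stitched together from two volumes reports more than one device ID, in which case an inode number alone is no longer a unique key. The function names are hypothetical and the approach is only one possible check.

```python
import os
from collections import Counter


def scan_identities(root: str):
    """Walk `root` and return (ids, devices).

    ids maps each visited path to its (st_dev, st_ino) pair.
    devices counts how many entries were seen on each device ID.
    Symbolic links are examined themselves, not followed.
    """
    ids = {}
    devices = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            try:
                info = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            ids[path] = (info.st_dev, info.st_ino)
            devices[info.st_dev] += 1
    return ids, devices


def inodes_seen_on_multiple_devices(ids):
    """Return inode numbers that occur under more than one device ID."""
    devices_by_inode = {}
    for dev, ino in ids.values():
        devices_by_inode.setdefault(ino, set()).add(dev)
    return {ino for ino, devs in devices_by_inode.items() if len(devs) > 1}
```

A backup tool that keys its catalog on the pair `(st_dev, st_ino)` rather than on the inode number alone is not affected by such collisions.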

So the bottom line is test, test, test.

And next, I will hand over to Jon Becker to talk about volume replication.

[ Applause ]

Thank you, Max.

Good afternoon.

My name is Jon, and I'm going to be talking about volume replication with APFS.

So let's dive right in.

What is volume replication?

Well, the basic idea is pretty simple.

We want to copy one volume to another volume.

Sounds simple.

But the important aspect of this is that we want the fidelity of this copy to be as high as possible.

In general, it's not sufficient to just be doing file-by-file copies.

We want to be copying all volume content.

All data, all metadata, volume attributes.

If the source volume contains a bootable OS, we want to be copying the metadata that makes that volume bootable so that the target of our copy will also be bootable.

Now I'm going to be talking about replication in the context of the Apple Software Restore command line utility, or ASR.

ASR has been around for a very long time.

Many of you may be familiar with it.

And its primary function is to do volume replication.

Now, with ASR, it's very common for the source of our replication to be a disk image.

And in that context, we often refer to performing this replication to a target volume as restoring, hence the name Apple Software Restore.

But you'll hear me use the terms restoring and replicating pretty much interchangeably.

So who wants this?

Who needs this functionality?

Well, if you are in education or Enterprise IT, if you do things like set up large labs with lots of machines, or if you write backup utilities, then replication is a function that you probably need or want to use with some regularity.

As we'll see, some of the new features of APFS present a challenge for how we can do replication.

On the other hand, as we'll also see, these new features also present an opportunity to make replication a more powerful and flexible operation.

So I want to back up for just a second and talk about how replication has looked in the past prior to APFS.

Now as Max demonstrated before, with previous filesystems, like HFS Plus, the filesystem, or I should say the volume, is in a one-to-one relationship with the partition that contains it.

And what that means is that the volume is backed by a contiguous block device.

So replicating can really involve just doing a block copy of the entire partition.

Of course, if we're copying the entire block device, we are copying all of the filesystem information in that volume.

Now there may be some complications, as in this diagram, where the source and target partitions are not the same size.

But there are ways that we can fix that up.

And so by and large, block copying is a very effective and relatively simple way of doing replication.

But of course, with APFS, there are some new features that complicate this picture.

So APFS, as Max told you, has some features, like volume management and space sharing.

And as we can see, if we take a look at Volume 1 in the diagram here, it's spread out around its container partition.

And so it is not a contiguous block device.

And its data may be interspersed with the data for another volume in the same container, like Volume 2 in this example.

Furthermore, we of course care about security and privacy.

And so we have to think about encryption.

Now with APFS, encryption is done at the filesystem level.

And what's more, on Macs that have the T2 security chip for internal storage devices, that encryption is on all the time.

And it is tied directly to the hardware, meaning that it is specific to that storage device in that Mac.

And so if we try to copy the blocks for that volume and take them somewhere else, they won't be decryptable.

So the upshot here is block copies are really not a possible way to do replication with APFS volumes.

So how do we reconcile this with the needs that we have?

Well, new with macOS Catalina we're introducing APFS volume replication with ASR.

ASR and APFS.

Yeah, thank you.

ASR and APFS are tightly integrated.

And ASR can have APFS generate a stream from the source volume.

And that stream then gets written to the target volume.

Now because APFS is generating this stream, it knows where it needs to read the parts of the source volume.

And so it's not a problem that that source volume is not a contiguous block device.

As far as encryption goes, if the source volume is encrypted, then the data from it will be decrypted on the fly as the stream is generated.

And of course, if the source volume is protected by FileVault, then it does need to be unlocked by user action prior to this replication taking place.

If the target volume is itself encrypted, then the data is encrypted as it is written from the stream to the target volume.

And so in this case, that data is encrypted from the get go by the time it hits the target volume.

Okay? One other nice feature of this replication is that as the stream is generated, the volume data is defragmented on the fly.

The metadata is compacted, and so the resulting stream, and therefore, the resulting target volume, are very nicely optimized.

This can be a great way to master images, for example.

If you're mastering an image, and your final step is to do a replication operation, then your image volume will be very nicely optimized.

So when we do restores, replication, however you want to say it, there are a number of options that we can use.

I just want to call out a couple of them.

The first one is really very much like restores as we're used to.

We can specify a source volume.

You can specify a target volume.

And the idea is that the target volume will be completely erased, and its contents replaced by the contents of the source volume.

Now in this case, we will have Volume 2 be our target volume.

And so our restore would look something like this.

You can see a sample command line right here.

And the result is Volume 2's contents are erased, replaced by the contents of the source volume.

So the restored volume and source volume are the same.

Notice, by the way, that in this example, the target container also has another volume in it.

And that volume is left alone.

Its contents are preserved.

It does not form in any way a part of the replication operation.

But there's another option that we have, which is instead of having to specify a target volume and erase that, we can instead generate a brand-new volume to be the target on the fly.

And we do this by specifying the entire container as the target.

And this tells ASR that what we really want is to generate a brand-new volume and restore to that.

You can see another sample command line.

And the result is a new volume is created and restored to.

So in this case, Volume 1 and Volume 2 are both left alone.
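
The sample command lines from the slides are not part of this transcript. As a rough sketch of what the two options look like, the Python below builds, but does not run, `asr restore` argument lists. The device nodes are hypothetical placeholders, and the flags (`--source`, `--target`, `--erase`) are given as I understand the `asr` tool, so check `man asr` on your system before relying on them.

```python
import shlex


def asr_restore_command(source: str, target: str, erase: bool = False) -> list:
    """Build (but do not execute) an `asr restore` argument list."""
    argv = ["asr", "restore", "--source", source, "--target", target]
    if erase:
        argv.append("--erase")
    return argv


# Hypothetical device nodes, for illustration only.
SOURCE_VOLUME = "/dev/disk2s1"
TARGET_VOLUME = "/dev/disk3s2"
TARGET_CONTAINER = "/dev/disk3"

# Option 1: erase an existing target volume and replace its contents.
erase_and_replace = asr_restore_command(SOURCE_VOLUME, TARGET_VOLUME, erase=True)

# Option 2: name the whole container as the target so that a new volume
# is created for the restore and the existing volumes are left alone.
restore_into_new_volume = asr_restore_command(SOURCE_VOLUME, TARGET_CONTAINER)

print(shlex.join(erase_and_replace))
print(shlex.join(restore_into_new_volume))
```

Building the argument list separately from running it makes it easy to log or review the exact command before a destructive restore.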

Now I want to step away for just a second from replication.

And I want to talk about snapshots in APFS.

So a snapshot is just a point-in-time capture of all volume state.

So for example, we may have a volume with a number of files on it.

We can take a snapshot of that volume.

And the result is a capture, a freeze frame, of what that volume looks like at the time that the snapshot is taken.

If we make subsequent changes to the volume, like for example, deleting a file or adding some new files, the snapshot is still encompassing all of the state that existed at the time it was created.

So in this case, if we look at the live volume, it appears that that file that was deleted is not there.

But in some sense, it is still there because it's part of that snapshot.

So you might wonder, "Well, what does this have to do with replication?"

Well, once again, new with macOS Catalina, we can now do restores, replication of snapshots.

So to understand what that means...

[ Applause ]

Thank you.

To understand what that means, consider this volume here on the left.

It has two snapshots in it, Snap 1 and Snap 2.

They contain some files in common, the yellow ones, some files that are in one snapshot and not the other, some files that are in the other and not the one, and a file that's not in either snapshot.

I can restore this volume to a target volume over here on the right.

It's currently empty.

And the idea there, of course, is that at the end of that restore, the target volume looks like the source volume.

But instead of restoring the live version of the source volume the way it currently looks, I can instead restore a snapshot.

So if, for example, I want to restore Snap 1, I can specify that's the snapshot I want to restore.

And the result is that my target now looks like Snap 1.

And notice in particular that those two files, which were deleted from the source volume, have come back to life in the target volume.

Having done that, maybe I want to add some new files to my target volume.

But then, maybe now I want to restore Snap 2 to that target volume.

And of course, again, the idea is that at the end of that operation, the live target volume should look like Snap 2 from the source volume.

But notice that both the source volume and the target volume already have Snap 1 on them.

Wouldn't it be great, if instead of having to copy all of Snap 2, I could just copy the things that are different between Snap 1 and Snap 2?

Well, indeed I can.

We call that difference between two snapshots a snapshot delta.

And the idea is when I restore a snapshot delta, I'm specifying the two snapshots whose difference I want to take.

I perform the restore.

And of course at the end, the live target volume looks like Snap 2 from the source.

But there's three things that I want you to notice about this target volume.

Number one, all of those files which were not part of the snapshot on the target have been discarded.

Number two, the files which were in Snap 1 but not in Snap 2 have been discarded from the live target volume.

They, of course, still exist in Snap 1 on the target volume.

And number three, the only things that we needed to copy were those files that were part of Snap 2 and not part of Snap 1.
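
The three observations above can be modeled with plain sets. This is a conceptual sketch only: it treats a snapshot as a set of file names, which is an assumption made for illustration and says nothing about how APFS or ASR actually represent or transfer a delta. All names are hypothetical.

```python
def restore_snapshot_delta(source_snaps, target_snaps, target_live, from_snap, to_snap):
    """Model a snapshot-delta restore using sets of file names.

    source_snaps and target_snaps map a snapshot name to a set of file names.
    target_live is the set of files currently on the live target volume.
    Returns (new_live, copied, discarded).
    """
    base = source_snaps[from_snap]
    if target_snaps.get(from_snap) != base:
        raise ValueError("the target must already hold the base snapshot")
    goal = source_snaps[to_snap]
    copied = goal - base                 # only the difference is transferred
    discarded = set(target_live) - goal  # extras and files dropped by the newer snapshot
    return set(goal), copied, discarded


source = {"Snap 1": {"a", "b", "c", "d"}, "Snap 2": {"a", "b", "e", "f"}}
target = {"Snap 1": {"a", "b", "c", "d"}}
live_before = {"a", "b", "c", "d", "extra"}

live_after, copied, discarded = restore_snapshot_delta(
    source, target, live_before, "Snap 1", "Snap 2"
)
```

In this model the live target ends up equal to Snap 2, only `e` and `f` are copied, and `c`, `d`, and `extra` leave the live volume while `c` and `d` remain in the target's Snap 1.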

Okay? Now this is a really powerful feature.

It's a great way to do incremental releases.

Imagine that you are updating a lab filled with 100 machines.

You can save a lot of time, a lot of network bandwidth, if you're only copying the difference between a couple of snapshots on your source image.

So that's what I had to say about replication.

Take home points here.

A more sophisticated filesystem requires more sophisticated mechanisms for doing replication.

APFS volume replication is really best done using ASR because it provides the highest fidelity of copies.

And it handles the encryption as necessary.

And finally, we can now restore snapshots and snapshot deltas using ASR.

And with that, I'm going to turn it over to Bill, who will talk about external file access for iOS.

Thank you.

[ Applause ]

Thank you, Jon.

Good afternoon.

You may recall two years ago, we introduced the Files app and a new File Provider API.

Together, they offer an excellent experience for Cloud-based documents.

This year, we thought we could do more.

So this year, we're happy to announce support on iOS for accessing files on network shares and on USB storage.

[ Cheering and Applause ]

For USB storage, we support everything from CompactFlash and SD cards and sticks, up through USB RAID boxes.

We support a number of filesystems.

We support unencrypted APFS, unencrypted HFS Plus, and we support FAT and exFAT.

[ Applause ]

This feature is available on all iOS and iPadOS devices.

It's available on iPad Pro with USB-C.

And for Lightning devices, it's available with the appropriate adapters.

Moving on, for network shares, we support connecting to SMB 3.0 servers.

We support connecting over Wi-Fi, over cellular, and over Ethernet.

There are a lot of exciting features here.

But one that we wanted to call out is for our iOS devices, iPadOS devices.

We're supporting search using the Windows Search Protocol.

So all these devices can search your SMB server if it supports the WSP protocol.

We're also really excited to announce that that includes the SMB server in macOS Catalina.

Before going on, I wanted to talk a moment about security.

Security is a feature users have come to expect from iOS.

Two of the tools we have used to help deliver this security are process separation and privilege separation.

In developing this feature, we've kept these in mind and incorporated them from the ground up.

So for all of our volumes and shares, the filesystem manipulations happen not in the kernel but in a dedicated process space.

This separation helps us deliver the security iOS users have come to expect.

Now, let's see it in action.

All right, I have an iPad.

And you probably can't see it.

But I have a USB stick connected.

And I have mail.

And in Files, on the left, you can see the locations.

iCloud Drive.

If we had a third-party Cloud provider, they would be listed there as well.

And we see a destination for our USB device.

When we select it, we see our photos, our documents, all the files and directories on there.

And we can manipulate them just like any other file in the Files app.

So to copy one, you just select it, get it ready for drag and drop, pick your destination, drag it, and drop it.

And you can see it's already in the folder now.

Another thing we like to do with devices is copy photos onto them.

In Photos, I have a picture a friend took in India of tomatoes.

Let's save it to the USB.

To do that, we just select the document, go to Share, go down the list to Save to Files.

You can see, as a destination, we have the USB stick listed.

We just select it.

It already is.

We hit save.

And it's copied.

[ Cheering and Applause ]

I expect many of you are application developers.

And you're wondering how your application can take advantage of this.

This feature is available to any and all applications linked on or after iOS 13.

So rebuild your application.

And you can take advantage.

To see that in action, let's look at Numbers.

When I open up Numbers, it's starting with my iCloud Drive.

The USB is listed as a destination.

We select it.

And there, we can see all of our documents.

We can see our photos dimmed because they're not Numbers documents.

And then we see that we have two Numbers documents on this drive.

Let's open one of them.

It's a loan comparison comparing two different loan amounts and two different interest rates.

Just for comparison, let's see what happens if we raise the interest rate.

No. Oh.

[ Laughter ]

Set it to 20, not 200.

Oh, no. Well, that's crazy.

[ Laughter ]

And you can see, live, the interest amounts changing.

So what does this mean for you as developers?

As I said, it's available if you're linked on or after iOS 13.

So rebuild your application and test.

We're adding five different filesystem types to iOS.

These filesystems act slightly differently than APFS on the internal flash storage.

One difference is iOS has always had case-sensitive filesystems.

FAT and exFAT are case-insensitive.

And HFS Plus and APFS can be configured either way.
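
If your code depends on case behavior, it helps to find out how a given volume behaves rather than assume. The sketch below is an empirical probe written in Python for illustration: it creates a temporary file with a mixed-case name and checks whether a lowercased spelling of the same name resolves to it. It is a stand-in for querying the volume's advertised capabilities, and the function name is hypothetical.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Probe whether names in `directory` are matched case-sensitively."""
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        parent, name = os.path.split(probe.name)
        # The prefix contains capitals, so the lowercased name is a
        # different spelling of the same file name.
        folded = os.path.join(parent, name.lower())
        return not os.path.exists(folded)
```

The probe needs write access to the directory and removes its temporary file when it finishes.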

The clone system call may not always be available.

So if these differences are important to you, please pay attention to volume capabilities.

There are a couple of APIs that can get them for you.

One I wanted to call out is resourceValues on NSURL.

These can give you parameters for the filesystem you're working with.

Another important point is file movement may take time.

So please put your temporary files near your working files.

This is helpful because safe save uses a rename at the very end so that a user always sees a good file.

They either see the document that they started with, or they see the new save.

For this to work, we need that rename.

And that only works if they're both on the same filesystem.

Also, if you're not careful about your temporary files, they may end up in your container on the internal storage.

And that's going to lead to a lot of unnecessary IO.

File Manager can help you with this.

If you ask for a URL for the itemReplacementDirectory appropriate for your documents, it will give you a temporary directory on the same filesystem.
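
The safe-save pattern described above can be sketched in a few lines. This Python version is an illustration under stated assumptions, not the Foundation API the speaker refers to: it places the temporary file in the destination's own directory so that both are on the same filesystem, then renames it over the destination with `os.replace`. The function name and the `.saving-` prefix are made up for the example.

```python
import os
import tempfile


def safe_save(destination: str, data: bytes) -> None:
    """Write `data` so that readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Create the temporary file next to the destination so the final
    # rename stays within one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".saving-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, destination)  # the rename at the very end
    except BaseException:
        # Clean up the partial temporary file, then re-raise.
        try:
            os.unlink(temp_path)
        except FileNotFoundError:
            pass
        raise
```

Had the temporary file been created in a system-wide temporary directory on internal storage, the final step would turn into a full copy across filesystems instead of a rename.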

Another thing is that external devices can go away.

A network can go out of range.

A file server can go down.

A cat can disconnect a cable.

These things can happen.

And your application needs to be robust in the face of it.

One thing I especially wanted to point out is mmap can be dangerous.

It can be really powerful.

But if the file goes away, the only way the kernel can tell you that is with a bus error.

So one thing, if you're using NSData, there is a hint you can give NSData and say mmap this data from a file if it's safe.
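
The idea of mapping only when it is safe can be sketched as a simple decision. The Python below is an analogy, not the NSData behavior itself: it memory-maps a file only when the file lives on a device ID the caller has declared trusted (for example, internal storage) and otherwise falls back to an ordinary read, where a vanished device surfaces as an `OSError` that can be caught. The `trusted_devices` parameter and the function name are assumptions made for the example.

```python
import mmap
import os


def load_bytes(path: str, trusted_devices: set) -> bytes:
    """Read a file, using mmap only when its device ID is trusted."""
    with open(path, "rb") as handle:
        info = os.fstat(handle.fileno())
        # An empty file cannot be mapped, so it always takes the read path.
        if info.st_size > 0 and info.st_dev in trusted_devices:
            with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
                return view[:]
        return handle.read()
```

The copy out of the mapping keeps the example short; the point is the decision about when mapping is acceptable, not the performance of this particular read.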

The last point is that external devices, all of them, have higher latencies than APFS on the internal flash storage.

So if you're doing sizable IO, please keep multiple operations in flight at once.
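
One way to keep several operations in flight is to issue positioned reads from a small pool of worker threads. The following Python sketch assumes a POSIX system (it uses `os.pread`); the chunk size and the number of concurrent reads are arbitrary illustrative defaults, not recommendations from the session.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _read_at(fd: int, length: int, offset: int) -> bytes:
    """Read up to `length` bytes at `offset`, retrying after short reads."""
    parts = []
    while length > 0:
        piece = os.pread(fd, length, offset)
        if not piece:
            break  # reached end of file
        parts.append(piece)
        offset += len(piece)
        length -= len(piece)
    return b"".join(parts)


def read_concurrently(path: str, chunk_size: int = 1 << 20, in_flight: int = 4) -> bytes:
    """Read a whole file with up to `in_flight` chunk reads outstanding."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=in_flight) as pool:
            # map() yields results in offset order, so joining them
            # reassembles the file.
            chunks = pool.map(lambda off: _read_at(fd, chunk_size, off), offsets)
            return b"".join(chunks)
    finally:
        os.close(fd)
```

On a high-latency device, overlapping the requests hides part of the round-trip time that a strictly sequential loop would pay for every chunk.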

So to summarize today's talk, Max talked to us about how we're making the root volume read-only for enhanced security.

Jon talked to us about ASR and how you can use ASR to replicate volumes, including snapshot deltas.

And I talked to you about how we're adding support for external files to iOS and iPadOS and how that will let you access files on USB storage and network shares.

For more information, we have a lab after this session.

And tomorrow, there's a talk, Combine and Advances in Foundation, where they're going to talk more about what you can do with Foundation on external media.

And there's also a video session, What's New in File Management and Quick Look, that will talk more about the UI document aspects of this talk.

Thank you.

[ Applause ]
