Using Core Data With CloudKit

Session 202 WWDC 2019

CloudKit offers powerful, cloud-syncing technology while Core Data provides extensive data modeling and persistence APIs. Learn about combining these complementary technologies to easily build cloud-backed applications. See how new Core Data APIs make it easy to manage the flow of data through your application, as well as in and out of CloudKit. Join us to learn more about combining these frameworks to provide a great experience across all your customers’ devices.

[ Music ]

[ Applause ]

Good morning.

[ Applause ]

My name's Nick Gillett.

I'm an engineer here at Apple on the Core Data team, and it is my pleasure to welcome you to Using Core Data with CloudKit.

Today, we're here to talk about a belief that I have that all of my data should be available to me on whatever device I have, wherever I am in the world.

And to make this a reality, we have to make it a lot easier for you to add this functionality to your applications.

I'm sure many of you came in today with an iPhone, and you may also have a Mac at home.

You may even be toting around a MacBook or a MacBook Pro in your backpack while you're here at the conference.

And the data that we create on all of these devices is naturally trapped.

Right?

There's no easy way for us to move it from one device to another without some type of user interaction.

Now, to solve this, we typically want to turn to cloud storage.

Because it offers us the promise of moving data that's on one device seamlessly and transparently to all of the other devices we own.

And cloud storage has benefits even if we only happen to have a single device.

Right?

As our applications create data on this device, it can be backed up in the cloud and stored.

So that when we get a new device, whether by choice or by accident ask me how I know on the way out of the store, that device can be restored to the one that we always knew and loved.

And it may surprise you to learn that there are some existing technologies on our platforms that can help us with this problem.

For example, Core Data provides a robust set of APIs for managing data locally in your applications and on disc.

And the CloudKit framework provides access to one of the world's largest distributed databases.

Both of these frameworks are available on all of our Apple platforms.

And because of that, enable us to build a wide variety of applications.

In fact, these frameworks are actually quite similar.

They even express themselves using a common set of patterns and paradigms.

Modeling their APIs in terms of objects, models, and stores.

Now, in Core Data, we call these objects instances of NSManagedObject, and they're what gives our application access to values that we store on disc.

CloudKit similarly exposes a CKRecord, which is a key value like store for accessing data that you've stored in the cloud.

These objects are described by what we like to call a model.

And in Core Data we call that an NSManagedObjectModel, which you can create in code or using the model editor in Xcode.

Quite similarly, CloudKit uses a Schema.

And the CloudKit schema can be defined either by CloudKit dynamically as you use CKRecords in the development environment or using the CloudKit dashboard.

Finally, objects are persisted to use the Core Data vernacular, in what we like to call a store.

In Core Data those are instances of NSPersistentStore.

But in CloudKit, CKRecord's are stored in a CKRecordZone or in a CKDatabase.

And so, as you can see and indeed as many of you have pointed out to me over the years, it would be great if we could make it easier to combine these two conceptually similar frameworks.

And so to show you just how much easier we've made that this year, I'd like to take you through what it's like to create a brand new application in Xcode.

Here you can see, I have Xcode open.

And I'm going to create a new iOS project as a Master Detail application.

I like Master Detail applications because they give us a great UI to build on top of when exploring features of Core Data.

So, I'll select it and click Next.

And then give my application a name.

In this case, just WWDC Demo.

And, because we're here to hear about Core Data, we'll check the Core Data checkbox.

New in Xcode 11 is this checkbox called Use CloudKit.

And this tells Xcode that we want to generate an application that's designed to use both Core Data and CloudKit.

So, let's check that, and then we'll click Next and find our application somewhere to live on the file system.

After we tap Create, Xcode generates the application for us.

And if you've ever built a CloudKit application before, you'll know that there are a couple additional things we need to add to this before it's ready to build and run.

These come in the form of Capabilities, which we add using the Signing & Capabilities tab.

We need to add two.

The first is the iCloud capability.

So, I'll add that by clicking this plus (+) next to Capability, type iCloud, and hit Enter.

When I do that, I can hit the CloudKit checkbox and you'll see that that adds push notifications for me as well.

Xcode has also automatically created an iCloud container identifier for my application to use.

Next, we want to add a background mode capability.

And to do that, I'll hit the plus (+) again, and type Background, and hit Enter.

And the reason we do this is to enable remote notifications, which allow our application to receive push notifications when it's not running.

So, let's run this application and see what Xcode has created for us.

Here you can see that we have a very simple application with two view controllers.

A table view on the left and a detail view controller on the right.

And I can add some data to this application using the plus (+) button in the upper right-hand corner.

By default, Xcode generates a very simple Core Data data model for us that's just a simple timestamp.

But, we're here to see what it's like to add sync functionality.

And to do that, we need another device.

So, let's run the application on our iPhone.

And you can see that we have the master view controller and all of the data that we added on our iPad.

Now let's add some data using our phone.

And of course, watch that sync back to the iPad.

Now, because I have a magic push notification in this controller, I can make this happen whenever I want.

But let's see something that's a little bit more realistic.

I'm going to delete all the data from this application that was added on my iPhone from the iPad.

These top four rows.

And then, I'll do the same thing from the iPhone, deleting all the data that was added from the iPad.

Now, previously I used a magic push notification.

But I'm not going to touch the controller anymore.

I'll just let this video play so that you can see what it was like to actually watch these two devices converge when I filmed this.

Right?

Now, as contrived as that might be, it's pretty awesome that in a few simple clicks we've built an application that syncs end to end using both Core Data and CloudKit.

And I'd be doing you a disservice if somewhere in the application delegate I'd hidden 15,000 lines of code.

So, let's see what that looks like.

Now, this is a pretty standard application delegate.

In fact, if you've ever built a Core Data application before using Xcode, it'll look very familiar to you, including this area where we set up the Core Data stack.

The only thing that's different about this application is some new API in Core Data this year called NSPersistentCloudKitContainer, which is designed to help you manage Core Data stores that are backed by a CloudKit database.

Now, if you've ever built a Core Data application before using Xcode you will have seen NSPersistentContainer here instead, which is NSPersistentCloudKitContainer's superclass.

Because of that, you can add CloudKit functionality to your existing Core Data applications by changing as little as one line of code.

What exactly is NSPersistentCloudKitContainer?

Well, it's an encapsulation of a set of really common patterns that we saw everyone have to build when they wanted to implement end-to-end sync using CloudKit.

And it's designed to save you thousands of lines of code.

It's also a foundation that we hope to be able to build on with you iteratively in the years to come.

And of course, to do that get ready, here it comes we need your help.

We're going to need feedback about how NSPersistentCloudKitContainer works for you and what features it's missing, or whether or not its existing features are enough to meet the needs of your applications.

So, let's look at a little bit a few of those features in detail.

NSPersistentCloudKitContainer provides your application with a local replica.

A complete mirror if you will of the Core Data or the CloudKit database that backs it.

And, it also implements a robust scheduling and error recovery event loop so that your application doesn't have to worry about any operations.

Finally, it handles the transformation between instances of NSManagedObject, that I mentioned earlier, and CKRecord.

Now, a local replica is important for a number of reasons.

But it means that as your application works with objects, it will be writing them to a local store file managed by Core Data.

And, it will be reading them from that file as well.

And this is because of the difference between the performance profile of the local database versus cloud storage.

Right?

When we think in terms of latency of accessing a local file, we can read from a file on disc in milliseconds at the worst case.

Whereas, over the network, it might take us seconds or minutes to fetch the data that our application needs.

Likewise, the local store file can provide your application with much higher bandwidth, measured even on our iPhone devices in gigabytes per second, whereas the cloud is sometimes limited to megabytes or kilobytes per second of available bandwidth.

Now, this local replica necessarily adds a significant amount of complexity, and that's why NSPersistentCloudKitContainer implements a robust scheduling and error recovery event loop for you.

So that as your application writes data to the local store, NSPersistentCloudKitContainer automatically moves those objects up to the cloud.

And, when something changes in CloudKit, NSPersistentCloudKitContainer will schedule work on the system to bring those objects down and import them into your local database, making them available to your application.

Of course, during this process, your objects are necessarily transformed from instances of NSManagedObject into instances of CKRecord by NSPersistentCloudKitContainer.

And likewise, when something changes in the cloud, those CKRecord's will be made available to you as instances of NSManagedObject in your local store file.

And that's what NSPersistentCloudKitContainer brings to your application.

It's a complete local replica of everything in the private database in CloudKit.

You should know, though, that we do manage a specific custom zone for Core Data syncing.

We implement automatic scheduling so that you don't have to worry about optimizing any operations or scheduling them in the system.

And, I think more importantly, you don't have to worry about implementing any error recovery logic in your application.

Finally, we implement automatic serialization from NSManagedObject to CKRecord.

And we use your NSManagedObject model to figure out how to do this.

But it'd be obtuse of me to stand up here and tell you that once you've adopted NSPersistentCloudKitContainer, you're done, and you have a fully functioning application.

So, I'd like to spend the rest of the talk thinking about what it means for you to build on top of NSPersistentCloudKitContainer.

And I think that starts with building great applications that are powered by Core Data.

Later, we'll also look at how you can extend the foundation that we've built into NSPersistentCloudKitContainer to customize it to your needs.

Now, for me, building great applications with Core Data starts by absorbing a lot of knowledge.

And, to that end, we've written a ton of documentation this year about how NSPersistentCloudKitContainer works and how you can integrate it into your applications.

There are some features of Core Data that I feel go really well with NSPersistentCloudKitContainer.

Things like the FetchResultsController, which helps you build scalable user interfaces backed by a significant amount of data.

And query generations which help you stabilize those user interfaces against changes that might be happening in the background.

Like say from NSPersistentCloudKitContainer.

And finally, there's history tracking, which we introduced a few years ago to help you understand what's changed in the database.

And with NSPersistentCloudKitContainer, you can use it to decide whether or not any of those background updates are relevant to what your user's currently doing.

We'll be covering these and a whole lot more in our session on Thursday at 3 pm.

And, to go along with that, we're also introducing a new sample application this year that's designed to give you something to hold in your hands, to feel how NSPersistentCloudKitContainer works along with all these other features of Core Data.

It's designed to manage a set of posts.

And posts are a theme that we've been building on over the last few years in Core Data.

Right?

They're a great object for helping us understand how different pieces of your object graph will be affected by CloudKit.

Here you can see that our data model is pretty simple.

Right?

We have a title and some content.

And then, we have a set of tags that we can associate to each post.

The application will even let you see what it's like to manage photos with CloudKit.

Allowing you to attach files from the device's Photo Library to a post.

Now, let's talk about what it's like to build on top of NSPersistentCloudKitContainer.

And, as you might expect, this is a fairly dense section of the talk.

But you should know that our documentation covers in more detail a lot of things that we're going to talk about today.

So, don't worry if some of this goes right out the window.

There are a number of ways that we see clients internally extend NSPersistentCloudKitContainer, beginning with working with multiple stores.

And, we also see that clients like to customize the schema that they've been using with CloudKit, and indeed, the one that we use with NSPersistentCloudKitContainer.

This of course, as you may know, is because CloudKit is available on a number of different platforms, not only at Apple but also using Web Services or JavaScript.

And so, you should be able to work with NSPersistentCloudKitContainer's objects even if you're not doing so on one of our platforms.

Finally, we'll take a look at Data Modeling for collaboration.

Now, there are a number of great reasons why we might want to use multiple stores in our application.

Especially when working with network backed stores.

Multiple stores can help us segregate our data between distinct use cases in our application.

And, they can help us provide different types of constraints.

Right?

So, if we wanted to have one store file that manages a very specific set of validation constraints, maybe to validate user input, we can use a separate store for that.

Multiple stores are also a great way to throttle or coalesce data that's written very frequently on device.

And this can happen when you're reading something from the device that's generated maybe by the device itself or by an algorithm that you have which creates data.

If this algorithm generates data at a very high rate, it can be very expensive to constantly sync all of that data up to CloudKit.

And so, we see clients inject another store file into the mix and use that to coalesce this data until its ready to be analyzed and then uploaded to CloudKit later.

To show you how this works and how Core Data can make this easy for you, we're going to leverage a feature of NSManagedObjectModel called Configurations.

And here you can see we have our sample apps, NSManagedObjectModel in the model editor in Xcode.

Now, let's say that I want to add locations to the post.

Right?

I want to record the location where a post was created whenever that happens.

But locations can be generated at a high rate by the system and I don't need them unless a post is actually being created at that location.

So, let's separate them from the rest of the data model.

I'll start off by adding a new entity by clicking this plus (+) in the lower left-hand corner to store my location information.

Now, my location's going to be pretty simple.

It's just going to include a latitude and longitude that are both doubles that are exposed to me by the Core Location framework.

Of course, yours may include other things like altitude or accuracy.

Now we want to segregate these locations from the rest of our data.

And to do that I'll create a new configuration, again, by hitting this plus (+) button, but this time holding down my click to expose a menu that allows me to add a configuration.

I'll call this configuration Cloud and add all four of the entities that I actually want to sync to it.

Here you can see that now the Cloud configuration has just the entities that I want to be synced with CloudKit.

Let's create a new one called Local that will store our location objects.

And, in just a few lines of code, we can put this to work for us, empowering Core Data to tell our stores automatically what types of objects they stored.

At the top you can see we create an instance of NSPersistentCloudKitContainer, and then, we leverage something called NSPersistentStoreDescription, which tells NSPersistentCloudKitContainer about the types of stores that it's managing.

We create an instance of NSPersistentStoreDescription that points at the file called local.sqlite to hold our location information.

And, we assign it the configuration of local that we just created.

Then we set up our cloud store.

Similarly, we create an instance of NSPersistentStoreDescription and we point it to a different file; cloud.sqlite.

Then we give it the configuration Cloud, which tells NSPersistentCloudKitContainer that only the entities like post, and tags, and attachments, and image data should be stored in the store.

Finally, we assign it an instance of NSPersistent CloudKitContainerOptions, which tells NSPersistentCloudKitContainer what iCloud container identifier this store should be synced with.

Last but not least, we assign these two store descriptions to the PersistentStoreDescription's property of NSPersistentCloudKitContainer.

And with NSPersistentCloudKitContainer we can take this even further.

We already have local store and a cloud store.

But what if we want to share some data in CloudKit across multiple applications we happen to work on?

Well, NSPersistentCloudKitContainer supports that as well.

In fact, because you're using Core Data, your application will easily be able to handle all the data from these stores at the same time.

And, Core Data will automatically help you write data that you insert into the correct store file automatically.

We can do this by just adding three lines of code.

Right?

We create a new StoreDescription that points to our shared store file and give it a new configuration that we might create called shared.

Finally, we assign it a new instance of NSPersistent CloudKitContainer Options that identifies the container we want our shared data to be stored in; in this case, iCloud.com.wwdc.shared.

And of course, last but not least, we assign it to PersistentStoreDescriptions.

Now let's talk about the schema.

And, I want to cover a few points about the schema that I think are sort of the highlights.

The most important things that you need to know when you're looking at the records that we create in CloudKit.

I'll start off by talking about how we manage record types and how we manage entities, which are what you create in your NSManagedObjectModel.

Then, we'll look at how we implement a feature called Asset Externalization, which allows you to seamlessly store arbitrarily large values in CloudKit using NSPersistentCloudKitContainer.

Finally, we'll talk about how we manage relationships and how that might be different from the CloudKit experience you're used to.

To do that, we're going to use the ManageObjectModel for our sample application.

And I'm going to start off by focusing on the post entity.

You can see that it has two attributes; a title string and a content string.

And it also has two relationships to the attachment and tag entities.

Core Data generates an actual class for you to use in code as a subclass of NSManagedObject that looks like this.

And you can see that all of the attributes and relationships are represented on that class.

Now, this is the record that we will generate in CloudKit that goes along with this post.

And these are some example values that I've used to populate this record and make it what we call fully materialized.

Now, there are some things that I want to highlight to you first.

The record ID.

Core Data owns the record ID for all of the objects that it creates in CloudKit.

And, for each one, we will generate a simple UUID to use as its record name.

When the Record Name is combined with a zone identifier you get a CKRecord ID.

At the bottom, you'll see how Core Data manages type information.

And, there are two interesting things here.

The first is, what the heck are all these CD under bars?

This is Core Data's way of segregating the things that it manages to be separate from ones that either CloudKit implements for you you wouldn't believe how many people add modify date to their CKRecord.

Or, ones that you might add yourself.

And so, we prefix everything; the record type and all of our field names with CD under bar.

But, in the CD entityName field, we keep the real entity name of the object that this record goes with.

And we do this so that we can implement a feature called entityInheritance, wherein you could have subclasses of a post, let's say.

Perhaps an image post, or a video post.

The actual entities will always be identified by the CD under bar entityName field.

And we do this so that you can implement CK queries that get at all of the entity hierarchies you're interested in by querying a single record type.

Now let's look at how we implement these two strings.

Because these are variable length fields and our translation of them to CloudKit has some interesting quirks.

You'll see that we've inflated them to four total fields.

And the reason for this, is that this is how we implement asset externalization.

Here you can see that we have both a CD under bar content field and a CD under bar content under bar CKAsset.

This allows us to store strings that are arbitrarily large.

Anywhere from simple kilobytes all the way up to hundreds of megabytes or even gigabytes in length.

And when I said that this record is what we call fully materialized, what I mean is that you won't ever actually see all four of these fields at the same time.

If the strings are all very short, then you'll just see CD under bar content and CD under bar title in the record.

But, if one of them grows to be very large, approximately larger than 750 kilobytes, or if the total size of the record exceeds CloudKit's maximum 1 megabyte limit, you'll begin to see asset fields, or in this case, CD under bar content under bar CKAsset in the record instead.

If you're consuming our records on your own, then you'll need to check both places to see whether or not a value has been set for a specific attribute.

Now let's take a look at the relationships on a post.

You can see here that they're both instances of NSSet in the object that Core Data generates for you to use in your code.

And this is because a post has to what we call too many relationships.

That means that a post can be assigned too many attachments or it can have many tags associated with it.

The attachments relationship is what we like to call a Many-To-One, because an attachment can only be assigned to one post.

When we generate code for this, you'll see that there's an NSSet on the post, but there's only a single post object declared on the attachment ManageObject.

And this is the record they will generate for an attachment.

You can see that it has a UUID, an entity name, and a record type just like a post does, but you'll also see an additional field called CD under bar post.

This is how we store the relationship for a Two-One relationship.

The UUID of the related record in CloudKit will always be stored on the object it's linked to.

And, you might be wondering why we don't do this with CKReference instead.

And that's because CKReference has some limitations that we don't think work well for Core Data clients.

Namely that it's limited to 750 total objects.

But by storing a relationship this way, you can hold as many will fit in your CloudKit container.

Now let's look at a Many-To-Many relationship.

In this case, the one between a post and its tags.

You can see that there are two NSSet's on this object and this is because both objects can be related to many of the other type.

But when we generate the records for these objects, there aren't any fields that we materialized to hold this relationship.

Instead, Core Data materializes a custom join record.

Now, if you're familiar with relational databases, you'll recognize this concept.

It's basically an extrapolation of a row in a join table.

And it's called a CDMR or Core Data Mirrored Relationship.

A CDMR contains a set of twoples that are designed to describe the two objects that are linked, beginning with the entity names of the two objects that are linked, and their record names.

Now, as I said before, this is not the record ID.

This is the record name that you have to combine with a zone identifier to get the identity of the records that have been linked by the CDMR.

Finally, we also encapsulate the exact relationships that were used to make this link.

So why am I spending so much time talking about relationships?

Well, they have a huge impact on how we model our data for collaboration.

And I need to be very clear here, collaboration is not conflict resolution.

Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy.

And the reason we do this, is that the job of conflict resolution is to keep the object graph and the data in CloudKit consistent with how you've modeled your data.

But, over the years we've heard that this can be a little frustrating.

So, let's take a look at how you can implement better merge behavior and align that merge behavior with your specific customer use cases using relationships.

To do that, we're going to use our post application again.

And I'm going to create a post, but I'm not going to assign it any content.

I'll let it sync over to another device let's say, and then I'll make some edits on each device at the same time.

Now, this is what we would traditionally term to be a conflict.

But, actually, this is a great example of two devices attempting to collaborate on a value.

Now, NSPersistentCloudKitContainer seeing that something has changed in CloudKit while it was away, will resolve this to preserve one of the two values.

Right?

Using a last writer wins merge policy.

Which means that we'll either get collaboration is great or everyone should do it.

And it's true.

Of course, you might be thinking that well, this is kind of dumb.

Why don't you just concatenate the two strings together, Core Data?

And we could.

Except that if you took our advice over the years and implemented incremental saving you might end up with something like this, which is not what anyone expects.

And it's not even English.

So, how can we do better?

Right?

Clearly this is a problem that's very important to us and it's very dear to our customers.

Well, let's look at our post entity and see if there's some changes we can make that would make this better.

The first thing to do is to stop creating collisions on flat values.

Content is simply a flat string and there's nothing that we can infer about the merge behavior that you want for that string, simply from looking at it.

But, if we break it up as a relationship, multiple devices can contribute contributions to the content field all by themselves.

And because of the way we store Two-One relationships, many devices can contribute these objects at the same time without creating a collision on the post object itself.

So, here I've broken it up as a very simple post content entity with a simple string.

Now, because of how we store Two-One relationships, these will be combined by devices on their own.

Right?

We need to traverse them in order to assemble the final value.

And we call this eventual consistency.

As both devices contribute post content objects, they'll download the post content objects from the other device and concatenate them or merge them, or whatever topology you want to apply to this problem to assemble the final value.

But we want that to be the same across all devices and a difference in download order could create a different final value.

So, we can order them say with something as trivial as a date.

In this way, the devices can order the post content objects before they do a concatenation using a simple sort descriptor in Core Data, giving them a consistent ordering and merge behavior across all devices.

But, I'm a bit of a distributed systems nerd and time is evil.

In fact, a devices notion of time might not even line up if they're in your house together.

So, we can take this a step further and implement a full causal tree by leveraging a relationship to a parent contribution and giving the entity some information about the device that's making the actual contribution.

In this way, we've implemented a rough sketch of something called the conflict-free replicated data type.

And this is an awesome emerging field of computer science that allows us to deploy algorithms to create consistent merge behavior across different user-user scenarios.

And that has been Using Core Data with CloudKit.

It's been my pleasure to show you NSPersistentCloudKitContainer and how you can easily implement sync functionality in your applications with it.

We've also got some great new sample code and documentation for you this year that I hope gives you a great experience with all of these new APIs.

And finally, I can't wait to see what you build with NSPersistentCloudKitContainer and how you extend it.

We'll be in a ton of labs this year.

Every day this week.

And as you know, we have a talk on 3 o'clock that covers a lot more features in Core Data in Thursday.

Thanks a lot, and I hope you have an awesome WWDC.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US