Mastering Core Data

Session 118 WWDC 2010

Core Data contains a vast set of advanced features to help you better manage your data and evolve your application over time. Master the techniques for working with data in your application, from being more efficient to doing more in the database and changing how you store your data over time. Take your Core Data knowledge to the next level.

Good afternoon.

Welcome to Session 118, Mastering Core Data.

Please turn off your WiFi devices; they make me nervous.

Much better.

I'm Miguel Sanchez, I'll be doing the first half of the talk, and then pass it on to Adam Swift to wrap it up.

So Core Data has been around on the desktop since 10.4 Tiger, on iPhone OS since 3.0 last year, and obviously on the iPad earlier this spring.

The purpose of this session is to help you become more proficient with our technology.

This is not an introductory session.

We assume a basic level of knowledge with the framework.

You could have attained that by reading the documentation or working through an example.

You certainly don't have to be a super expert.

But we will not be covering the most basic concepts of Core Data.

This is our to do list for the session.

And let's jump right in.

So let's start out by talking about some tips and tricks for modeling.

Quick recap.

The model is how you describe your data to us, and we help you manage it.

So the better job you do at describing that data, the more we're able to help you.

The vocabulary you'll use to describe your model consists of entities, which more or less correspond to the tables in your store, plus attributes and relationships.

And you want to make sure that you design your model, not around an ideal representation of what your data could be, but more around a practical implementation of how you're going to be accessing that data with your application.

Once you give us a model the data will be presented to you as instances of NSManagedObject.

Now you don't need to generate any code.

You can just use the standard class that we provide.

But we actually do want to encourage you to start using NSManagedObject subclasses.

Why? First, by moving away from the KVC access patterns, you gain a lot more support from the compiler to tell you when you're accessing something incorrectly.

It's better to use a direct accessor method such as setName: than to call setValue:forKey:, where you might mistype the key name.

This improves your code readability.

And it's also faster at execution time, because KVC doesn't have to resolve the key to the state that you want to access.

Now when I say, please generate subclasses, I don't mean write a lot of code.

All you have to do is let the modeling tool generate the stub for the class.

And it's really just a declaration of the properties, so that the compiler can check them when you're accessing them.

Here we have three properties up on the screen: a first name, which is a simple string property; a to-one relationship, manager; and a to-many relationship, direct reports.

Remember that to-many relationships are represented as sets within Core Data.

You don't even have to write the code for the implementation for these properties.

If you use the @dynamic compiler directive, we inject the actual implementations of these methods at run time, so you don't have to write the accessors.
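As a rough illustration of how little code such a subclass needs, here is a minimal sketch in Swift, which postdates this session. The `Employee` class and its property names are assumptions based on the three properties described above, not the code from the slide. In Swift, `@NSManaged` plays the role that `@dynamic` plays in Objective-C.

```swift
import CoreData

// Hypothetical Employee entity. The property names mirror the three
// properties described in the talk and are assumed, not taken from the slide.
@objc(Employee)
final class Employee: NSManagedObject {
    // @NSManaged is Swift's counterpart to Objective-C's @dynamic:
    // Core Data supplies the accessor implementations at run time.
    @NSManaged var firstName: String?            // simple string attribute
    @NSManaged var manager: Employee?            // to-one relationship
    @NSManaged var directReports: Set<Employee>  // to-many relationship, exposed as a set
}
```

With a declaration like this, a mistyped property name becomes a compile-time error instead of a run-time KVC failure.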

Another thing we've seen people have questions around in the Core Data forums is, once you do the initial implementation, the initial generation of your classes, what happens when you have additional properties that you want to add to your entity?

Do you need to regenerate your whole classes and stomp over what you had before?

The answer is no.

Please be aware that the modeling tool has menu items under Design, Data Model: four items where, if you select a particular property in your model, we generate either the implementation or the header declaration for that property and leave it on the pasteboard.

Then you can go into your specific file that you've pregenerated and paste the appropriate piece of code that you have.

So you don't have to stomp over your preexisting file.

A couple more tips for working with Managed Objects.

Remember that you are inheriting from NSManagedObject and NSObject.

So be careful not to use property names that might conflict with properties that you're inheriting from these classes, such as description and deleted.

Those are the two most common ones that we see people have conflicts with.

And also remember that, because these are KVC-compliant classes, it's not enough to avoid the property name deleted; you also have to avoid isDeleted, getDeleted, and setDeleted.

That covers all the KVC resolutions for a particular name.

So the initial generation of your object is pretty straightforward: you declare the properties.

But now let's start writing some more interesting code.

One type of property that's available to you with Core Data is the transient property.

Transient properties are declared in the model.

You get the benefit of change tracking in memory while Core Data is managing the objects.

But their state is not persisted to the Store File.

So you're independent from the store schema.

You're able to add transient properties without incurring a potential migration step, because you wouldn't be changing the store schema.

Here are two examples; I'll walk you through two slides showing what people mostly do with transient properties.

The first is the simplest.

You're simply computing a full name value from a first name and a last name.

So the first name and the last name are stored in the database, but your application needs to display a full name.

So you would write an accessor like the following: you would see if you've already precomputed that data by getting the primitive value of full name.

And if you don't have that value yet, you simply concatenate the first name and last name, and you return that.

So this is a transient property.

You now have access to the full name, even though full name does not exist in your database.
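The accessor described above could look roughly like the following Swift sketch. The `Person` entity and its attribute names are hypothetical, and `fullName` is assumed to be declared as a transient string attribute in the model.

```swift
import CoreData

// Hypothetical Person entity: firstName and lastName are persisted,
// fullName is assumed to be marked transient in the model.
@objc(Person)
final class Person: NSManagedObject {
    @NSManaged var firstName: String?
    @NSManaged var lastName: String?

    // Custom accessor; @objc makes it visible to Core Data and KVC.
    @objc var fullName: String {
        // Return the cached value if it was already computed.
        willAccessValue(forKey: "fullName")
        let cached = primitiveValue(forKey: "fullName") as? String
        didAccessValue(forKey: "fullName")
        if let cached = cached { return cached }

        // Otherwise concatenate the persisted parts and cache the result.
        let computed = [firstName, lastName].compactMap { $0 }.joined(separator: " ")
        setPrimitiveValue(computed, forKey: "fullName")
        return computed
    }
}
```

This sketch never invalidates the cached value when the first or last name changes; a real implementation would need to clear it in those setters.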

A more interesting use of transient properties, and we got some of these questions in the lab yesterday, is to reference external resources that you are building yourself within your Core Data instance.

So let's say that you have a particular type that Core Data doesn't handle, some document object.

You can declare a transient property document object that depends on a persisted property, persisted document path.

So you are using Core Data to store the string value of the path to access this resource.

And then you do whatever you need to do to fetch the actual object from that path, and you return that in your Core Data object.

So you're using a persisted property as a locator for something that you're going to go out into the file system and load yourself, using whatever mechanism you choose.

Another type of property that we have available are transformable attributes.

You can see the list of types that we support for attributes in Core Data, from integers through strings and Booleans.

If there's a particular type that you want to use in your attributes and you don't see it there, you can always pick Transformable at the very bottom.

A Transformable attribute is one that is handled by Core Data.

We are doing the storing of your data.

But we're converting it into an instance of NSData, which is basically binary data that we're putting in the store file.

But that data to you behaves as whatever type you declare it to be.

So here, for example, we're declaring a property that's an NSColor, and we don't have NSColor support in Core Data.

So you do the declaration of your property.

Color is declared as a Transformable property.

And we do the right thing behind the scenes of archiving and unarchiving the color data for you.

But from your code you're dealing with colors.

When using Transformables, we use a default NSValueTransformer class.

When would you want to use your own subclass of a value transformer and put it into your transformable attribute?

Core Data doesn't do any encryption of data.

So if you want to archive something out in binary format, and you want to add an encryption step, a subclass of NSValueTransformer might be a good place to put that code.

You might not want the default behavior that we use with keyed archiving.

You might just want a straight write that puts the bytes directly into the stream that you're writing out to.

And if you do that, make sure that you take into account endianness, the byte ordering of your data, because your data might be stored on one platform and fetched on a different platform.

So make sure that, if you are writing your own transformer, you are taking into account the byte ordering issues.

The last thing I want to talk about with Transformable Attributes is that, if you do subclass NSValueTransformer, this is the convention that Core Data uses when calling the two primary methods in NSValueTransformer.

When we call transformedValue:, we are going from your type to NSData.

And we're using the reverse transformation to go from data to your type.

People get this backwards sometimes.

There's an example of how we do this, the PhotoLocations example, which is associated with this session on the attendee website.
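To make the direction convention and the byte-ordering advice concrete, here is a small Swift sketch of a custom transformer. The `UInt32ArrayTransformer` class and its `[UInt32]` payload are invented for illustration and are unrelated to the sample code mentioned above. It writes raw bytes in a fixed big-endian order rather than using keyed archiving.

```swift
import Foundation

// Hypothetical transformer for an attribute whose in-memory type is [UInt32].
final class UInt32ArrayTransformer: ValueTransformer {
    override class func transformedValueClass() -> AnyClass { NSData.self }
    override class func allowsReverseTransformation() -> Bool { true }

    // Forward direction: from your type to the data that gets stored.
    override func transformedValue(_ value: Any?) -> Any? {
        guard let numbers = value as? [UInt32] else { return nil }
        var data = Data(capacity: numbers.count * 4)
        for number in numbers {
            // Emit the most significant byte first on every platform.
            for shift in [24, 16, 8, 0] {
                data.append(UInt8(truncatingIfNeeded: number >> shift))
            }
        }
        return data
    }

    // Reverse direction: from the stored data back to your type.
    override func reverseTransformedValue(_ value: Any?) -> Any? {
        guard let data = value as? Data, data.count % 4 == 0 else { return nil }
        let bytes = [UInt8](data)
        var numbers: [UInt32] = []
        for offset in stride(from: 0, to: bytes.count, by: 4) {
            var word: UInt32 = 0
            for byte in bytes[offset..<(offset + 4)] {
                word = (word << 8) | UInt32(byte)
            }
            numbers.append(word)
        }
        return numbers
    }
}
```

Because the byte order is fixed by the shifts rather than by the host CPU, data written on one architecture reads back identically on another.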

The last thing I want to touch on in this section is adapting the model to your access patterns.

So let's say that your problem is that you want to search on titles of books.

So your initial inclination might be to define an entity book with a simple title property that's a string.

And you want to search on titles.

So searching means contains, right, in database terms.

And you'll also want to ignore diacritical marks and case, so you use CONTAINS with the bracketed [cd] options, which make the comparison case- and diacritic-insensitive.

It turns out that CONTAINS can be pretty heavyweight.

This is fully ICU-compliant, localization-aware, basically regex matching at the database level.

So it's a pretty hefty operation.

Not only that, you're doing it for each row in your database each time you search.

So if you have a large dataset you might not get the performance that you want.

Here's a secret for you guys.

At Apple, we ourselves, on the phone specifically, on mobile devices, don't really do full-text searching in some of the searches that you're using.

We're doing prefix matching.

So this is actually very good for most of the functionalities that you want.

So there are actually two tricks that you can use here.

You can take care of the normalization of the data, right when you're setting the title.

So you add a secondary attribute, normalized title, where you do all of the stripping of the diacritical marks and standardizing on a case.

And then you put your normalized title into your book instance, and you also index the normalized title property.

And now you can write a much more efficient predicate by searching for any normalized title that's greater than or equal to the prefix that you're matching on, and less than the subsequent prefix.

For example, if your user is typing S T A R, star, you can use the following predicate, which takes advantage of the index, and you get back a much quicker result than if you're doing a regular CONTAINS predicate.

So we also show you how to do this with the DerivedProperty example, also associated with the website.
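The two tricks above, normalizing once when the title is set and then matching a prefix with a range comparison, can be sketched in Swift as follows. The function names and the `normalizedTitle` key are assumptions made for this illustration; the session's sample code may do this differently.

```swift
import Foundation

// Normalize once, when the title is set: lowercase and strip diacritics.
func normalize(_ title: String) -> String {
    return title.lowercased().folding(options: .diacriticInsensitive, locale: nil)
}

// Lower bound is the normalized prefix; upper bound is the "subsequent
// prefix", obtained by bumping the last Unicode scalar by one.
func prefixBounds(for prefix: String) -> (lower: String, upper: String)? {
    let lower = normalize(prefix)
    var scalars = Array(lower.unicodeScalars)
    guard let last = scalars.last,
          let next = Unicode.Scalar(last.value + 1) else { return nil }
    scalars[scalars.count - 1] = next
    var view = String.UnicodeScalarView()
    view.append(contentsOf: scalars)
    return (lower, String(view))
}

// Range predicate on the (assumed) indexed normalizedTitle attribute.
func prefixPredicate(for prefix: String) -> NSPredicate? {
    guard let bounds = prefixBounds(for: prefix) else { return nil }
    return NSPredicate(format: "normalizedTitle >= %@ AND normalizedTitle < %@",
                       bounds.lower, bounds.upper)
}
```

For the prefix "star", the bounds are "star" and "stas", so every title from "star" up to, but not including, "stas" matches, and the comparison can be answered from the index.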

One final step here, you'll notice that in the previous slide we're now doing prefix matching on the title, but we're only matching on the first, on the beginning of the title, basically.

So what if you want to do the match on any word in the title?

One thing you can do is split all the words in your title into a separate title words entity.

It's a one-to-many relationship.

Do the prenormalization.

So you're storing the title words already normalized.

You're still using the same predicate as before.

But now the predicate is searching on title words.

So you're matching on any word in the title.

And whenever you find a match on the title word, you can just use the relationship back to point to the book that it belongs to.

Okay? So this has been an example of what I meant by determining what the actual usage pattern of your application is.

Where is the bottleneck?

Where do you need the fastest turnaround time? Adapt the model to that.

Let's move on to talk about the Managed Object Life Cycle.

Remember Managed Objects are the instances of your data that we're providing for you.

They don't exist in a vacuum.

They always exist with regards to the context that is managing them.

So anytime we talk about the life cycle of an object, it's always something that's happening with regards to that context.

So if we're inserting an object, we're inserting it into the context.

If we're fetching an object, we're fetching it from the store into the context.

As we're updating our data, setting state on these objects, the context is keeping track of what's going on, so that if we ask it to undo a change, the context is doing that undo.

When we decide to clean up, there can be at a very high level two kinds of clean up; either a direct deletion that you are asking us to perform, because you're messaging the context, please delete this object.

Or there's a more memory-level deletion, where we realize that there are no more references to your instances, so we turn them back into faults, thinner shells of those objects that don't use the full memory footprint.

So what are your options for hooking into this lifecycle of an object?

You have three high level ones.

First, if you're dealing with per-instance actions, your best bet is probably to override methods from NSManagedObject, and I'll be talking about those.

If you want to react to graph-level changes, you probably want to register with the Managed Object Context and listen for the notifications that it's posting and react to those notifications.

And thirdly, remember that as you're asking the context to perform certain things, some of those methods return errors.

So make sure that you're inspecting those return values and reacting to whatever the context is telling you.

So let's go one by one in these.

If you're going to override methods in NSManagedObject, let's say that you want to add additional initialization code.

The awake set of methods that you see on the screen here is a good place to do that.

Let's walk through each one of them independently.

awakeFromInsert.

awakeFromInsert will be called once during the lifetime of your object when you do the initial insertion of that object into a context.

This is a good place to set baseline state for your object.

Remember that when you're creating the Managed Object Model, we have text fields for each one of the properties that allow you to set default values for your properties.

There will be times when this is not enough for you.

For example, if you are creating employees, when you're doing the model you can't tell what the next employee ID is going to be.

So the awakeFromInsert is a good place to put that code where you're initializing the employee ID.

This is the awakeFromInsert of your own NSManagedObject subclass that represents an employee, right?

So this is something you can't put in a model, so this is where you would put it.
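A minimal Swift sketch of that override follows. The `Employee` class and `employeeID` attribute are hypothetical, and a UUID string stands in for whatever scheme your application uses to produce the next employee ID.

```swift
import CoreData

// Hypothetical Employee entity with a persisted employeeID string attribute.
@objc(Employee)
final class Employee: NSManagedObject {
    @NSManaged var employeeID: String?

    // Called once in the object's lifetime, when it is first inserted
    // into a context. Not called again when it is later fetched.
    override func awakeFromInsert() {
        super.awakeFromInsert()
        // A value that cannot be expressed as a static default in the model.
        // The UUID is a placeholder for a real ID-generation scheme.
        employeeID = UUID().uuidString
    }
}
```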

awakeFromFetch is very similar to awakeFromInsert, except that it's called each time that your data is fetched from the database and an object is created in a context.

What do you want to put here if you ever override this method?

This is another good place to initialize transient properties.

Some slides back I showed you the full name method.

That's sort of like an on demand initialization of your full name property.

But you could also choose to set that value up once you know that you have all of your database state and you're awaking from a fetch.

Finally, during the lifetime of your objects there will be situations that require us to revert state to your object from a snapshot.

Specifically, if you're asking us to refresh the object or to do an undo, there will be times when the object is reawakened with state that it had before.

You will be notified of this with the awakeFromSnapshotEvents: method.

This is a good place to put code that might reset some of your cache data, so that you can compute it on demand again later on.

So now we move on to the mechanisms that you have to react to state changes for the whole graph.

NSManagedObjectContextObjectsDidChangeNotification is how the context tells you what's going on with your edits.

It's a posted notification whose user info dictionary gives you the lists of inserted objects, updated objects, and deleted objects.

Please note that we're not telling you these are the changes that have already been saved to your store.

We're only telling you this is what's going to happen the next time you do a save.

Right? We've processed the changes in memory and this is what we know about, but you still have to do the save.

When will you be getting this notification?

When is it posted?

It's posted if you explicitly tell the context to process all the pending changes.

It will also be posted right before a save.

But it's also posted, very frequently, at the end of the event loop and anytime we're doing a fetch.

So you're getting this notification throughout the lifetime of your application.

There are other notifications posted by the ManagedObjectContext that are interesting for you to hook into.

When we actually do a save you will get the NSManagedObjectContextWillSaveNotification and DidSaveNotification.

Some of you have asked on the forums about time stamping your objects.

So you want to put a timestamp on the object right when it was last changed.

This is a good place to do that.

If you register to receive the WillSaveNotification, you know that your graph of objects that's being communicated to you is about to be saved.

You can set the timestamp for each one of those objects, and then we save it.

This is also a good place to set up relationships.

If you remember in the previous slides in the awake methods, those were meant mostly for single instance manipulation.

So by this point more of the objects are set up, so you can set up relationships between them.

Once the save does happen, you will get the DidSaveNotification.

This is a good place to put your code that needs to notify others in your application that the save has happened.

Let's say that you're managing multiple peer contexts.

Once the save happens and you want another, secondary context to know about that save, you can start that messaging in response to this notification.

How would you do that?

Let's say that you have more than one context in your application.

You do a save in one context and you want to communicate all of those edits that just happened onto another context.

We have a method, mergeChangesFromContextDidSaveNotification:, and the notification you pass to it is, by the way, the one you got with the DidSaveNotification.

You simply take that notification and send this message to the other contexts that you want to have the exact same changes that were just saved, and we do the merging for you.
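In Swift, the hand-off from one context's save to another context could be wired up roughly like this. The helper name `keepInSync` is invented for this sketch, and it assumes the receiving context was created with a queue-based concurrency type, an API that postdates this session.

```swift
import CoreData

// Forwards every save made by `saver` into `receiver`.
// Returns the observer token so the caller can unregister later.
func keepInSync(_ receiver: NSManagedObjectContext,
                with saver: NSManagedObjectContext) -> NSObjectProtocol {
    return NotificationCenter.default.addObserver(
        forName: .NSManagedObjectContextDidSave,
        object: saver,
        queue: nil
    ) { notification in
        // Hop onto the receiver's own queue before touching it, then let
        // Core Data merge the saved inserts, updates, and deletes.
        receiver.perform {
            receiver.mergeChanges(fromContextDidSave: notification)
        }
    }
}
```

The caller keeps the returned token and passes it to `NotificationCenter.default.removeObserver(_:)` when the two contexts no longer need to stay in sync.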

Now while we're on the topic of saving, don't forget that the save method returns a Boolean, and it also has an error parameter.

So what kinds of things can go wrong when a save happens?

The first is validation errors.

Remember that as you were defining your model, you're able to declare certain boundary conditions about the maximum and minimum value of your properties, or optionality of certain values or not.

So if there's any validation issues that we detect while we're doing the save, we will fail the save and communicate this back to you in the error parameter.

If there's more than one validation issue, they will be chained in the NSDetailedErrorsKey.

So be sure to inspect the User Info Dictionary inside of each error and see if there's more than one validation problem.
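Unpacking a failed save into its individual validation problems could look like the following Swift sketch. The helper names are invented for illustration; `NSValidationMultipleErrorsError` and `NSDetailedErrorsKey` are the Core Data constants for the combined error code and the user info key.

```swift
import CoreData

// Flattens a save error into the individual errors it carries.
func individualErrors(in error: NSError) -> [NSError] {
    // Multiple validation failures arrive wrapped in one error whose
    // userInfo lists the details under NSDetailedErrorsKey.
    if error.domain == NSCocoaErrorDomain,
       error.code == NSValidationMultipleErrorsError,
       let details = error.userInfo[NSDetailedErrorsKey] as? [NSError] {
        return details
    }
    return [error]
}

// Saves and returns every problem found (empty when the save succeeded).
func saveCollectingErrors(_ context: NSManagedObjectContext) -> [NSError] {
    do {
        try context.save()
        return []
    } catch {
        return individualErrors(in: error as NSError)
    }
}
```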

The second type of issue that can cause the save to fail is an optimistic locking failure.

This is a mechanism that we use to detect multi-writer conflicts.

When we're doing a fetch from the store we keep around a snapshot of the last value that we saw for a specific property in the store.

Now suppose anybody else changes that value underneath us in the store; that somebody else could be you.

It could be another thread in your application.

It could be another peer context that you're managing.

It could be another application that's still dealing with the same store.

Whatever it is, there's a change in the store, you tell us to save, and we detect that somebody's changed the value underneath us, so we will raise an error and we will notify you.

The default policy that Core Data has is to raise an error when this happens.

If this is not what you want to do, you can change the merge policy on the Managed Object Context.

And you have here the set of policies that you can set.

If you want us to try to merge the changes from the store with the changes that you have in memory, you can use the second or the third policy.

And with each one of those you're telling us who do we give priority to when we detect a conflict on a property-by-property level.

Do we take state from the store and stomp your memory state, or do we take state from the object and stomp your store state?

Or do you just want us to take the whole object from memory and override whatever was in the store?

Or do you want us to take whatever was in the store and override whatever was in memory?

So, you get to pick whatever policy that you want.

But please be aware that our default policy is just to say this save doesn't work, because somebody changed the data underneath.
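Choosing a policy is a one-line change on the context. The Swift sketch below uses the `NSMergePolicy` class properties from recent SDKs; at the time of this session the same policies were exposed as global constants. The helper name is an assumption for illustration.

```swift
import CoreData

// Opts a context out of the default behavior, which is to fail the save
// (NSMergePolicy.error) when an optimistic locking conflict is detected.
func preferInMemoryChanges(on context: NSManagedObjectContext) {
    // Merge property by property; where both sides changed the same
    // property, the in-memory object wins over the store.
    context.mergePolicy = NSMergePolicy.mergeByPropertyObjectTrump

    // The other choices:
    //   .mergeByPropertyStoreTrump  property-level merge, store wins conflicts
    //   .overwrite                  in-memory object replaces the store's version
    //   .rollback                   store's version replaces the in-memory changes
}
```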

Cleaning up.

Like I said at the intro to this section, there are two types of cleanup; one is deletion.

You're telling the context to delete an object with deleteObject:.

Remember that the delete doesn't happen until you do the subsequent save.

So you can tell a context, please delete this object, but it's just marked for deletion, it's not actually deleted until you do the save.

When you do the save you will be notified that the save happened with the WillSave and DidSave notifications.

By the time you're getting those notifications, we've already done the delete propagation in your graph.

So you can't access the relationships in your objects within those notifications.

So if you want to take notes about what's going to be deleted so that you can do secondary processing, for example if you're managing your own resources like that document path example I showed with the transient properties, a good place to plug in that code is the prepareForDeletion method.

This is something you would override in your NSManagedObject Subclass.

So this is when we're telling you, hey, we're going to eventually delete this object.

This is where you would say, oh, this object is going away and I'm managing an external reference for this object myself.

So let me just keep track of this path that will eventually be deleted.

So that when I'm notified that the deletion actually happened, I will go ahead and remove the data.

The second kind of clean up that can happen is kind of memory level cleanup.

You're not telling us to delete objects; we're simply cleaning up, because we don't see that you have references to them.

So please don't override the dealloc method.

We don't quite guarantee at what point that's going to be called.

The equivalent for you Core Data developers should be willTurnIntoFault.

That's the equivalent of dealloc for you guys.

This is where you want to clear out your caches or any dependencies that you've registered for key-value observing.

Now turning something back into a fault happens when either Core Data detects that we don't have any references and nobody has references to those objects, or you're explicitly telling us to refresh an object by calling the following two methods on the ManagedObjectContext.

You're telling us to refresh the object, or you're telling us to reset the context and turn everything back into a fault.

Please don't call the refreshObject method and tell us to ignore the merging of the changes when you have a dirty object, because the consistency of your graph will get out of sync.

The final thing I want to talk to you about is multithreading with Core Data.

Most likely you will be considering introducing multithreading to improve the UI responsiveness of your application.

So you want to make Core Data operations asynchronous by pushing them into a background thread, so that your UI is free to continue to interact with the user.

Be aware that there are always pitfalls when working with multithreaded applications.

It doesn't come for free.

Each thread that you spawn off has a little bit of a cost as you're doing the context switching back and forth.

Make sure that, even though you have multiple threads executing, they're not all contending for the same resource, which defeats the purpose.

And you're also introducing a little bit of complexity into your application, specifically with the debugging.

So this is not a free solution.

But if you do decide to go down this path, the golden rule that we want you to always remember is to give each thread its own Managed Object Context.

And I quote thread here, because you could be using Grand Central Dispatch.

Basically, each concurrent unit of execution should get its own Managed Object Context.

Managed Objects are not thread safe.

You can't pass them around threads and expect them to work properly.

What are thread safe are the objectIDs that each one of those has.

So let me illustrate this.

You have a UI thread interacting with the user and a background thread fetching.

The background thread is doing the fetching of three objects.

You're ready.

You've warmed up the application.

You don't just pass those instances over into the main thread.

You actually take the object IDs from the objects that you fetched.

You pass that across the thread boundary.

And then you use the method on the context, such as objectWithID where we construct a local copy of that object.

Now fear not, you're not doing a fetch from scratch here.

Because the background thread was already warming up the row caches that Core Data uses to create the objects.

So creating an object in the first context is actually very, very quick.

You are taking advantage of the work that you're doing with the background fetching.

If your background threads are inserting new objects, please remember to first save them to the store before you pass the object ID across the thread boundary and ask us to fetch it.

When you create an object, we assign it a temporary ID.

You can't pass a temporary ID to another context and expect it to find it unless it's been saved.

So once you do the save the ID becomes permanent and we can get it from the other context.
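The hand-off described above, fetch in the background, pass only object IDs across, and rebuild local instances in the receiving context, might be sketched in Swift like this. The function name and parameters are invented for illustration, and both contexts are assumed to use queue-based concurrency types, which postdate this session.

```swift
import CoreData

// Fetches on the background context, then delivers equivalent objects
// that belong to the receiving context (for example, the UI's context).
func fetchInBackground(entityName: String,
                       background: NSManagedObjectContext,
                       receiver: NSManagedObjectContext,
                       completion: @escaping ([NSManagedObject]) -> Void) {
    background.perform {
        let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
        let fetched = (try? background.fetch(request)) ?? []
        // Only the IDs cross the boundary, never the managed objects.
        // IDs of unsaved inserts are temporary and must not be passed on.
        let ids = fetched.map { $0.objectID }

        receiver.perform {
            // object(with:) builds a local instance in the receiver's context.
            completion(ids.map { receiver.object(with: $0) })
        }
    }
}
```

The completion handler runs on the receiver's queue, so the objects it is handed are safe to use there.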

If you're doing this with Grand Central Dispatch, the pattern is the same.

Let's say that you have a serial queue.

You know that blocks inside of a serial queue will execute serially by definition.

So all of your blocks can potentially share the same context, instance of context 1 here.

But, and this is a very important but, just because we have serial queues doesn't mean that you don't have concurrency.

Right? You might have more than one serial queue executing at the same time.

So blocks within different serial queues could potentially execute concurrently.

So make sure that if you have more than one serial queue, the blocks in that second queue are using a different instance of a context from the blocks in the first serial queue.

And that's how you maintain the golden rule.

Of course if you're using a concurrent queue, you know that by definition block 4 and block 6 could potentially execute concurrently, so you take care of giving them different instances of a context to manipulate.

The last thing I want to talk about in this section is what happens when you're doing edits to your data in multiple threads?

Well, what happens is that you better know what you're doing.

I mean, a lot of this has to do with defining a good workflow in your application.

Okay? This isn't so much Core Data's problem; it's a question of what it means for somebody to be editing an object that was deleted in the background, or vice versa, right?

So a lot of that work is your work.

Once you figure out what it means in your application, Core Data gives you two mechanisms, and we've seen these methods before: first, if you want to refresh the state of an object and turn it back into a fault, you use the refreshObject:mergeChanges: method.

Or if you did a lot of processing in a background thread, in a background context, and now you want to push all of those changes over into another context in another thread, remember that I mentioned the mergeChangesFromContextDidSaveNotification.

So you're passing all of the objects.

Here, this is the one place where objects are crossing the thread boundary.

But because this is being handled by Core Data, we take care of doing the right thing behind the scenes so that nothing goes wrong.

So this is what happens with multi-party edits and deletes.

And now I'll pass it onto Adam to conclude the session.

[ Applause ]

Adam: Thank you Miguel.

And now I'd like to dig a little deeper into fetching and performance.

So you know how critical performance is to providing a great user experience for your application.

You want your user interface to stay responsive, even as you scale to dealing with a lot of data.

And the two key strategies for achieving these performance goals are limiting memory usage by only fetching the data that you are actually going to show in your user interface, and amortizing your database I/O by fetching in batches.

So keep in mind fetching is performing disk I/O.

So you will want to avoid the extremes of fetching too much data all at once, and on the other hand, frequently fetching a little bit of data and repeatedly calling out to I/O.

You want to find the right middle balance, where you're fetching your objects in reasonable batches.

And we can do that by leveraging the strength of the database to do as much work as possible at the database layer.

So we can use predicates and sort descriptors to work across your entire data set at the database level and keep memory and I/O under control.

So I want to walk you through a few examples of how you can use predicates to do the work at the database level, and keep your memory and I/O needs low.

Let's start with an example of how you can avoid fetching objects from a to-many relationship, when all you really want to know is how many objects are in the relationship.

You can use the @count expression to avoid fetching those objects, when all you want to know is how many there are.

For example, if you have a list of music playlists and you want to find all the playlists without any songs, you can use the @count expression, and a predicate like this to look up those playlists without any songs.

And you won't be fetching any of the song data back, you're just fetching the playlists that match that query.
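A minimal Swift sketch of such a request (the session predates Swift, so this is not the slide code). The `Playlist` entity and its to-many `songs` relationship are assumed names for illustration.

```swift
import CoreData

// Hypothetical model: a "Playlist" entity with a to-many "songs" relationship.
func emptyPlaylistsRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Playlist")
    // @count is evaluated by the store, so no Song objects are loaded.
    request.predicate = NSPredicate(format: "songs.@count == 0")
    return request
}
```

Passing the request to `context.fetch(_:)` would return only the matching playlists.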

If you want to work with the attribute value from objects related through a to-many relationship, you can use a SUBQUERY expression to access the attributes owned by the objects in the to-many relationships.

And this gives you a powerful way to test those attributes without, again, fetching any of the objects from the relationship.

So in this example we want to fetch all the artists with songs longer than 10 minutes.

And we do that with a SUBQUERY expression that takes the songs, the name of the relationship as its first argument, and then tests if the song length is greater than 10 minutes.

If the count of the songs returned by that SUBQUERY is greater than zero, then we're going to return that artist in the fetch.
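A Swift sketch of that predicate, under assumptions: the `Artist` entity, its `songs` relationship, and a numeric `length` attribute measured in seconds are illustrative names, not taken from the session's sample project.

```swift
import CoreData

// Hypothetical model: "Artist" has a to-many "songs" relationship, and
// "Song" has a numeric "length" attribute assumed to be in seconds.
func artistsWithLongSongsRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Artist")
    // SUBQUERY(collection, variable, condition) yields the matching songs;
    // an artist is returned when at least one song is longer than 600 s.
    request.predicate = NSPredicate(
        format: "SUBQUERY(songs, $song, $song.length > 600).@count > 0")
    return request
}
```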

You can also work with and fetch attribute values directly.

So if you're only interested in fetching back unique attributes from one of your entities, you can fetch back those unique attributes as read only dictionaries, only fetching back the attribute value without anything else.

You're evaluating that work in the database, and only fetching back the results you're interested in.

So let's look at an example where we want to fetch all of the unique album names.

So we tell the request we want it to return distinct results, we want the results returned as dictionaries, and all we want is the unique names from our album entity.

And when you execute this fetch, you'll get back an array of dictionaries with all of those names in it.
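In Swift, a request configured this way might look like the following sketch. The `Album` entity and its `name` attribute are assumed names.

```swift
import CoreData

// Hypothetical model: an "Album" entity with a "name" attribute.
func uniqueAlbumNamesRequest() -> NSFetchRequest<NSDictionary> {
    let request = NSFetchRequest<NSDictionary>(entityName: "Album")
    request.resultType = .dictionaryResultType   // dictionaries, not managed objects
    request.propertiesToFetch = ["name"]         // fetch only this attribute
    request.returnsDistinctResults = true        // collapse duplicate names in the store
    return request
}
```

Each dictionary in the result would carry a single `name` key.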

You can go even further working with attribute values directly in the database by calculating and evaluating aggregate data on those attributes and returning dictionaries without fetching all of those objects into memory.

This is a powerful technique for performing a lot of work at the database level.

So let's look at an example where we want to calculate the total length of all of the songs in our music library.

So the first thing we need to do is create an expression that represents the function we want to evaluate.

And in this case we want to take the sum of the length of all of our songs.

Then we need to wrap that expression in an expression description that tells our fetch how to encode that information back in the dictionaries that we're going to be returning.

And in this case we want the results back as a double with the name totalTime.

The last thing we need to do is we need to configure our fetch request to perform the fetch on the song entity, only search for the property that we've constructed here, which is the function to calculate the sum of the length, and return those results as dictionaries.

Again, we're doing an incredible amount of work at the database level, and only fetching back the single result we're interested in.
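The three steps can be sketched in Swift as follows. The `Song` entity and its `length` attribute are assumed names; `totalTime` is the result key mentioned in the talk.

```swift
import CoreData

// Hypothetical model: a "Song" entity with a numeric "length" attribute.
func totalSongLengthRequest() -> NSFetchRequest<NSDictionary> {
    // 1. The function to evaluate: the sum of "length" across all songs.
    let sum = NSExpression(forFunction: "sum:",
                           arguments: [NSExpression(forKeyPath: "length")])

    // 2. Describe how the value comes back in the result dictionary.
    let totalTime = NSExpressionDescription()
    totalTime.name = "totalTime"
    totalTime.expression = sum
    totalTime.expressionResultType = .doubleAttributeType

    // 3. Fetch only that computed property, returned as dictionaries.
    let request = NSFetchRequest<NSDictionary>(entityName: "Song")
    request.resultType = .dictionaryResultType
    request.propertiesToFetch = [totalTime]
    return request
}
```

After executing the fetch, the value would be read from the first dictionary under the `totalTime` key.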

Sometimes you just want to know how many objects are going to be returned by a fetch.

Either to display that number on screen, or to make space for the objects that you're going to be fetching back later.

And you can take any fetch request and ask the context for the count for that fetch request to get that value.

So in this case I'm showing a table that lists playlist names, and we can use countForFetchRequest to look up the number of songs for each playlist.

But then we can go a little bit further with working in the database by using a sort descriptor and setting a fetch limit to fetch the first three songs.

So now alongside the number of songs that we've got, we can show our users a preview of the first three songs in each playlist.

It kind of improves the user experience, but you're only fetching back a little bit more data, even though you've got a lot of data under the hood.
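A Swift sketch of the count and the three-song preview. It assumes a `Song` entity with a `title` attribute and a to-many `playlists` inverse relationship; sorting by title is also an assumption, since the talk does not say which sort order the slide used.

```swift
import CoreData

// Hypothetical model: "Song" has a "title" attribute and a to-many
// "playlists" relationship back to "Playlist".
func songsRequest(in playlist: NSManagedObject) -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Song")
    request.predicate = NSPredicate(format: "ANY playlists == %@", playlist)
    return request
}

// Asks the store for the number of matches without loading any songs.
func songCount(in playlist: NSManagedObject,
               context: NSManagedObjectContext) throws -> Int {
    return try context.count(for: songsRequest(in: playlist))
}

// Same request, sorted and limited to the first three songs.
func previewRequest(in playlist: NSManagedObject) -> NSFetchRequest<NSManagedObject> {
    let request = songsRequest(in: playlist)
    request.sortDescriptors = [NSSortDescriptor(key: "title", ascending: true)]
    request.fetchLimit = 3
    return request
}
```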

So now let's take a closer look at what you're doing when you're fetching managed objects.

There are a lot of options available to you on a fetch request for how you're fetching objects and what you're actually getting back.

In the case that you're fetching objects that you want to use in your working set, and you want to access the attribute data right away, you want to fetch back fully faulted managed objects with all of their attribute values pre-populated.

But you're not going to have to fetch back all of the relationships, even though they're fully faulted managed objects, the relationships are still represented as faults.

So you're not paying the memory cost for traversing to-many relationships.

To get back fully faulted managed objects, you need to tell your fetch request setReturnsObjectsAsFaults: NO.
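In Swift the same setting is the `returnsObjectsAsFaults` property. A minimal sketch, with `Song` as an assumed entity name:

```swift
import CoreData

// Hypothetical "Song" entity; the objects come back with attribute
// values already populated, while relationships remain faults.
func workingSetRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Song")
    request.returnsObjectsAsFaults = false
    return request
}
```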

So why would you want to fetch back faults?

Well faults are a very useful tool.

They're a very lightweight placeholder for managed objects.

And their attributes are fetched on demand.

And when you turn a fault into a managed object, and it fetches its attribute values on demand, it doesn't change its pointer address, so you're still working with the same object in memory, so you can keep it in the array that you had before; where it used to be lightweight, now it contains all of the information from the managed object.

And there's also a middle ground called partial faults, where you can fetch faults, but specify that the faults should include some subset of the properties from your managed objects.

So if we wanted to show a listing of song titles, but we didn't want to fetch another heavier weight attribute from the song entity, we could tell the fetch request that we want properties to fetch to include only the title.
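A Swift sketch of a partial-fault request, assuming a `Song` entity with a `title` attribute:

```swift
import CoreData

// Hypothetical "Song" entity. With the default managed-object result
// type, limiting propertiesToFetch produces partial faults: "title" is
// populated and other attributes are fetched on demand.
func songTitlesRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Song")
    request.propertiesToFetch = ["title"]
    return request
}
```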

And the smallest representation for a Managed Object is the Managed Object ID.

These things are really small.

Each Managed Object ID is only 16 bytes.

So it's actually possible to work with a large set of Managed Object IDs that represent the Managed Objects without taking up a lot of space.

As Miguel mentioned before, the Managed Object IDs are also thread safe, so it's a great way to pass that information between different threads.

They're also perfectly suited for using in predicates.

So any time you've got a predicate where you would be supplying a Managed Object in the predicate, you can supply a Managed Object ID, and Core Data handles it just fine.

To get back Managed Object ID's you need to tell your fetch request that you want the Managed Object ID result type.

And then you want to tell the request not to include the property values.

And you might be thinking, wait a minute, I'm fetching back Object ID's, they don't have property values.

But by default, the fetch request assumes that if you're fetching back Managed Object ID's, then you're probably going to want to use those Managed Object values some time in the future.

So even though the Managed Object ID doesn't store the property values, the property values are fetched into the row cache, so if you later create a Managed Object for that Managed Object ID, it doesn't need to do a fetch from the database to get those values.

But if you really want to work with a large number of Managed Object ID's and minimize the amount of memory you're using, you want to make sure to tell the request not to include the property values.
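Both settings together, as a Swift sketch with `Song` as an assumed entity name:

```swift
import CoreData

// Hypothetical "Song" entity. Returns only object IDs and skips
// loading property values into the row cache.
func songIDsRequest() -> NSFetchRequest<NSManagedObjectID> {
    let request = NSFetchRequest<NSManagedObjectID>(entityName: "Song")
    request.resultType = .managedObjectIDResultType
    request.includesPropertyValues = false
    return request
}
```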

So back to talking about fully faulted managed objects.

If you're not just displaying the attribute values in your working set, but you also want to display related values on screen right now, then you want to prefetch that relationship. That way, as you're displaying all of the managed objects in your working set, you're not having to execute an individual fetch to fault in each of those relationships, which, getting back to amortizing your database I/O, is incurring a round trip to do a fetch every single time.

So you want to take advantage of prefetching to get those related values ready for your working set of data.

And I'll show you an example of how to do this.

If you want to show a list of playlist songs, and alongside each song show the album name for the song, you can tell the request to set the relationship key paths for prefetching to include the album relationship.
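A Swift sketch of that setting, assuming a `Song` entity with a to-one `album` relationship:

```swift
import CoreData

// Hypothetical model: "Song" has a to-one "album" relationship.
func songsWithAlbumsRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Song")
    // Fetch the related albums along with the songs, instead of one
    // extra round trip per song when its album is first touched.
    request.relationshipKeyPathsForPrefetching = ["album"]
    return request
}
```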

So I've talked about a number of techniques you can use to keep your memory usage low and amortize your I/O.

But what about the times where you can't control the access pattern?

What about when you're trying to work with some sort of API that takes the entire array of objects that you want to fetch?

How can you fetch your objects in batches, when you're handing over the entire array?

You might think that you're either handing over an array of faults, in which case they'll be fetched one at a time and hit that frequent fetching pattern.

Or you're fetching everything all at once and handing it over.

In which case you're doing that big upfront fetch that you wanted to avoid.

Well the fetch request can do this for you automatically.

You can set the batch size, and when you execute your fetch request, it will return an array subclass that's configured to automatically fetch your objects in batches as they're accessed.

And the way you do that is you tell your request, set the fetch batch size to the size you want.
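A Swift sketch, where the `Song` entity and the batch size of 20 are assumptions; a sensible size depends on how many rows your UI shows at once.

```swift
import CoreData

// Hypothetical "Song" entity. The array returned by the fetch loads
// objects 20 at a time as elements are accessed.
func batchedSongsRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Song")
    request.fetchBatchSize = 20   // assumed value for illustration
    return request
}
```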

So I hope you've gotten some good ideas about things you can do to improve your performance, your fetching performance with Core Data.

But before you dig in and start making changes to your code, I want you to use the tools that are available to you to focus your efforts.

The Core Data instruments in Instruments can pinpoint exactly where in your code you're hitting those fetching and faulting hotspots, so you can focus your efforts on the spots where you need to put the time.

And I also encourage you to absolutely take a look at the header files and class documentation for NSFetchRequest and NSExpression, and the Predicate Programming Guide.

And make use of the developer forums as well.

There's also, just searching on the net, you can come up with all kinds of great information and resources.

So I'm going to wrap up this session today by looking at the topic of migration.

First of all, why do you need to bother with migration?

Well, think back to the beginning of this session where Miguel was describing that the data model is your contract with Core Data that describes how your data will be saved, and structured, and accessed.

So any time you go to make a change to your data model, that's going to change how that data is saved and accessed.

So if you want access to your old data, you need to adapt that old data to a new structure, and you do that with migration.

In Leopard we introduced versioning and migration using a custom mapping model that you could hand construct with flexible logic to translate objects from your old data model to objects in your new data model, in memory, by fetching data from your old store with the old data model, transforming them with the mapping model, and then saving them to the new store with the new data model.

In Snow Leopard and in iPhone OS 3 we introduced lightweight migration.

And lightweight migration works by looking at your old data model and your new data model, analyzing the differences, and inferring a mapping model automatically to translate data from your old format to the new one.

An enormous benefit of this is that lightweight migration is able to perform the migration entirely in the database using nothing but SQL.

So what kind of changes are supported with lightweight migration?

Well you can add, or remove, or rename just about anything: attributes, relationships, entities, all supported.

You can also change the numerical type of attributes, so you can change an int to a float.

You can promote a relationship from a to-one to a to-many, and preserve the related objects in the new data model.

You can't go the other direction, however, because from a to-many to a to-one, there's no way to infer which objects should be saved and which ones to let go of.

You can even make changes to the entity inheritance hierarchy.

So you can add a child entity, or a new parent, or you can even take two peer entities, create a common new parent, and move properties up from each of the child entities into the new parent, and in migration all the data from those entities will be preserved.

So what do you have to do to take advantage of lightweight migration?

First, you need to make sure you keep the old data models.

We need this for two reasons.

We need the old data model, so that we can compare the old model to the new one to infer the changes.

Second, we can't read the old data without the old data model.

So before you go to make any changes, go to the Design menu in Xcode, choose Data Model, then Add Model Version, and start making changes on the new one.

The second thing you need to do is set the migration options when you load your persistent store.

That's the migrate persistent stores automatically option and the infer mapping model automatically option.
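A Swift sketch of loading a store with both options set. The `openStore` function name and its parameters are illustrative; the two option keys are the Core Data constants for the options named above.

```swift
import CoreData

// Both options are needed for lightweight migration.
let migrationOptions: [AnyHashable: Any] = [
    NSMigratePersistentStoresAutomaticallyOption: true,
    NSInferMappingModelAutomaticallyOption: true,
]

// Hypothetical helper: opens a SQLite store, migrating it if the
// stored model version differs from `model`.
func openStore(at storeURL: URL,
               model: NSManagedObjectModel) throws -> NSPersistentStoreCoordinator {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: model)
    _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                           configurationName: nil,
                                           at: storeURL,
                                           options: migrationOptions)
    return coordinator
}
```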

Now if you've skipped over step one you'll see an error like this, Cocoa error 134130, "Can't find model for source store."

And that means Core Data couldn't find your source model, so we can't do the migration.

So I said you can rename just about anything.

And I meant it, but you have to give us a hint.

You have to give us a hint in your data model, so that we can tell when you're renaming something, as opposed to when you've deleted one attribute and added a new one.

So I'll show you an example of how this works.

You need to set the renaming identifier, and we'll do that here to change a song's name in our old model to its title in our new model.

So you can see I've got the Xcode data modeling design tool here, and I'm looking at the song title attribute.

And this is version 2 of our data model.

And I've highlighted where the renaming identifier appears in the inspector.

So all we need to do to preserve the data that used to be stored as the song name in our new model as the song title, is put name in as the renaming identifier.
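The talk sets this in the Xcode inspector. If you build your model in code instead, the same hint is the `renamingIdentifier` property on the property description. A Swift sketch, with the string type and optionality of the attribute as assumptions:

```swift
import CoreData

// Version 2 of the hypothetical "Song" entity: the attribute is now
// called "title", and the renaming identifier points at its old name.
func renamedTitleAttribute() -> NSAttributeDescription {
    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType   // assumed type
    title.isOptional = true                      // assumed
    title.renamingIdentifier = "name"            // name used in version 1
    return title
}
```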

A couple of tips to keep in mind when you're dealing with lightweight migration: changing a transient attribute to a persistent attribute is the same to lightweight migration as adding a new one.

A transient doesn't exist in the persistent store.

So all of the same rules apply as when you're creating a new attribute, that it needs to be optional or have a default value.

Or for a new relationship it must be optional.

There's a lot more information available about migration in the Core Data Model Versioning and Data Migration Programming Guide.

Covers both lightweight migration and the custom mapping style of migration.

Before I let you go there's one more thing I wanted to highlight.

This is a technique that's incredibly useful for adding back some of that custom flexibility that you might miss, but doing it in lightweight migration.

And the way it works is with a post-processing step that you use after migration.

You open your store with the migration options, then check the metadata for a custom key that you've chosen, like DonePostProcessing.

If the key isn't set, then you do your post-processing to populate derived attributes, or insert or delete objects that you want present in or removed from your new data model.

And then set the store metadata, so that you don't wind up post-processing again.

Then save the changes in metadata and you're good to go.

Now I'll show you a code sample, so you can see exactly how this works.

First we open the store with the migration options enabled.

Then we check the store metadata for our custom key, DonePostProcessing.

And we check to see if the value for DonePostProcessing is less than 2.

If it's less than 2, then it's time for us to update our normalized titles for books.

So we go ahead and do that work to populate the derived attributes.

And then we make a copy of the metadata to update our custom key, but preserve the other keys in the metadata, and set it back on the store.

And finally we save.
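The slide code is not part of this transcript, so the following is only a Swift sketch of the steps just described. The function names, the `work` closure standing in for the normalized-title update, and the handling of a missing key as version 0 are all assumptions.

```swift
import CoreData

let postProcessingKey = "DonePostProcessing"
let currentPostProcessingVersion = 2

// True when the store has not yet been post-processed for this version.
// A missing key is treated as version 0 (an assumption).
func needsPostProcessing(_ metadata: [String: Any]) -> Bool {
    let done = metadata[postProcessingKey] as? Int ?? 0
    return done < currentPostProcessingVersion
}

// Hypothetical helper, called after the store was opened with the
// migration options. `work` populates the derived attributes.
func postProcessIfNeeded(store: NSPersistentStore,
                         coordinator: NSPersistentStoreCoordinator,
                         context: NSManagedObjectContext,
                         work: (NSManagedObjectContext) throws -> Void) throws {
    // Work on a copy so the other metadata keys are preserved.
    var metadata = coordinator.metadata(for: store)
    guard needsPostProcessing(metadata) else { return }

    try work(context)

    metadata[postProcessingKey] = currentPostProcessingVersion
    coordinator.setMetadata(metadata, for: store)
    try context.save()   // writes the object changes and the metadata
}
```

Storing a version number rather than a Boolean leaves room for another post-processing pass after a later model change.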

A really useful technique for adding back some of that flexibility that you get with custom migrations in a lightweight migration form.

So I hope this session has given you some ideas on the many different ways that you can use Core Data to mature your application.

And I want to stress that you want to invest the time to come up with a good initial model for your data.

And then Core Data will help you out, as you need to adapt your application to evolving access patterns and incremental changes over time.

And if you do find yourself wanting for some feature or encountering a bug, please use the bugreport.apple.com website to report those to us.

We read them.

And the more information you can provide to help us understand and reproduce the problem you see, or feature you'd like, the better chance it has to be dealt with quickly.

For more information please contact Michael Jurewitz, our Developer Tools Evangelist, jurewitz@apple.com.

And take a look at the Core Data documentation.

There's great programming guides, examples, and tutorials, and they're always being updated.

And also take a look at the Apple Developer Forums.

And if you want even more focus on performance on iPhone applications, come to tomorrow's session at 4:30 where Melissa will talk about Optimizing Core Data Performance on iPhone OS in Presidio.
