Introducing Create ML

Session 703 WWDC 2018

Create ML is a new framework designed to help you easily build machine learning models using Swift and Xcode. Designed for Simplicity and Performance. Learn how you can build customized models from data that will enable new and powerful features in your apps using Create ML.

[ Music ]

[ Applause ]

Hello, welcome everyone.

My name is Gaurav and today we are going to talk about machine learning.

Last year we launched Core ML.

And the response from developers, from you guys, have been tremendous.

We are just amazed by the apps you have made.

It clearly really is phenomenal.

So let me first begin by saying thank you.

Thank you for embracing Core ML and we love seeing so many of you using it and giving you using it and giving intelligent features to our users.

We are in this together.

Thank you.

[ Applause ]

It is applause to the developers.

OK, so if you recall, Core ML gives you an easy way to integrate an ML model in the app.

The idea is very simple.

You get an ML model, you drag and drop an Xcode and with just three lines of code you can run state of the art ML model with millions of parameters and billions of calculations in real time.

That's just amazing.

And you give your users get real time machine learning as well privacy friendly machine learning.

All you have to do is to drag and drop and ML model in Xcode.

And Core ML takes care of the rest.

I think the big question remains is where do I get these models from?

So last year we provided you two So last year we provided you two options.

The first one was you could download some of these models, popular models, from our website.

But more importantly we also released Core ML Tools.

Core ML Tools allow you to tap the work which is done by amazing ML Community.

So the idea is again simple.

You choose your favorite training library, create your model in that training library, covert it into Core ML from there and then just integrate it into your app.

When we released Core ML we released with only five or six training libraries, support for five or six training library, but within a year we have a support for all the famous training libraries out there.

We are enhancing our tools to even allow you more customization and we are going to talk about more about Core ML to talk about more about Core ML tools in tomorrow's session.

Another thing we did towards the end of the year, we released to recreate our open source machine learning library.

We are going to talk about to recreate in tomorrow's session.

Bu this year we want to give you something even more.

We want to continue our journey.

We want to give you something native, something swiftly, something that harnesses the power of our Xcode, something that puts the focus on you, our developers, something that just demystify machine learning for you.

Hence we are introducing Create ML.

[ Applause ]

Our machine learning framework in Swift.

So Create ML completes the left had side of the equation.

The idea is you make the model in Create ML and you run it in in Create ML and you run it in Core ML.

You do complete end-to-end machine learning in Swift, our favorite language.

So you are not dealing with language oddities where you are training in one language and then running inference in another language.

Create ML is simple and very powerful.

It is tailored to your app.

It leverages core Apple technologies and you do everything on your Mac.

So for this year we are going to focus on three very important use cases.

The first one is images, second is text, and the third one is tabular data.

These are the top use cases that we believe will benefit you.

So you can do things like Custom Image Classifier.

Idea is that you make your own image classifier that can recognize product from your product catalog.

You can do things like text You can do things like text classifier so you can make your own sentiment analysis, topic analysis, domain analysis.

And you can also do classical regression and classification on tabular data.

For example, let's just say you want to predict the wine quality using its chemical composition.

The possibilities are endless and we are going to discuss them in detail in the next 30 minutes.

However, before we do let's take a look at common work flow.

First, let's just say you are trying to enable an experience in app.

Make sure that machine learning is the right thing to do there.

So don't just blindly apply machine learning.

Make sure machine learning is the right thing to do there and define a machine learning problem.

Second collect data.

Make sure this data reflects the real usage of your app.

So for example if you're making So for example if you're making a custom image classifier that is going to be used by users on their iPhone, so collect pictures from your iPhone.

Collect less screenshots but have more iPhone pictures.

Then you train your model.

Finally another important step here is to evaluate this model.

The model evaluation is done on a separate handout set.

If you're happy, you write out the ML model, but let's just say the results are not good.

You should either reclaim your model with different parameters or you collect more data.

Create ML actually helps you across all four stages of this work flow.

We have powerful in build data injection utilities DataSource and DataTable that we will talk in the remainder of the presentation.

You can actually train your You can actually train your model using only one line of code.

And the training is done hardware optimized.

There are built-in evaluation metrics so you don't have to write your own precision and recall and confusion metrics calculation, use them.

And then finally when you're happy just write out the model.

Now we will take a deeper look in all three-use cases Images, Text, and Table of Data.

So let's start with images and to do that I will invite Liza Ottens, Senior Engineer in Machine Learning team.

Thank you.

[ Applause ]

Thank you Gaurav.

Since enabling image based experiences are some of the most powerful and interactive ones that you can add to your apps, today we'll take a look at how to train custom image to train custom image classification models.

Image classification is the problem of identifying what label out of a set of categories you'd like to apply to an image.

Depending on the type of training data, you can target domains specific use cases to enable in your apps.

The first step is to collect training data.

In doing so we'll take a look at a fruit classifier and see how you would do so.

First you would want to gather many varied types of images that reflect the true data that you'll end up seeing or that your users will and then label them.

First you can do this as a dictionary with the string label corresponding to arrays of images or what we've noticed is many popular data sets are organized in hierarchical directory structures such that the label is the name of the folder that contains all images within it.

There are also other data sources such as single folders sources such as single folders that contain labeled file names and in the Create ML API we've provided conveniences to extract these structures.

Now training is the more complex part of the equation.

So once you have your data this is what you'll look at next.

And what you can do is you start training a very complex model from scratch on your input images.

And for this you need lots and lots of labeled data, you need big compute and you need a lot of patience.

But another well-established technique in the industry is transfer learning.

And since Apple has lots of experience in training complex machine learning models, we already have one in the operating system that you can take advantage of.

So what we do is we apply transfer learning on top of this model that already exists in the OS and we augment it, retraining the last few layers to your specific data.

So you no longer need millions of images.

You can train a good classifier You can train a good classifier using the amount of data that you have.

This results in faster training times.

And for developers that we've worked with we've seen them go from ours of training down to minutes for thousands of images or for small data sets even seconds.

This also results in much smaller models going from hundreds of megabytes down to just a few megabytes for thousands of images or even kilobytes for small models.

The goal with Create ML is to abstract much of this and make it simple and easy to use.

But to prove it let's take a look at a demo.

And there are two work flows that we can look at.

But first to set up the problem I've started by running an app that's using a state of the art image classification model that's already in the industry.

that's already in the industry.

This one though is quite large.

It's a 100 megabytes in our app and if we run it we have some fruits but it's not quite what I was looking for.

I'd really like it if instead we could classify these particular ones.

So what we can do is we can switch to a new playground and import Create MLUI and walk through how to do this using the UI for it.

We can define a builder, initialize it, and to enable drag and drop training we can show the builder in the live view.

This brings up a prompt in the live view to drag in images to begin training.

And here I've set aside some photos of fruits some blueberries and other types.

And you can drag them in and automatically an image automatically an image classifier model begins training on the Mac.

All of this is accelerated on the GPU on however many categories you end up training on.

It automatically tells you what the accuracy is on the training data set.

But what's more helpful is to try this on new images that the model hasn't seen before to predict how it will do on real use cases.

So I can drag in this other folder containing unseen images and now the model is evaluating on these new types of fruits.

And if you scroll, you can see what the true label is of each type, as well what the predicted one was by the model.

And if you're happy with this accuracy, what you can do is you can take the model and drag it into your app.

I'll add it here.

And if we take a look this model is 83 kilobytes.

That's a huge savings down from hundreds.

[ Applause ] So we can delete the old model that we were using before and in the view controller we can initialize this new one, Fruit Classifier, well Image Classifier.

We can then rerun the app, bring up the simulator, and see how it does on some of those fruits that we've not trained it on.

Well, different.

On the raspberry it can no correctly predict it since we trained the model to recognize raspberries.

We can even see if it can distinguish from strawberries and it can now depending on our data.

But there are other workflows you can use.

Perhaps you want to do this programmatically or perhaps you want to automate it.

We can also walk through how to use the API, Create ML to do so.

So now we can switch to another playground in import Create ML.

Since we'll be using URLs we also can import foundation.

And since on our desktop we still have these folders of fruits, we can say where they are, and also say where the testing fruits are.

And then the next step is to actually train the model.

So we can define a model and we can initialize an image classifier.

And now if we take a look at what autocomplete shows to us we can see, we can provide training data in the form of a dictionary of labels to arrays of images or we can use a data source or even specify model training parameters if we want to.

Let's use a data source and we'll use label directories since that's how our data is organized and specify the trainingDir Directory.

trainingDir Directory.

And since we're running in the new REPL mode of Xcode Playgrounds I just need to hit Shift Enter and the model begins training right away.

You can even pull up the console and see output of one of its extracting features and how many iterations it's running through.

Afterwards you can also open Quick Looks and see the name in the model and how many instances it's trained on.

Now we might want to evaluate on the testing data that we've set aside.

So what we can do is we can call evaluation on another data source since that folder is organized the same way specifying the URL the test data.

You can hit Shift Enter and now the model is evaluating on these testing datasets, testing images.

Once it's complete we can also look at the Quick Look and see how many examples it evaluated on, as well as how many classes were in that folder all together and the accuracy.

If we're happy with that we can write it out.

I'll grab a code snippet here.

And say that I want to write it to the desktop with name Fruit Classifier ML Model.

Once I do, you can see this new model up here is on the desktop.

We can double click it and take a look and see it's exactly the same.

This is also 83 kilobytes.

Furthermore, we can integrate it back into our app the same way.

Let's recap.

[ Applause ]

We saw two ways of training image classifier models in Create ML.

One was with the UI, which makes it super simple to drag and drop your training data and evaluation data to produce an ML Model.

The other way was with the API, The other way was with the API, the Create ML API using Xcode Playgrounds.

If we walk through some of this code we can see the first thing we had to do was import Create ML.

The next was to specify where our training and testing data was and then actually begin training the model by specifying how our training data was laid out.

We can then evaluate on the testing data and finally Save ML Model.

If you want to automate this you can also turn these into scripts, which is a very popular way of saving what you've done and rerunning it whenever.

You can then change permissions on the file and run them like so.

Or for other work flows you can always use Swift Command Line REPL.

So we've seen today how to train image classification models using a few different work flows.

Put next, I'd like to pass it off to Tao to talk about natural language.

Thank you.

[ Applause ] Thank you Liza.

Hello everyone, my name is Tao.

I'm an Engineer Head at Apple working on the Core ML Team.

You just saw how easy and intuitive to train an image classifier with just a few lines of code.

Now, I'm going to show you the same can be done for natural language.

In the easiest release we're going to support two Natural Language Tasks Text Classification and Word Tagging.

Today I'm going to focus on Text Classification.

For details on Word Tagging please join the Natural Language Session that happens tomorrow.

Text Classification can be used in a few machine learning applications.

For example, Sentiment Analysis.

The energy of developers is The energy of developers is amazing!

That's a positive note.

You want your app to know it.

[ Applause ]

Spam Analysis.

If you saw this message in your mailbox, you know it's very likely it's a Spam.

So you want your app to know that as well.

Topic Analysis.

The Warriors just had an amazing comeback win.

That's a sport post.

You want your app to be able to classify that.

So to train such a classifier, the first thing you do is to collect some training data.

With Create ML we support a few different ways for you to organize your training data.

For example, label directories.

Here you have two folders.

One named the positive, the other one named negative.

Within each folder you have a number of articles with just raw text whose true label is simply the name of the folder.

Alternatively, you can prepare your training data using simple CSV, where you prepare your raw text and the truth label separated by comma.

We also support JSON for the training data and know that we just talk about training data organization and you can actually organize your test data in the exact same way.

Now, with your training data and test data ready, what are the steps involved to train such a text classifier?

A typical work flow would look something like this.

You start with your raw text, you do a language identification to figure out which language it is in, you convert that into tokens, and then you convert that into some feature values and then you can apply a machine learning model that gives you some predicted value that you have to map to some desired label.

And then you can compare that And then you can compare that label to your truth label and start reiterating on it.

With Create ML though we took away all these complexities so that all you need to do is to prepare raw text with the true label and start training immediately.

[ Applause ]

Now let me give you a concrete example like how you can train such a classifier and use it.

For example, we have this simple App called Stay Positive, whose purpose is to encourage positive post.

If user entered "I hate traffic" the background turns red and we disable the post button.

[ Laughing ]

"I love driving my car at five miles per hour just chilling in traffic."

That's a positive post.

That's a positive post.

We encourage you to post it.

Just imagine what our Internet would look like this with this app running on everybody's phone.

[ Applause ]

Now in order to do that let me give you a live demo.

So to train such a classifier the first thing I do is collect some training data.

On my desktop I have a train folder and also a test folder.

In train folder we have two folders.

One is named positive, the other one negative.

And there are a number of articles in each folder and task folder is organized in a very similar way.

So the first thing I do is to import Create ML.

Now I need to tell the program Now I need to tell the program where to find my training data.

For that I'm simply using a URL capability and then I can start training my model using the label directories that Liza just showed you.

Looks like the training has started.

As you can see on the bottom there there's some programs report for you to check.

It looks like the training has finished.

Now you can check some basic performance numbers on this model.

For example, model.trainingmatrics, now it shows you this model has been trained on over 2,000 examples and the accuracy is 100%.

But how does it perform on some unseen data?

So I'm going to do the same to define test data and then evaluate that model on the test data.

As you can see we have 77 test examples and we are achieving over 94% accuracy, which is very good.

I'm sure you want to reiterate on it if you want to see like even higher number, but this number is pretty good enough for my app.

So let me just test it out.

So to save out the model what I need to do is define a URL where it's saving to and then write out the model to my desktop.

It looks like that model has been saved.

So now I need to switch back to my app.

Just drag and drop it.

There you go.

Now I can start use it.

I will do let model equal to TextClassifier, which should up complete.

complete.

And then I'm going to insert some basic inference code.

In this inference code as you see the first like I do is using model.prediction to get prediction and then in order to hook up with this simple App UI, I just convert that it into some double value.

Let's give it a try.

Yeah, let's try some example we have showed you.

"I hate traffic".

Negative.

"I love driving my car at five mile per hour just chilling in traffic."

Positive. Let's try something different.

That will be fun.

"Machine learning is hard, Create ML makes it so easy."

Create ML makes it so easy."

Positive.

[ Applause ]

So that's how you train your customized text classifier and drag it into your app to use it.

Here's the recap.

So to train such a classifier the first thing you would do is to specify data.

You specify training data as well as your test data.

And then you can create your model on the training data to evaluate these performance you evaluate a model on the test data.

Finally to use your model in your app you simply save it out using this right API.

To summarize, with just a few lines of code you can train your lines of code you can train your customized text classifier simple intuitive.

With that, I'd like to hand back to Gaurav who is going to talk about tabula data.

Thank you.

[ Applause ]

Thank you Tau.

Besides images and texts, another common source of data that requires very frequently when you're solving a machine learning problem is tabular data.

What I mean by tabular data I mean the data is spread sheet format or in a table format.

This kind of data occurs fairly frequently.

For example, let's just you're trying to predict house prices using number of beds, number of baths, or square footage, generally the data is arranged in a tabular format.

You want to predict the quality of wine using its chemical compositions.

Chances are data will be arranged in table format.

Or something even simple where Or something even simple where to hop which bar to hop tonight using happy hour or price the data will be in tabular format.

To handle the data, which is in tabular format, we actually introduce a new data structure, which we call as MLDataTable.

MLDataTable is based on [inaudible] technology that we will discuss in detail tomorrow.

There's something interesting about these data tables.

The rows contain the observations, or examples.

So here house number two has four bed, three bath, and 500K price.

The columns contains what we call as features.

So the beds are features, baths are features, square feet, etcetera, are features.

There is one spatial column that we want to predict, in this case price, and this feature in this column is known as target or column is known as target or response variable.

The whole idea behind tabular data is that we want to predict target variable as a function of one or many of these features.

So what are the common sources that we support?

Well, CSV, JSON, as well you can actually have code.

So let's take a little bit more about MLDataTable.

First you can read data simply by using CSV.

What is more important that you can access the column using a subscript notation.

So all you do is house at a price and you get an entire of price.

You can add to column, subtract to column, multiply to column, divide to columns.

And the way you do it is in very natural looking syntax.

So you just simply say house data price divided by house data square feet to get price per square feet.

Behind the scenes this calculation is done using evaluation and through vector operations.

You can also do some of the other interesting things, for example you can split data table in training intercept as well as you can even do a filtering.

So for example, if you're only interested in large houses, you can clear and indicate a variable and filter it out.

There are a lot of operations that data table support.

I urge you to try it out in Xcode Playground.

They're fun.

Now once we have data in data table you will like to do the training on it.

Create ML supports large number of algorithms such as Boosted Tree, Regularized Regression, Random Forest, etcetera.

And all of these algorithms are represented by their class.

In order to train your model you only have to write one line of code.

Basically you tell what is the target and where you are getting target and where you are getting the data and which is the algorithm you are instantiating.

So in this case, let's just say you are running Linear Regression, or Regularized Linear Regression, you just actually tell it that the data is house data and the column is price.

If you do Boosted Tree Regression, just replace Linear Regression with Boosted Tree and you're all set.

Now Random Forest like that.

Plus we also provide a high level abstraction ML Regression that automatically runs all these algorithms and choose the best one for you.

[ Applause ]

This is inline with our philosophy that you should focus on task.

And the task is to predict the price.

You should not focus about nitty-gritty details of the algorithm.

Having said that in case you are expert you can actually use Booster Tree and change its Booster Tree and change its parameters also.

So a complete end-to-end would look like this.

It follows exactly the same pattern as image and text.

First you specify the data.

Second you just create your model.

Third you evaluate the model.

And once you're happy you save it all.

So table of data, image data, and text data, they all follow the same pattern.

So let's just take a quick summary of what we saw in this session.

So Create ML is our ML framework in Swift.

It's very simple to use and it is very powerful and it leverages Core Apple Technologies.

You do end to end machine learning in Swift on your Mac.

We also discussed about our work flow.

Once again, you start from an experience.

What is the experience you're trying to enable?

Then define the problem.

Then collect the data, make sure this data is reflective of the real world usage of your scenario.

Then you train the model and then finally evaluate it.

And once you are happy you just save it out.

Create ML is in Swift and it's available on macOS Mojave.

You can use it in Xcode Playground, Swift Scripts, and Swift for Apple.

So please try it out.

We would love to hear from you.

We are here to receive your feedback and we hope that you will love it as much as we do.

We will be in Machine Learning Get Together as well as in Labs so there is tomorrow there's a Get Together.

We will be in Labs also.

So please give us your feedback.

There are also related sessions in the WWDC App.

We have Core ML session tomorrow morning and an ML Session tomorrow afternoon.

Vision session on Thursday and we have Labs on Wednesday and Friday.

Thank you.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US