Neural Networks and Accelerate

Session 715 WWDC 2016

The Accelerate framework gives you fast, energy efficient signal and image processing and linear algebra libraries. Learn about new libraries dedicated to high performance neural networks and numerical integration.

[ Music ]

Good afternoon, and welcome to the Accelerate Session.

My name is Eric Bainville, and I'm with the Core OS, Vector and Numerics group.

Our group provides performance libraries for CPU.

Performance libraries means we usually are [inaudible] can provide very compute intense functions like matrix products, [inaudible] transform, these kind of things.

Most, most of these functions are inside the Accelerate framework.

We find for example, image, which is an image processing library.

It will transform between types and also do geometric transformations on images.

Next to vImage you will find vDSP, which is more for signal processing, [inaudible] transforms, and other signal processing functions.

Then we have the BLAS, which is linear algebra library.

It's very old.

It comes straight from the 70's.

Few years ago we added Sparse, SparseBLAS, SparseBLAS computation on the vector and matrices.

We also provided LAPACK in the LinearAlgebra, which is higher level linear algebra library.

Outside of Accelerate, we also have a few libraries, like simd, which is a set of headers providing you direct access to vector instructions and vector types, and they are not directed to vector units in the CPU's [inaudible] without writing [inaudible] or assembly code.

We also have compression, which we introduced last year for lossless compression, and everything we provide is optimized for all the CPU's we support.

So when you get a new phone, we optimize a code for it so you don't have to do it.

Alright. Today, first, we start with a quick refresh on compression.

Then we'll introduce two new libraries inside Accelerate; BNNS, which is a set of low level compute functions for neural networks; and the Quadrature, which is a small library dedicated to numerical integration of functions; and then my colleague, Steve, will come on stage and introduce new additions to simd.

Well, first, before we start, let me tell you quickly how to use Accelerate.

So depending on your language of choice, you will have to import or include the, the declarations of the functions, and then you need to link with the Accelerate framework.

So from Xcode in your project settings you will navigate on your target, and then you click on the, on build phases, and as a window appears, you click on link binary with libraries.

You open that.

Alright. And those are the ones appear with the little plus here, and you will get a list of all the libraries and frameworks you can link with, and Accelerate is the first one.

[ Applause ]

Yes. Alright.

Well, if it's not the first one, there's a search bar on top of it.

So if you look for compression, you type compression, I knew we'd get compression.

Alright. So that's, now you're using Accelerate.

OK. Yes. Then compression, remember last year when Sebastian introduced the LZFSE's the platform state of the union where we still don't know who these guys are, but today we're announcing that LZFSE is open source.

So you will find it on the github, and we publish it under BSD license.

Just let me remind you why you will want to use that.

So these are the platform comparison between LZFSE and zlib compared to the same options.

So we are 1.4x faster for encode and 2.6x faster for decode.

Alright. That was our compression.

Now let's jump to BNNS.

Basic neural networks subroutines.

So the name looks really like BLAS.

BLAS means basic linear algebra subroutines.

As I said, it comes straight from the 70's, and BNNS provides a, a set of very low-level computation routines.

So I will [inaudible] later.

We do only low level computations like matrix products but dedicated to neural networks.

Before I enter into details of what it does and what the API is, let me remind you how neural network works.

Let's say we have this network, which is trained to recognize animals, OK.

So on input you will have an image, and then you have this big orange box.

Inside the orange box, the red box represents the, the weights.

These are the parameters of, of the network, and on output, this thing we output four values, which are the probability of having a cat, a dog, a giraffe, or a snake.

OK. So first you need to train it.

Let's say you have a cat image.

You process the cat image to the network, and you get an answer.

So that's the highest probability.

It says dog.

Well, that's what you need to train them.

So it was a cat, and what you do is that you [inaudible] the good answer, and what if I slightly modify the weights.

So the networks now will go slightly in the cat direction, and you need to do this millions of times with a lot of images, and at some point, you will have a trained network answering correctly for, for on every request.

So if it was a, a giraffe, and it says giraffe because we trained it properly.

Alright. So that's what the network does, and notice the difference between training and inference.

During inference, we didn't change the weight.

So usually let's say you have this as an app.

What will happen is that you will do the training offline when you build your app, you will do your training, and then when you shift the app, you will shift the weights and the, the network apology, and what happens on the device is on this inference part.

OK. What's inside this orange box, though?

So let me show you an example.

If you have been to the What's New in Metal, Part II speech yesterday, you have seen a bigger example of, that was a scene recognition network, and lots of demo [inaudible] recognition network, these are more advanced networks with hundreds of layers.

This is state of the art from five years ago, and this, this describes a, a [inaudible] recognition network.

So on input you have the small image, which is a capture of something written, and then it goes to different layers.

Here you have a 5 x 5 convolution, and the output will be a stack of five images.

That's an output of five different convolutions with different weights, and then you add another layer.

You take these five images, apply a lot of convolutions to them, and you will get another bigger stack of fifty images with 5 x 5 pixels.

So what's happening here is you go from the image space into the feature space one layer at a time.

So the content of this small images becomes more and more abstract, and at the end what you have is the feature which will tell you that's a zero, that's the one.

This is what you want on output.

OK. And so with a very small number of layers, you can really do that, and it works really well.

So after these convolutions, we take this 5 x 5 x 50 values, take them as a single big vector and apply a fully connected layer, which is just a big metrics product, and it will mix all these values together and produce a set of 100 values.

That's called the hidden layer in this specific model, and then you need a last one, which will make this 100 values, mix them together, and produce the ten outputs you want.

So at this point you are computing future space each value here is a probability of being one specific digit.

Alright. So that's what is inside the network, and as you have seen here, we have two different kinds of layers, and this is what we do inside of BNNS.

We provide the compute part of these layers.

OK. Before we start about what we really compute and what the API is, let me show you some numbers.

So this, this, here [inaudible] , Caffe, which is a well-known package for [inaudible] network computation, and this is a convolution part of Caffe.

So we have 14 different convolution sizes.

Here, you can read them.

And this is the time Caffe takes to, actually, that's the speed.

So higher is better.

This is Caffe on this convolutions, and this is what you get with BNNS.

So on average, it's 2.1x faster, and if you have even bigger convolutions, you can even reach almost four times faster.

Alright. So that was all the numbers.

Now let me tell you what is inside BNNS.

So that's a set of low level compute functions.

Very similar to BLAS.

That's why we named it BNNS.

It doesn't really know what's, what is a neural network.

That's, that's your side of the equation.

What it does is provide really only the compute part of it.

OK. And it only, only does inference.

Actually I'm not sure it would make sense to, to run training on the device.

That's, that's too expensive.

You need millions of images and million of computations.

That wouldn't fit.

So usually inference would be done offline, and as I told, you will just do the inference part on your, in your app.

And we provide three different types of layers.

Convolution layers, pooling layers, and the fully-connected layers.

Why? Well, actually in the, in the modern network, you spend more than 75 percent of the time computing convolutions, and then the next one on the list of pooling layers with something like 15 percent, and fully-connected layers also take a lot of time but usually you find them only at the end of the network, like, like we have seen in the example where we only two, two fully-connected layers at the end, but this did take a lot of time.

OK. Now that we know what is inside, I will enter into the details of the three different kind of layers and what we compute and how to create them through the API.

Let's start with the convolution layers.

So this is a convolution.

It takes an input image, a block of weight that the orange matrix on the middle, and each pixel in the output image is computed by taking a block of input pixels, multiply, multiplying them by the, the weights, and then you take the sum of that, and you get your upper pixel.

And you need to do that for every output pixel.

So if you count, that's a four dimensional loop because you need to loop on x, y, and then on the kernel dimensions.

Actually what they do is slightly more complex than that because what we have on input is just, is not one image.

We have a stack of images.

So we need to duplicate the weights and know what we compute is this convolution on each layer, and then we take the sum of all these guys, and what we, and we get, to get our output pixel.

So now the, the loop is five dimensional.

I added the IC [inaudible] of the equation, and really this is not what we compute because we also have a stack in output.

So what we really compute in this convolution is this.

We do this many times, one time for each output layer, and now we have a six-dimensional loop.

That means even if the dimensions are very small, like in this example, we are nothing bigger than 264.

It's very small, but when you multiply all these stuff together, what you get is billions of operations.

That means tens of milliseconds of compute, and in an entire network, which is much bigger, you can have trillions of, of floating point operations.

That means seconds of CPU time.

OK. A compute in the convolution layer, now how do you create that with BNNS?

Well, first thing you need to do is to describe your input stacks.

So you need to specify the dimensions of the image, the number of channels, and also all the layered in memory.

So the increment between two rows and the increment between two layers, and also a very important thing, what type is used to store them.

For example, we, we work on float 64-bit float.

On neural networks, we don't need all this precision, and usually people will use 64-bit float, and that's good because the storage is half of it.

So instead of having 20 megabytes, we have only 10, and if you can go on integer, eight-bit integer, you will have only five megabytes.

And you usually, usually don't use any precision in the output.

So you can specify the type used to store the input.

You need to do the same for the output stack, and then you need to describe the convolution itself.

That's the kernel dimensions, the padding which is a [inaudible] band of, of 0 added to the input, the stride, which is the increment of the loop in x and y, and the, again, you need to repeat the, the channel counts for input and output and specify the weights.

That's what the orange block in the middle, specify the weights, and, again, you can have a, a different storage type of, of the weights, and usually you want 16- or, or 8-bit storage for the weights.

Again, because it will lower the memory usage and the storage needed to store them.

Then once you have done that, you can create a convolution filter with this function.

You tell it, OK, this is my input stack, that's my output stack, that's my convolution create a filter, and you will get a filter object which then you can use to apply the convolution on your, on your data, and once you are done with it, you will call a [inaudible] filter to get rid of it and, and really the resource is used.

This was for the convolution layers.

Now, let's go and with the pooling layers.

Pooling is slightly simpler than convolutions.

So what to compute one output pixel what you do is you take a block of input pixels and take the max value of the average, and that's your result, and you do this for all pixels in all channels.

That's what the formula says.

Well, again, to create a, a filter for the pooling layer, you need to describe the input and output stacks the same way as before, and you also need to describe the, the pooling layer itself.

Again, with the kernel dimensions, padding, stride, and this time you don't have weights, but which function to use to compute the output.

That will be max average.

And then once you have done that, you can create the filter, and you get a filter object similar to the one we had before.

The last kind of layers we support is the fully connected layers.

Well, it's called fully connected but that's a matrix product hidden here.

So on input you have a vector, and then you will multiply it by the matrix, add some, add a vector of bias, and then you get the output.

So just a, a matrix product.

So this time, you don't have images in it for your vector.

So you need to describe the vector, that's its size.

Point out to the data and also the type you use to store these values, and, again, you can use 32, 16 bits, floating point, and integer.

And the you need to describe the layer itself through the dimensions of the matrix and the matrix co-efficients.

And also the bias is not on the slide, but you have a bias.

And then you can create the convolution filter, and you get, again, if [inaudible] to the others.

Now what we have on this filters is how do we apply them.

So you have your input data as your output data and your filter, and you will have, you have two functions to apply them.

So it's called filter apply if you have only one pair of input and output, and you have, if you have several of them, you will call filter apply batch, and you, you tell, we tell you the number of pairs you hvee and how to get from one pair to the next one.

That's a stride in memory.

Alright. And that's it for BNNS.

Let me wrap up.

So BNNS is a set of, of low-level compute functions for our neural networks.

Really low level.

We do the compute.

We do it well.

We do it fast, but it doesn't know what a neural network is.

It just does the compute.

So we optimize it to be fast and energy efficient, and important thing, it will under multiple data types.

OK, that was it for BNNS.

Now Quadrature.

We have a request, will, people who are asking us about library to do a numerical integration of functions.

So here it is.

Yeah, remember your, your school days.

So this is computing the integral of a function between a and b.

So that's a green area between the curve and the axis.

Well, so to do that, you first need to describe your function.

So you need to provide a callback.

Once thing we did, and which is different from your, the old library's doing the same thing is that the callbacks takes a number of points to evaluate.

Because usually when you compute the integral, you will have to evaluate the function in, at many points, and if you have a callback, a vectorized callback doing that faster, it's all good.

You can do much, much faster with this multiple x callback.

So it will, it will pass you a number of values in x, and you will have to fill it with a f of xi inside each value of y.

So that's your function, and then you need to tell it how to integrate it.

So we provide a set of three different integration methods with different complexity and, and time, and also some of them are able to integrate to infinity and, note, you will find details in the Quadrature header.

And you also need to specify what is the error you want on the output, and the max number of intervals we can subdivide maybe to, to complete the output.

And then you pass that to the integrate function, and you also tell it a, b, and you pass a point to receive the error.

It's called estimated error is here, and it will return you the estimated error on the output and also status, we receive the status of the computation because if you ask a very, very low error, sometimes it's not able to convert, and we will get that in a status.

And that's it for Quadrature.

And now let me call Steve on stage.

He will tell you about new additions to simd.

Thanks very much, Eric.

My name's Steven Canon.

I work in the Vector, Numerics group with Eric.

Eric took you back to calculus just then.

I'm going to take you ahead again a little bit to linear algebra right now.

We have this nice module, simd, which provides geometric operations and vector operations for C, Objective-C, C ++, and Swift.

And it really closely mirrors the Metal shading language.

It ties in closely with SceneKit and Model I/O and all those graphics libraries.

If you find yourself writing vector code to do, you know, small, you know, 3 x 3, 4 x 4 kind of linear algebra operations, this is the library you probably want to be using instead of writing that by hand.

We have most of what you want available.

If something's not there, ask us to provide it.

It's all really fast and there it is.

So what's there now?

We've have a bunch of types.

There's vectors of floats and of doubles, and we also have signed and unsigned integers like 2, 3, and 4.

And we have matrices of floats and doubles of those same sizes.

And this is just what's available in every language, in C and C++ and Objective-C.

There's a bunch of other types as well which are really useful for writing your own generic vector code, but I'm going to focus on sort of the common subset of what's available on all the languages on all the platforms in our talk today.

We have operations on those types, too, obviously.

There's all the usual arithmetic operations for vectors and for matrices.

And we have all the familiar geometry and shader functions if you've done any shader programming before.

Most of what you would want to use is, is going to be available here.

I'll show you a little example now.

So this is the same function written three times in three different languages.

We have it in Objective-C up top, we have it in C ++ in the middle, and we have it in Swift on the bottom, and you can see that the boilerplate is a little bit different between the languages just because function declarations work differently in these languages, but if we focus in on the actual computation part, it's essentially the same in all the languages, and it also really closely mirrors the, the way you would just write this naturally in math.

So you don't have a lot of weird function calls.

You don't have to write four loops, all that stuff.

You just write your code in this, this natural fluent style, and we translate all that for you.

So it's, it's nice and easy to write, and this looks just like the Metal code you would write to do this as well.

Now it happens that the reflect function is already available built into the library already.

So you don't need to write this yourself.

Calling functions between languages like there's a whole bunch of stuff in model I/O that exposes Objective-C API's that take these types.

Situation is pretty good.

The vector types are compiler extensions in C, Objective-C, and in C++.

And in Swift, they're defined as structs, but the compiler knows how to map between them for you.

So you don't really need to do anything.

Here's a simple example.

If I have an Objective-C function, I call some function here that operates on vector types, and I want to call that function from Swift using the Swift vector types, I can just do that, and it just works.

I don't need to do anything fancy.

These types have the same layout.

So there's no cost to converting or anything like that.

Similarly, for matrices, the Swift matrix types are layout compatible with the C, Objective-C, and C++ types.

So if I need to work with them, here I have a Swift, I'm going to create a Swift type from the C type.

All I need to do is use the init function.

Works fine.

Doesn't have any computational cost.

It just sort of changes the types silently for me, and, similarly, I can use the C matrix property if I need to get a C type to call C or Objective-C or C++ function.

So we have a few things that are new this year that I want to show to you.

We have three new functions - simd orient, simd incircle, simd insphere - and these are overloaded to support a bunch of different types and different lengths and things like that.

Basically everything in the simd library works that way.

So even though we have just three new functions, it's actually a lot of new stuff.

So I'm going to start with orient.

What orient does is it lets us answer the question is a set of vectors positively oriented?

And what that means, if you don't remember your linear algebra, is do they obey the right-hand rule.

You might remember that from physics, or, equivalently, is there determinate positive.

If there's any math majors in the audience, you're objecting right now that a set of vectors does not have a determinate.

What I mean here is just you, you kind of take them, and you slam them together to make a matrix.

Take the determinate of that matrix.

Is that positive or not?

So why do we care about this?

This is kind of simple and stupid.

You can use this to answer a lot of computational geometry questions that are very useful.

Like, is the triangle facing towards me or away from me?

If you think about a tetrahedron, there are two faces that are pointing towards you, and there are also two faces on the backside that face away from you.

And if you're doing graphics operations, it's useful to know which ones are facing towards you because those are the ones you want to operate on.

Similarly, if I have a line and a point, and I want to answer the question is the point on the line, or if it's not, which side of the line is it on.

I can use the orient predicate to answer that question.

Now this seems kind of simple, and it is simple except that it can be very hard to actually answer this question when the point is close to the line, and I'm going to talk about that more later.

So here's a simple example of that.

I'm going to have a plane over on the right-hand side of the, the display here, and I have, going to put some points on that plane.

So I have three points, a, b, and c, and I'm going to query the orientation of those with simd orient.

Now because we go counterclockwise as we go from a to b to c, there, we say that these are positively oriented.

That's what it means to be positively oriented in the plane.

If we move one of the points so that that order is clockwise, now it's negatively oriented, and if I move the point c so it's exactly on the line between a and b, then they're co-linear and the orientation is 0, or you might say it's degenerate.

Now it's very hard to tell in general if a point is exactly on the line especially with floating point coordinates.

And that's because orientation is numerically unstable.

So because of floating point rounding, if the results of this determinate is very nearly 0, you can easily get the wrong sign, and for some algorithms, that doesn't matter, but for other algorithms, you can actually fail to converge, or you may get the wrong result when you're working with this.

For collision detection and things like that, it can actually be quite important to be able to answer this question.

Also for things like triangular mesh generation for models and things, this is an important question to be able to answer exactly.

So let's look at an example where, where this is actually hard to answer.

I'm going to put two vectors in the plane, u and v, and these are almost identical vectors, right.

They, they differ by the smallest amount they could possibly differ.

And I've magnified them enormously on the right-hand side.

So you can see that they're actually different.

If I drew these to scale, they would, they would overlap entirely.

And if we compute the orientation in the way that you would normally compute this, we get a result of 0 because of floating point rounding.

Now this is a simple example where we get 0.

With dimensions bigger than 2 x 2, we can actually get the wrong sign entirely for this.

So it's not just that it could be 0 when it shouldn't be.

But if we use the simd orient function, we get a tiny positive number, which is, which is the right result.

These are positively oriented.

And I should point out here that don't interpret this tiny positive number as meaning anything.

It is not the determinate.

It is sometimes the determinate, but it's sometimes just going to have the right sign.

So what we're really interested here is in the sign of this number.

How are we able to do this?

So the geometric [inaudible] that I'm showing you today use something called adaptive precision.

We go and compute as many bits as we need to compute to get the right result, and this lets us return the right result very, very fast in most cases, but if we need to go off and do the exact computation to give you the right answer, we will, we will do that.

So you can just trust that that this gives you the right answer in your code, and you don't need to worry about that yourself.

Incircle is very similar.

We take three points in the plane.

That determines the circle.

You can notice that they're positively oriented around the circle here.

That's important.

And if I put a point in the middle, x, then simd incircle can tell me if it's inside.

In that case, I get a positive result.

If it's on the circle, I get 0.

And if it's outside the circle, I get a negative result.

And insphere is exactly the same thing.

Just in three dimensions now.

I need four dimensions to determine the sphere.

I give it my point x, and I get a result.

I'll show you an example I talked about earlier figuring out if a triangle faces towards you or away from you.

Here I've got a really simple struct to represent a triangle in Swift.

Triangle is determined by three vertices that I happen, a collection here.

And I have this predicate is facing to tell me if the, the triangle is facing towards the camera or not.

So the usual way you would compute this is to compute a normal to the triangle with the cross product, and then take the dot product of that with the vector to the camera, and if that's positive, then the triangle faces the camera.

We can simplify this code a lot and make it correct by just using the simd orient predicate.

So my code is simpler, it's fast, and it gives me the right answer.

Is it, these are all things I like, and that's what we try to do in Accelerate across the board is give you simple things for complex mathematical calculations.

So we showed you a bunch of new stuff today.

We have some new libraries entirely.

We have BNNS for neural networks.

We have Quadrature, and we also have some new features, orientation and incircle in simd.

Every one of these features and libraries is added in response to a request that we got from developers.

So we really want to hear from you guys about what you need, what we can add to make your computational workloads easier.

We want to give you simple interfaces that let you do what you need to do efficiently.

We've also done a ton of other stuff this year.

In vImage, which Eric mentioned briefly in passing, we have a whole set of geometry operations for interleaved chroma planes.

This is absolutely one of the most requested features we've had in recent years.

So we're, we're really excited to have that.

If you don't know what that is, don't worry about it, but if you know what it is, you understand why it's useful.

And we also have expanded supports for new formats in the vImage conversion routines.

This underlies a lot of the deep color stuff that you may have heard about.

So this is, this is really important for that.

We have improved the performance for interleaved complex formats in vDSP.

With the FFT's, we support both interleaved and planer layouts where the complex and imaginary parts are either separated or put together.

Planer layouts are what we really prefer to operate with, and we recommend you do that, but if you happen to have interleaved data, you can just use the FFT's now, and they'll be really fast.

We've also improved the performance of all the Level II BLAS operations.

Some of that's motivated by the, the BNNS stuff you saw, and some of that's just opportunity that we saw to go after that, and there's a ton of other stuff that's new in Accelerate, lots of improved stuff.

Every time new processors come out, we make sure that we're tuning for that.

We want to take care of all those low-level computational details so you can focus on writing high-level algorithms that, that just sit on top of that, and you can do the job you need to do.

In summary, we want to be your single-stop shop for computational algorithms where we give you implementations that are correct, they're fast, and they're energy efficient, and we're going to keep tuning them for new hardware as it comes along so you don't need to worry about that.

If you want to [inaudible] yourself and we ship a, a new processor, then you're going to need to update for that, and if you let us handle that for you, then you don't need to worry.

Keep the future requests coming.

We love to get them from you.

We talked to a bunch of you in the labs earlier today.

We got a bunch of great future requests already.

Can file radars.

We love for that to happen.

If you want more information, the link for this talk is here.

I also encourage you to check out previous years' talks where you've gone into more detail about other aspects of the library that are useful.

There were two great Metal sessions that I really highly recommend everyone check out from yesterday, especially if you're interested in this kind of stuff.

Thanks very much for coming out everyone.

[ Applause ]

Apple, Inc. AAPL
1 Infinite Loop Cupertino CA 95014 US