High Efficiency Image File Format

Session 513 WWDC 2017

Learn the essential details of the new High Efficiency Image File Format (HEIF) and discover which capabilities are used by Apple platforms. Gain deep insights into the container structure, the types of media and metadata it can handle, and the many other advantages that this new standard affords.

Hi and welcome to session 513.

In this talk you will learn about the lower level details of the new High Efficiency Image File format or HEIF and the many advantages that this new file format standard affords.

My name is Davide Concion and I manage the Image Compression Team at Apple.

During the talk, we will briefly touch upon the current de facto standard for image compression, a standard that everybody is familiar with: JPEG.

We will go through the requirements that Apple identified as mandatory for a new image format.

We will explain why we think HEIF is the answer to those requirements and we will get to know some of the flexible tools that HEIF implements.

We will then present the reasons why Apple thinks HEVC is the right codec to be used within the HEIF file format.

Let's start with JPEG.

JPEG is still the most popular compression technology for images, omnipresent on the web and on consumer electronics devices such as DSLR cameras, point-and-shoot cameras and cell phones.

Cloud services also use JPEG because of its universal support.

JPEG, though, has several limitations. Among those is its compression efficiency.

Several new compression algorithms have been developed in recent years that can shrink the file size much more than JPEG while still maintaining the same objective and subjective quality.

Auxiliary images like alpha or depth are not easily supported.

Also, in recent years new ways to present and display animated images have been developed.

Apple Live Photo is one of them.

JPEG unfortunately does not support animation.

Let's look at a history map of compression standards developed by JPEG and ITU/MPEG.

JPEG is really starting to show its years, especially in terms of compression efficiency when compared to recent advancements.

As you can see in the slide, JPEG was finalized as a standard in 1992, a quarter of a century ago.

Since then, several new compression standards have been developed.

The latest one in the list is HEVC.

And here is HEIF for comparison in the timeline, which was finalized in 2015.

Apple invested a lot of time to find a successor for JPEG and many options were evaluated.

The requirements were extensive.

The new format needs to support all the features available in JPEG, but at the same time provide better performance.

It needs to be friendly to professional photography tools, the web and the cloud.

The new format also needs to be flexible and extensible to cope with the ever-changing photography ecosystem.

Here is a list of features Apple considered paramount.

The compression needs to be state-of-the-art both on the [inaudible] front.

It needs to be competitive with natural images, but also when compressing text or graphics.

The format needs to be friendly to hardware-accelerated encode and decode operations on modern CPUs, GPUs and DSPs.

Performance and power are very high on the list of requirements.

It needs to support high bit depth and wide color gamut, which is the new frontier for images captured on consumer devices.

It needs to be able to compress 4:4:4 color sampling and also describe HDR content, including HDR metadata, transfer function and color space definitions.

Auxiliary images, for example alpha or depth, need to have a commonly defined place in image files.

New editing tools will be able to utilize auxiliary data for new presentation and editing experiences.

In recent years, new ways to present and display animated images have been developed.

Apple Live Photo is one example.

Apple Live Photo includes animated content together with static images.

The new common format needs to store animated information efficiently, ideally using temporal compression techniques, and be able to instruct players about the presentation intention, for example a looping sequence.

The new format needs also to support multiple images in the same file.

For example, multi-exposure stacks or stereo images.

This is to aid the development and implementation of new computational photography algorithms.

Multiple representations of the same image in the same file are of great importance.

For example, multi-resolution, including progressively increasing level of details or the ability to represent the same image encoded with different codecs.

Tiles are an important tool the new format must implement.

They allow for scalable operations on images of any size.

We'll be looking into tiles later in the talk.

The new format needs support for rich metadata associated to each image in the file.

It also needs support for timed metadata, for example for a sequence of images.

There is also a desire for the new format to be able to include other media types, for example audio or text.

Last but not least, the new format should be flexible and extensible enough to provide a solid foundation for the future.

We believe that HEIF is the answer to all these requirements.

What is HEIF?

HEIF stands for High Efficiency Image File Format.

Version one of the spec became an ISO standard in June of 2015.

Version two should be released imminently.

A C reference model for HEIF is available upon request at this link.

The reference model is meant to provide guidance for HEIF implementation and to understand the specifications.

As a side note, the open source project GPAC MP4Box has recently added basic functionality to parse HEIF files.

The video world learned a long time ago that containers and codecs are different entities and that there are several advantages in keeping them separate.

But historically in the image world containers and codecs are tied together and JPEG is no exception.

It makes sense to make the distinction in the image world to get the flexibility to [inaudible].

HEIF does exactly that, it specifies a structural format, a container for individual images, as well as image sequences.

It is built on top of the widely used ISO base media file format, which is based on Apple QuickTime technologies.

It also uses and enhances structures defined in the MP4 specification and MPEG-21 specifications.

Sequences, for example bursts or animations, are stored as tracks, or timed media, MP4 style.

Images, coded or derived, are stored as items, MPEG-21 style.

Any compression codec can be included in a HEIF container.

The HEIF specification explicitly mentions HEVC, H264 and JPEG in terms of file extensions, [inaudible] types and decoder configuration.

The basic building block of a HEIF file, as in the ISO base media file format, is a data structure called a box.

A box comprises a four-character type (for instance, in the example on the right, the ftyp box, the meta box or the mdat box), the size of the box in bytes, and the payload of the box.
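As a rough illustration of the box layout just described, here is a minimal Python sketch that walks the top-level boxes of a byte buffer. It assumes the usual ISO base media file format header: a 32-bit big-endian size, then the four-character type, with a size of 1 signalling a 64-bit size and a size of 0 meaning "to the end of the container". The helper names and the sample bytes are made up for the example; they are not taken from the talk or from a real file.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box runs to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box: stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def make_box(box_type, payload):
    """Build a box with a plain 32-bit size field (hypothetical test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic three-box buffer: ftyp, meta and mdat with placeholder payloads.
sample = (
    make_box(b"ftyp", b"heic" + b"\x00\x00\x00\x00" + b"mif1heic")
    + make_box(b"meta", b"\x00\x00\x00\x00")
    + make_box(b"mdat", b"\x01\x02\x03")
)
top_level = [(name, stop - start) for name, start, stop in iter_boxes(sample)]
print(top_level)
```

The same function can be pointed at the payload of a container box by passing its start and end offsets, which is how nested boxes would be reached in this sketch.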

The meta box gives a full description of what is included in the file.

For those familiar with the ISOBMFF specification, the handler type of the meta box is pict, indicating to a reader that this meta box handles images.

Before going into the anatomy of a HEIF file a note on file extensions.

The standard defines explicitly the file extension of a HEIF file depending on the particular codec being used to compress single images or sequences.

The list of extensions can be found in the table above.

iOS 11 can capture and store HEIF images using the HEVC codec.

Therefore, the extension you will be encountering is .HEIC.

In iOS 11 and macOS 10.13 we support all three single image HEIF flavors for decoding and displaying.

Note also that a HEIF file that includes sequences will have a different extension than a HEIF that contains only single images.

We will now dive into the HEIF format and its anatomy.

Let's start with the concept of item.

Every element in a HEIF file is an item.

There can be coded items, for instance HEVC-encoded frames or tiles.

There can be derived items for instance, an image overlay or an image grid.

There can be metadata items for instance, EXIF, XMP or MPEG-7 metadata.

Each item can also come with several properties associated to it.

Everything is then connected via structures that link certain items to other items or properties.

Images are items and because multiple images can be stored in the same file the HEIF standard differentiates between them by assigning certain roles.

Some of the roles specified in HEIF are listed in the table above.

The primary, or cover, image is the representative image of a file.

The primary image should be displayed when no other information is available to or decodable by a player.

Only one primary image can be present in a HEIF file.

Other full-resolution images in HEIF files are called master images.

The thumbnail is a low-resolution representation of a master image.

Multiple thumbnails can be stored in a HEIF file, for example with different sizes.

It's a very useful feature for progressive decoding and displaying very high-resolution images.

The auxiliary image is an image that complements a master image.

For example, an alpha plane or a depth map.

Auxiliary images can assist in displaying master images, but are not typically displayed.

A hidden image is an image that should never be displayed.

It can be present in the file for example, as an input image of a derived image.

The iOS 11 HEIF implementation makes extensive use of hidden images, which are called tiles.

Each tile is used to compose the final master or canvas image.

Now, a derived image is an image that is rendered by performing an indicated operation on other input images.

For instance, the canvas image described before is rendered after stitching together multiple tiles.

Equivalent images are alternative images for instance, encoded with a different codec.

A server could distribute the same input content to players that may have different decoding capabilities.

Once the role has been defined for each image, properties can be associated with it.

Properties are either descriptive or transformative.

They can also be essential (for example, the codec initialization info) or nonessential.

The table above provides a non-exhaustive list of descriptive properties for images inside a HEIF file.

All the usual suspect information can be found in there like the image size, the color information, the type of auxiliary image which can be alpha or depth and also the configuration parameters to initialize the decoder.

The table above provides a non-exhaustive list of transformative properties.

The presence of these properties instructs a HEIF reader that the image needs to go through extra steps before being displayed.

For example, the clean aperture property instructs a HEIF reader that a crop operation must be performed before rendering the final image.

All the properties for each image are grouped together in the same item property box.

Each image can then be associated with its properties via the association box.

We will use an example to describe how the association works.

The HEIF container above, on the left, describes a file with one main image and one thumbnail.

The main image is comprised of four tiles.

The item property box, or ipco box, contains all the decoder configurations and the sizes of the main image, the tiles and the thumbnail.

Note that the order matters for this box.

The association box, or ipma box, on the right links each item ID in the file to its properties, which are referenced by their position in the ipco box.

As explained before, there is a total of six items in the file, one image, four tiles and one thumbnail.

Items 1 through 4 are the tiles. These are hidden images with the property in position one, the decoder configuration, and the property in position two, the size, which is 500 by 500 pixels.

Item five is the main image. Only the size property is defined, since this is a derived image.

The size is 1,000 by 1,000 pixels.
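The association example above can be modelled with two plain Python structures: an ordered list standing in for the ipco box and a mapping standing in for the ipma box. This is only a sketch of the idea, not the binary syntax of either box. The tile and main-image values come from the example; the position of the main image's size property, and everything about the thumbnail (item 6), are assumptions added so that the sketch is complete.

```python
# ipco stand-in: an ordered list of properties. Order matters because
# the ipma entries refer to properties by their 1-based position.
ipco = [
    {"kind": "decoder_config", "codec": "hevc"},       # position 1 (tiles)
    {"kind": "size", "width": 500, "height": 500},     # position 2 (tiles)
    {"kind": "size", "width": 1000, "height": 1000},   # position 3 (assumed slot)
    {"kind": "decoder_config", "codec": "hevc"},       # position 4 (assumed, thumbnail)
    {"kind": "size", "width": 250, "height": 250},     # position 5 (assumed, thumbnail)
]

# ipma stand-in: item ID -> list of 1-based property positions.
ipma = {
    1: [1, 2],  # tile, hidden image
    2: [1, 2],  # tile, hidden image
    3: [1, 2],  # tile, hidden image
    4: [1, 2],  # tile, hidden image
    5: [3],     # main image: derived, so no decoder configuration
    6: [4, 5],  # thumbnail (hypothetical values)
}


def properties_of(item_id):
    """Resolve the property positions of an item into the properties themselves."""
    return [ipco[position - 1] for position in ipma[item_id]]


def size_of(item_id):
    """Return (width, height) from the item's size property, or None if absent."""
    for prop in properties_of(item_id):
        if prop["kind"] == "size":
            return prop["width"], prop["height"]
    return None


print(size_of(1))  # tile size
print(size_of(5))  # main image size
```

Sharing one decoder configuration and one size entry across all four tiles is what keeps the property list short in this model.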

Next, we will briefly talk about image sequences in HEIF.

When sequences are embedded in a HEIF file, the moov box and its sub-boxes are also present in the file.

The moov box is fully described in the ISO MP4 file format specification, from which HEIF derives.

Each sequence of images or samples is described via the trak box where all the timing information to play back the track is included.

HEIF specifies a new track handler for picture called pict.

The key difference is that, while the timing information given for a video or an audio track is used to synchronize the playback, the timing information for an image sequence track can represent either the capture time (for example, for a burst) or the suggested display time (for example, to drive a slideshow).

Roles can be used for image sequences too.

For example, a HEIF file could embed a track of thumbnails or a track of auxiliary images associated with the master track.

One of the most important HEIF features is the ability to control the playback by signaling in the file the intent of the creator.

For example, an edit list enables modifying the playback order and pace of each sample.

HEIF also allows indicating edit list repetitions for example, for looping animations.

The repetition can be indicated to last for a certain duration or be infinite.
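To make the looping behaviour concrete, here is a small Python sketch of a player loop. It is a deliberately simplified model: the "edit list" is just a list of sample indices in playback order, not the actual edit list box syntax, and the function name and parameters are hypothetical. Passing no duration models the infinite case; passing one stops the loop once the next sample would no longer fit.

```python
from itertools import cycle, islice


def playback(samples, edits, loop_duration=None):
    """Yield sample names in edited order, repeating the edit list.

    samples: list of (name, duration) pairs.
    edits: sample indices in the order the creator wants them shown.
    loop_duration: total time to keep looping, or None to loop forever.
    """
    elapsed = 0
    for index in cycle(edits):
        name, duration = samples[index]
        if loop_duration is not None and elapsed + duration > loop_duration:
            return
        yield name
        elapsed += duration


samples = [("a", 1), ("b", 1), ("c", 1)]
reversed_order = [2, 1, 0]  # the edit list plays the samples backwards

bounded = list(playback(samples, reversed_order, loop_duration=5))
first_seven_of_infinite = list(islice(playback(samples, reversed_order), 7))
print(bounded)
print(first_seven_of_infinite)
```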

Given that ISO tracks can be used in HEIF files, inter-frame prediction is also available.

Inter prediction is the ability to remove coded information by predicting the content of the current frame from similar frames in the past or in the future.

This gives a tremendous advantage in terms of compression.

Inter prediction can also introduce a delay at decode time, because the previous frames must be decoded before the current frame can be decoded.

HEIF allows inter prediction, but also includes constraints in the file to limit frame interdependencies.

For instance, each predicted image can be restricted to point only to a non-predicted image, or intra image.

In this case, the time to decode each frame in a sequence becomes deterministic.
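The effect of that constraint on decode time can be sketched in a few lines of Python. The model below is an assumption made for illustration: each frame has at most one reference, given by a list where None marks an intra frame, and the references contain no cycles. The function name is hypothetical.

```python
def frames_to_decode(target, reference_of):
    """Return the frames that must be decoded, in order, to display `target`.

    reference_of[i] is the index of the frame that frame i is predicted
    from, or None when frame i is intra coded (assumes no reference cycles).
    """
    chain = []
    current = target
    while current is not None:
        chain.append(current)
        current = reference_of[current]
    return list(reversed(chain))


# Unconstrained: every frame is predicted from the one before it.
chained = [None, 0, 1, 2, 3]
# Constrained: every predicted frame points straight at the intra frame.
constrained = [None, 0, 0, 0, 0]

print(frames_to_decode(4, chained))      # grows with the frame index
print(frames_to_decode(4, constrained))  # always at most two frames
```

With the constrained layout the amount of work per frame is bounded by two decodes regardless of position in the sequence, which is what makes the decode time predictable.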

Last but not least, a HEIF image can be subdivided into tiles.

Tiles are rectangular regions within an image.

They are completely independent items in a HEIF file and they can be of different sizes or of the same size.

If their size is different a relative location property describes their position in the final image.

If their size is the same, the final image is described as a grid.

There are several reasons why tiles make HEIF extremely flexible.

A player can exploit parallelism at decode time.

For example, each tile can be separately and independently decoded.

Tiles can be used to reduce memory consumption when resizing an image. Rather than decoding the whole image and then applying a rescale operation, each tile can be independently decoded and rescaled and then placed in a smaller buffer for rendering.

Cropping becomes very fast because a player does not need to decode the whole image to extract a certain region.

This property is extremely useful for zooming operations.

For instance, a gigapixel image could be decoded, displayed and zoomed into with ease, without the need to decode the whole image into a multi-gigabyte buffer.
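The cropping and zooming argument comes down to finding which tiles overlap the region of interest. Here is a minimal Python sketch of that lookup, assuming a grid of equally sized square tiles anchored at the top-left corner of the image; the function name and parameters are hypothetical, and the image and tile sizes reuse the 1,000 by 1,000 example with 500 by 500 tiles.

```python
def tiles_for_region(image_w, image_h, tile, x, y, w, h):
    """Return (row, col) for every grid tile that overlaps the crop rectangle.

    The rectangle has its top-left corner at (x, y) and size w by h, in
    pixels, and is assumed to start inside the image.
    """
    cols = -(-image_w // tile)  # ceiling division
    rows = -(-image_h // tile)
    first_col = x // tile
    last_col = min((x + w - 1) // tile, cols - 1)
    first_row = y // tile
    last_row = min((y + h - 1) // tile, rows - 1)
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A crop that fits inside the top-left tile needs one decode out of four.
print(tiles_for_region(1000, 1000, 500, 100, 100, 300, 300))
# A crop that straddles the vertical tile boundary needs two.
print(tiles_for_region(1000, 1000, 500, 400, 100, 200, 100))
```

Only the tiles returned here need to be decoded; the rest of the coded data can stay untouched.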

Of note, tiles can also be used as an encoding tool.

A smart encoder can make different decisions based on the content of each tile.

Apple's HEIF implementation uses tiles extensively.

Note though that the HEVC specification also supports subdividing a frame into tiles as a parallelization tool.

Apple does not use tiles in HEVC parlance, but rather each tile is a whole HEVC frame, we call them system tiles.

Next, we will talk about HEVC, the codec Apple has chosen to compress HEIF photos.

There are two major reasons for selecting HEVC.

First, HEVC is the latest technology in the compression standard world.

With HEVC we see an average of 2X compression compared to JPEG while maintaining the same visual quality.

Second, HEVC hardware support is becoming available in most CPUs and GPUs.

For instance, HEVC hardware support is available from the sixth-generation Intel Core processors.

This means exceptional performance without sacrificing battery life.

Several intra coding tools have been added to the standard that allow HEVC to outperform JPEG.

In the next few slides we will mention some.

You will notice that the common theme here is flexibility.

First, the block size.

JPEG divides each image into a grid of blocks of 8 by 8 pixels.

These blocks are then transformed and quantized.

HEVC has the flexibility of being able to divide an image into blocks that range from 64 by 64 pixels down to 4 by 4 pixels.

The transform size is also flexible within the block.

A new optional discrete sine transform has been added to the standard, and three possible scanning orders are available to group coded coefficients.

Next, the block prediction.

JPEG allows the top left corner coefficient also called the DC component or the constant component of an 8 by 8 block to be predicted from the block on the left.

HEVC adds the flexibility to predict every pixel value within a block.

Up to 35 angular predictions are available.

Being able to remove redundant information in a block by exploiting similar information available in neighboring blocks is one of the most efficient tools inside HEVC.
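The JPEG side of this comparison, predicting the DC component of a block from the block on its left, amounts to storing differences instead of absolute values. The Python sketch below shows that idea under simple assumptions: the predictor starts at zero, the blocks are visited in one left-to-right run, and the DC values are invented for the example. It does not attempt to model HEVC's per-pixel angular prediction.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    previous = 0  # predictor for the first block (assumed to start at zero)
    differences = []
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences


def dc_reconstruct(differences):
    """Invert dc_differences by accumulating the differences."""
    previous = 0
    values = []
    for diff in differences:
        previous += diff
        values.append(previous)
    return values


dc_values = [52, 55, 61, 59]  # hypothetical DC values of neighbouring blocks
residuals = dc_differences(dc_values)
print(residuals)
print(dc_reconstruct(residuals) == dc_values)
```

Neighbouring blocks tend to have similar averages, so the residuals are small numbers that are cheaper to code than the original values.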

Entropy coding.

JPEG uses Huffman coding as the engine for statistical encoding.

The idea is to assign variable-length codes to the input coefficients, with shorter codes assigned to the coefficient values that occur more often.

HEVC on the other hand, employs an arithmetic coder called CABAC which stands for Context Adaptive Binary Arithmetic Coding.

CABAC is notable for providing much better compression than most other entropy encoding algorithms.
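The Huffman idea described for JPEG, shorter codes for values that occur more often, can be shown with a textbook construction in Python. This is a generic sketch, not the table format or the code-length limits that JPEG actually uses, and it does not model CABAC at all. The function name and the input symbols are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        weight_a, _, group_a = heapq.heappop(heap)
        weight_b, _, group_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**group_a, **group_b}.items()}
        heapq.heappush(heap, (weight_a + weight_b, tie, merged))
        tie += 1
    return heap[0][2]


# Symbol 0 is the most frequent, symbols 2 and 3 the least frequent.
symbols = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(symbols)
total_bits = sum(lengths[s] for s in symbols)
print(lengths)
print(total_bits, "bits versus", 2 * len(symbols), "bits at a fixed 2 bits per symbol")
```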

Quantization.

Quantization is a lossy compression technique achieved by compressing a range of values to a single quantum value.

JPEG utilizes global quantization matrices for each 8 by 8 block.

HEVC on top of the quantization matrix adds the flexibility of assigning a different quantization parameter for each block.

This allows smart encoding algorithms to compress more heavily the areas of an image where the human visual system is less likely to detect artifacts.

For instance, high-frequency content.
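A toy Python example may help show what "compressing a range of values to a single quantum value" means, and why a per-block parameter is useful. It uses a plain uniform step size, which is an assumption made for illustration; it does not reproduce JPEG's quantization matrices or the way HEVC maps its quantization parameter to a step. The coefficient values are invented.

```python
def quantize(coefficients, step):
    """Map each coefficient to the index of its nearest multiple of `step`."""
    return [round(c / step) for c in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [level * step for level in levels]


def max_error(coefficients, step):
    """Largest reconstruction error introduced by quantizing with `step`."""
    restored = dequantize(quantize(coefficients, step), step)
    return max(abs(c - r) for c, r in zip(coefficients, restored))


coefficients = [103, -47, 12, 3, -2]  # hypothetical transform coefficients

fine = dequantize(quantize(coefficients, 10), 10)    # a block that needs care
coarse = dequantize(quantize(coefficients, 20), 20)  # a block where errors hide
print(fine, max_error(coefficients, 10))
print(coarse, max_error(coefficients, 20))
```

A larger step produces smaller levels, which cost fewer bits, at the price of a larger error. Choosing the step per block lets an encoder spend that error where it is least visible.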

Next is deblocking, a tool that is available only in HEVC.

Blocking artifacts are visible discontinuities occurring at block boundaries.

The HEVC deblocking filter is applied to the pixels around the block edges to smooth the transition and get more pleasing visual results.

SAO which stands for Sample Adaptive Offset is an extra filtering step available in HEVC that is applied to the output of the deblocking filter to further improve the quality.

It's a local filter that can attenuate ringing artifacts or changes in sample intensity in some areas of a picture for better visual quality.

Both these techniques allow for more pleasing images, especially when the compression is very high.

We have gone through several HEIF and HEVC features and tools. I want to take a second to mention a few characteristics of HEIF files captured on iOS 11.

First, the extension for HEIF images captured with iOS 11 will be .HEIC because of the HEVC codec.

The HEVC profile utilized to compress images is the Main Still Picture profile.

Also, we use HEVC monochrome profile for depth data.

Images are encoded using tiles that are 512 by 512 pixels.

They are positioned in a grid fashion to cover the whole image.
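As a quick worked example of the grid arithmetic, the Python sketch below counts how many 512 by 512 tiles are needed to cover an image. The 4032 by 3024 image size is an assumption chosen for illustration, not a figure from the talk, and how the overhang at the right and bottom edges is handled is not covered here.

```python
import math


def grid_dimensions(width, height, tile=512):
    """Return (rows, cols) of the tile grid needed to cover the whole image."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return rows, cols


# Hypothetical 12-megapixel capture.
rows, cols = grid_dimensions(4032, 3024)
tile_count = rows * cols
covered = (cols * 512, rows * 512)  # area spanned by the full grid
print(rows, cols, tile_count, covered)
```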

The thumbnail is an HEVC-encoded 320 by 240 image.

It is four times the size of a common 160 by 120 JPEG thumbnail, and this is to help show better thumbnail quality when images are displayed on modern screens with high pixel density.

EXIF metadata is part of the HEIF file, as in JPEG, for backward compatibility.

Depth data is stored as an auxiliary image and the metadata pertinent to depth is stored as XMP payload associated with the depth image.

Last, a note about file creation.

The HEIF standard does not mandate any order for the boxes a reader could find at the top level of a HEIF file, but we found that ordering them in a certain way greatly helps parsers and decoders.

For example, having the thumbnail early in the file allows parsing and displaying a huge number of HEIF images without the need to parse each file in full.

For [inaudible] transmission or web applications, once the meta box is received, all the information for the file is available, and therefore readers can start configuring the decoding and display pipelines before having received all the coded data.

Let's summarize what we have learned today.

The photography world needs a better image file format to replace the rather old JPEG.

We looked at the extensive list of requirements that Apple considered paramount when searching for a JPEG replacement.

We believe HEIF is the answer for all the requirements.

Its flexibility allows it to handle with ease and elegance the advancements available in iOS 11, and its extensibility also allows HEIF to be a solid foundation for the future.

We then analyzed the various features available in the HEIF standard.

And finally, we looked at the HEVC tools that make it the best choice, both in terms of compression efficiency and friendliness toward hardware architecture for performance and power.

For more information, please visit the URL for the High Efficiency Image File format session 513.

And if you're still at the show we invite you to visit the two related sessions about HEIF and HEVC.

Thank you for watching the talk and enjoy the rest of WWDC 2017.
