h.264 video with an alpha channel

June 5, 2013


Today's post is about h.264 video and how an iOS developer can incorporate h.264 video with an alpha channel using AVAnimator. In short, h.264 is a lossy way to encode video, see this wikipedia page for detailed info. With h.264, some very impressive file size reduction is possible. But, not everything about h.264 is perfect. One major problem with h.264 is the lack of alpha channel support.

Before going into more detail, it is important to clear up misinformation that one frequently comes across online. First, some people simply do not understand what an alpha channel is, as seen in some of the responses to this stackoverflow question. A previous post about RGBA pixels shows visual examples of what an alpha channel is and how it is implemented. Second, there is information floating around about how h.264 could in theory support an alpha channel (via FRExt, the Fidelity Range Extensions), but this is not actually useful because no currently available encoder/decoder supports an alpha channel. Third, there is no other video format available by default under iOS that supports an alpha channel. What iOS does provide is a hardware based h.264 encoder/decoder that supports opaque video without an alpha channel.

What is presented here is an approach that makes use of the existing hardware decoder on iOS devices while also reducing file sizes as much as possible without unacceptable loss of quality. The developer will need to determine how much loss of quality is reasonable given the space savings associated with a specific compression setting.

First the final result is shown, and then the elements that make up the solution are explained one at a time. The KittyBoom example app works as either an iPhone or iPad app. The example app shows a simple background beach image with an animated image (originally an animated GIF) of Hello Kitty skipping down the beach. After a few hops, the adorable little kitty steps on a land mine and is blown to bits.


The Hello Kitty animation loop comes from the following animated GIF:


The really interesting part of this example is the video of the explosion.


One can easily find this sort of stock explosion footage shot against a green screen online. Where things get interesting is when considering the file sizes. The Kitty GIF image is 19K. The explosion movie encoded as lossless 30 FPS video is about 28 megs uncompressed or about 8.5 megs once compressed with 7zip. Including an 8.5 meg video for this explosion is just not a viable option.

Asking users to download an iOS app that contains videos that are 8.5 megs each is just asking for lost app store sales. To save space and end user download time, the explosion video can be converted to a pair of h.264 videos and then both videos can be compressed down to a reasonable size. A command line script is provided with AVAnimator to implement the channel split logic. One video will contain the RGB components while a second video will contain a black and white representation of the alpha channel, as shown here:


With AVAnimator, this conversion process is implemented as a command line script named ext_ffmpeg_splitalpha_encode_crf.sh. Assuming a video had previously been exported to a series of PNG images, one would first encode the images to an MVID file at 30 frames per second like so:

$ mvidmoviemaker Explosion0001.png Explosion.mvid -fps 30
writing 152 frames to Explosion.mvid
MVID:               Explosion.mvid
Version:            2
Width:              640
Height:             480
BitsPerPixel:       32
ColorSpace:         sRGB
Duration:           5.0667s
FrameDuration:      0.0333s
FPS:                30.0000
Frames:             152
AllKeyFrames:       FALSE

Now the MVID file can be split into RGB and ALPHA components and encoded using the ext_ffmpeg_splitalpha_encode_crf.sh script. This script invokes ffmpeg and x264 to encode the h.264 video (both executables are provided with the AVAnimator utils download). By default the CRF is set to 23, but a specific value can be passed as the second argument to the script. See x264EncodingGuide for more info; generally, experience shows that values in the range 20 to 35 are useful. The higher the CRF value, the more the video data is compressed in a lossy fashion.

$ ext_ffmpeg_splitalpha_encode_crf.sh Explosion.mvid 30
Split Explosion.mvid RGB+A as Explosion_rgb.mvid and Explosion_alpha.mvid
Wrote Explosion_rgb.mvid
Wrote Explosion_alpha.mvid
wrote Explosion_rgb.mov
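The script hides the encode details, but for reference, a comparable CRF-based x264 encode with a stock ffmpeg build looks roughly like the following. This is a sketch using standard libx264 options; the exact flags the script passes may differ.

```shell
# Encode a numbered PNG sequence to h.264 at 30 FPS with CRF 30.
# yuv420p keeps the output compatible with the iOS hardware decoder.
ffmpeg -framerate 30 -i Explosion%04d.png \
    -c:v libx264 -crf 30 -pix_fmt yuv420p \
    Explosion_rgb_CRF_30.m4v
```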

After all the encoding steps have been executed, there is a new directory named MVID_ENCODE_CRF_30. This directory contains all the generated files. The ones of interest are the .m4v files, in this case Explosion_rgb_CRF_30_24BPP.m4v and Explosion_alpha_CRF_30_24BPP.m4v.

$ ls -la *.m4v
-rw-r--r--  1  79047 16:18 Explosion_alpha_CRF_30_24BPP.m4v
-rw-r--r--  1  72778 16:18 Explosion_rgb_CRF_30_24BPP.m4v

The output shows some very impressive compression results. Instead of a 28 meg lossless video or an 8.5 meg 7zip archive, this process results in a pair of h.264 videos that together take up only about 150K of disk space. This is a significant space savings. Download the KittyBoom example Xcode project to see the source code needed to load these two videos on an iOS device.

That is really all there is to it. The tricky code needed to read the h.264 videos at runtime and combine them back together on an iPhone or iPad device is included in the AVAnimator library. The most complex aspects of conversion and encoding with the x264 encoder are all handled on the desktop.