Alpha Channel vs Green Screen

Nov 30, 2015

Encoding video for mobile devices is a complex topic. There is no perfect solution since the technology that works best in a specific situation depends on the kind of video data that is being encoded. In addition, the type of video technology used can depend on the specific implementation details of the mobile app and the hardware limits imposed by the device.

Before getting into specific details, it is important to clearly define terms. In regular video, each RGB pixel is defined by 3 components. The addition of a 4th channel, called the alpha channel, is what sets RGBA video apart from plain RGB video. In a previous alpha channel post, example images show an alpha channel and how the final rendered result would look when rendered over different backgrounds. It is not possible to directly encode RGBA pixels as an h.264 stream because h.264 does not support an alpha channel. It is also important to understand that an alpha channel defines transparency on a per-pixel basis. Per-pixel transparency is not the same as a single transparency value defined for an entire layer or image.
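
As a concrete illustration, per-pixel transparency is what drives the standard "over" compositing operation, where each pixel is blended with the background using that pixel's own alpha value. A minimal Swift sketch of the per-channel math (the names here are just for illustration):

    // Per-pixel "over" compositing: each pixel carries its own alpha, so
    // result = alpha * foreground + (1 - alpha) * background, per channel.
    func composite(fg: (r: Double, g: Double, b: Double), alpha: Double,
                   bg: (r: Double, g: Double, b: Double)) -> (r: Double, g: Double, b: Double) {
        return (alpha * fg.r + (1 - alpha) * bg.r,
                alpha * fg.g + (1 - alpha) * bg.g,
                alpha * fg.b + (1 - alpha) * bg.b)
    }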

In comparison to the alpha channel approach, a video encoded with a green screen background does not contain 4 channels. In a green screen encoding, video elements that are completely transparent are represented by pure green pixels, and any partially transparent pixels are blended with green before encoding.
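
Flattening against a green screen is the same compositing operation with pure green as the background, after which the per-pixel alpha is discarded. Reusing the composite helper sketched above:

    // A 50% transparent red pixel flattened over the key green:
    let green = (r: 0.0, g: 1.0, b: 0.0)
    let flattened = composite(fg: (r: 1.0, g: 0.0, b: 0.0), alpha: 0.5, bg: green)
    // flattened == (r: 0.5, g: 0.5, b: 0.0): the stored pixel is part red,
    // part green, and the original alpha value of 0.5 is lost.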

The specific video content that will be demonstrated here is everybody's favorite plumber from the 64 bit hit of days past.

    Mario Image

The original video data was just grabbed out of this Mario YouTube clip of a ROM that someone hacked so that all the background elements would be rendered as a solid green color. The video was processed with Blender to produce a high quality matte that converted the green pixels to transparent. The video was cleaned up a bit and the framerate was reduced to 2 FPS, so that the motion blur in the jump loop could be seen easily.

The reason for choosing this specific video content is easy to see. The Mario clip generally contains mostly blue, red, black, and white along with some other colors. What this video does not contain are any green or near green colors in the foreground subject. The goal is to avoid the well known problem where a weatherperson wearing a near green color ends up with the background imagery bleeding through the foreground subject.

    Mario Image

Anyone with experience in video production has at some point had to deal with poor quality green screen issues. To make a long story short, the best way to deal with green screen issues is to avoid them in the first place. It is absolutely critical to start out with good quality input video. Some interesting videos that describe and show examples of green screen issues can be found on YouTube.

The Mario jump clip was selected for this example because it should not have problems that can be directly blamed on poor quality source video. The bright and basic colors in Mario's costume should stand out very well against a green screen. If and when problems do appear, one can be sure that the problems are a result of h.264 encoding or matte generation on the client mobile device.

Encoding Approaches

The alpha channel encoding approach stores RGBA channels for each pixel in the video. In AVAnimator, this is implemented by encoding 2 different h.264 videos. The first video stores the RGB pixels while the second video stores the A channel as grayscale pixels.

    Mario RGB Image

    Mario Alpha Image
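
As an illustration of the split, the following hypothetical Swift sketch shows how a single RGBA pixel would contribute to the two streams, with the color components going to the first video and the alpha value written as a gray level in the second:

    struct RGBAPixel { var r, g, b, a: UInt8 }

    // Split one RGBA pixel into a color pixel for the RGB video and a
    // grayscale pixel (gray level == alpha) for the alpha channel video.
    func split(_ p: RGBAPixel) -> (rgb: (UInt8, UInt8, UInt8), alphaGray: UInt8) {
        return ((p.r, p.g, p.b), p.a)
    }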

The second approach makes use of a solid background set to the (R,G,B) value (0,255,0). The decoder must know the exact background color that represents the fully transparent pixel. This second approach generates a single video that looks like the following:

    Mario Image
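
In its simplest conceivable form, a green screen decoder could just test for the exact key color, as in this hypothetical sketch. Real filters must instead match a range of near-green values, because lossy h.264 encoding does not preserve exact pixel values; that range matching is where the trouble begins, as shown later.

    // Naive decode: only an exact (0,255,0) match becomes transparent.
    // After lossy h.264 encoding, background pixels are merely near-green,
    // so this exact test would leave a green halo around everything.
    func naiveAlpha(r: UInt8, g: UInt8, b: UInt8) -> UInt8 {
        return (r == 0 && g == 255 && b == 0) ? 0 : 255
    }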

The alpha channel and green screen encoding approaches produce results that are roughly similar, but it is important to understand the basic technical reasons why the green screen approach will always generate lower quality results compared to the alpha channel approach.

When video data is encoded as h.264, the RGB pixels are converted to a representation known as YUV. The color conversion matrices used to convert RGB to YUV in h.264 are known as BT.601 and BT.709. In general, the conversion process from RGB to YUV pixel data allocates more of the colorspace to the G component of the RGB pixel data than to the R or B components. A larger dynamic range for the G component is in fact one of the main reasons that green is typically used as the background in the green screen process.
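
The green bias is easy to see in the BT.601 luma weights, where the Y (brightness) component takes more than half of its value from the G component:

    // BT.601 luma: Y = 0.299 R + 0.587 G + 0.114 B
    func luma601(_ r: Double, _ g: Double, _ b: Double) -> Double {
        return 0.299 * r + 0.587 * g + 0.114 * b
    }
    let yRed = luma601(1, 0, 0)   // 0.299
    let yGreen = luma601(0, 1, 0) // 0.587: green gets the largest share
    let yBlue = luma601(0, 0, 1)  // 0.114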

Once colorspace conversion is completed, the YUV components are further split into Y and UV parts via a process known as chroma subsampling. The result of the 4:2:0 subsampling used in h.264 encoding is that the U and V channels are stored at half the resolution of the Y component in each dimension. This Mario clip has a native resolution of 960x640, so the encoding process would store the Y component data with a resolution of 960x640. The U and V component data would be stored with a resolution of 480x320.
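
In code form, the per-frame sample counts work out as follows:

    // 4:2:0 plane sizes for a 960x640 frame: Y at full resolution,
    // U and V at half resolution in each dimension.
    let (width, height) = (960, 640)
    let lumaSamples = width * height               // 614,400 Y samples
    let chromaSamples = (width / 2) * (height / 2) // 153,600 samples each for U and V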

The alpha channel encoding approach uses a second video to store the Y component at the native resolution. For the Mario clip, this means that the alpha channel is stored with a resolution of 960x640. The end result is that the decoded output of the alpha channel approach is of higher quality compared to a green screen encoding, simply because a full resolution alpha channel is stored.

Quality Comparison

The quality of the decoded result from the alpha channel approach vs the green screen approach will now be compared. In practice, the only way to determine if the quality of a specific lossy encoding process is acceptable is to actually encode real test video and then examine the results on the actual device. If an encoding process is too lossy, then customers will have a more negative opinion of a product. A lifetime of viewing extremely high production quality movies and television has produced populations that are unforgiving when it comes to poor quality visuals.

The following results were generated by running the Mario clip through the encoding and decoding process on iPhone hardware for AVAnimator and GPUImage. The GPUImage logic is based on the Core Image filter named CIChromaKeyFilter. Another implementation of a similar approach can be found on GitHub here. The original movie is encoded with dimensions 960x640, and this size is too large to be practically useful for display on a webpage. In the following examples, the movies corresponding to the alpha channel approach and the green screen approach were decoded frame by frame, and then those frames were cropped to a specific region of interest.

Alpha F01 Green F01
Alpha Crop1 Green Crop1

The initial frame image shows Mario's head. The difference in quality is easy to see when comparing the result of the alpha channel encoding on the left to the result of the green screen encoding on the right. The partially transparent pixels on the edge of Mario's hat and shoulders are significantly affected in the green screen encoding, as the edges appear jagged. While the visual quality is a bit degraded, the quality loss for this frame is not critical.

The 4th frame is where the first really serious problem appears. A motion blur effect is used in this frame to animate Mario bending down, just before he jumps into the air.

Alpha F04 Green F04
Alpha Crop4 Green Crop4

The opaque pixels in Mario's body render with good quality, but the motion blur for Mario's hat generates partially transparent pixels that end up getting severely compromised by the end of the decoding process. The alpha channel version of this same frame on the left does not have the kind of quality degradation found in the green screen version.

There are two distinct reasons for the inconsistent results in the green screen version. The following images show the 4th frame in the state before and after the GPUImageChromaKeyFilter has been applied.

h.264 pre F04 h.264 post F04
h.264 pre Crop4 h.264 post Crop4

The comparison above shows that the h.264 encoding mixes the red of the motion blur around Mario's hat with the green of the background. The mixed colors are not a big problem visually, but the varied colors do become a problem once the GPUImageChromaKeyFilter filter is applied.

The pure green pixels in the background are correctly converted to completely transparent pixels and the opaque pixels with no partial transparency remain opaque in the final result. The problem shows up with the partially transparent pixels as they are not treated consistently by the GPUImageChromaKeyFilter filter. The way the green screen filter operates is a little tricky, but it can be understood at a basic level via a simplification.

The following color ramp shows how a darker green inside the motion blur area would blend into pure green via a simple linear ramp. In this case the darker green color is (R,G,B) (65,87,9), or 0x415709 in hex notation.

    Green Ramp
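
A sketch of how such a ramp could be generated, linearly interpolating each (R,G,B) channel from the darker motion blur green to the pure key green:

    // Linear ramp from the motion blur green 0x415709 to pure green.
    // steps must be at least 2.
    func greenRamp(steps: Int) -> [(r: UInt8, g: UInt8, b: UInt8)] {
        let from = (r: 65.0, g: 87.0, b: 9.0) // 0x415709
        let to = (r: 0.0, g: 255.0, b: 0.0)   // key green
        return (0..<steps).map { i in
            let t = Double(i) / Double(steps - 1)
            return (r: UInt8(from.r + t * (to.r - from.r)),
                    g: UInt8(from.g + t * (to.g - from.g)),
                    b: UInt8(from.b + t * (to.b - from.b)))
        }
    }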

When this simplified color ramp is passed through the green screen filter in GPUImageChromaKeyFilter and then composited over a checkerboard background, the result is:

    Trans Ramp

If just the alpha channel output of this filter is extracted and then normalized to the full (0,255) range, the result looks something like the following.

    Norm Ramp

The color ramp images above show how the opacity filtering logic in GPUImageChromaKeyFilter will convert pure green and near green values to the fully transparent pixel value. Looking at the ramp image from right to left, one can see how pixels are treated as transparent up until a point where the green pixels get so dark that they are no longer considered transparent.

The region between the fully transparent pixels and the fully opaque pixels is the root cause of the problem with GPUImageChromaKeyFilter. The problem is that darker greens are seen as partially transparent. These partially transparent pixels will contribute a green color to the final result even though the original pixel may not have had any green contribution.
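
To make this concrete, the following Swift sketch is a rough CPU-side recreation of the style of thresholding such a filter performs. It is an approximation for illustration, not the actual GPUImage shader code: the pixel and the key color are compared in chroma terms, and the distance is mapped through a smoothstep between thresholdSensitivity and thresholdSensitivity + smoothing.

    // GLSL-style smoothstep: 0 below edge0, 1 above edge1, smooth in between.
    func smoothstep(_ edge0: Double, _ edge1: Double, _ x: Double) -> Double {
        let t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
        return t * t * (3.0 - 2.0 * t)
    }

    // Approximate chroma key alpha: pixels whose chroma is close to the key
    // color become transparent; pixels inside the smoothing band get the
    // troublesome partially transparent values discussed above.
    func chromaKeyAlpha(r: Double, g: Double, b: Double,
                        thresholdSensitivity: Double = 0.4,
                        smoothing: Double = 0.1) -> Double {
        // Reduce a color to simple chroma coordinates (color minus brightness).
        func chroma(_ r: Double, _ g: Double, _ b: Double) -> (cr: Double, cb: Double) {
            let y = 0.299 * r + 0.587 * g + 0.114 * b
            return (cr: r - y, cb: b - y)
        }
        let p = chroma(r, g, b)
        let k = chroma(0.0, 1.0, 0.0) // key color: pure green
        let d = ((p.cr - k.cr) * (p.cr - k.cr) + (p.cb - k.cb) * (p.cb - k.cb)).squareRoot()
        return smoothstep(thresholdSensitivity, thresholdSensitivity + smoothing, d)
    }

Everything closer to the key color than the first edge becomes fully transparent, everything past the second edge stays fully opaque, and the band in between produces exactly the kind of partially transparent, green-tinted pixels described above.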

The ramp example is oversimplified since the actual GPUImageChromaKeyFilter is a non-linear filter that operates on all 3 (R,G,B) color components. Mario's hat demonstrates that the filter can run into some serious quality problems with portions of images that fall into the range where the filter emits partially transparent or almost opaque pixels. The following images show how the poor quality motion blur looks when rendered over a few solid background colors.

F04 over white F04 over black F04 over blue
F04 over white F04 over black F04 over blue

The Other Side of Quality

The example above shows a specific problem where the green screen filter leaves inconsistent green pixels behind in the final result. But there can also be a real quality problem on the opposite side, when pixels are filtered "too transparent".

Alpha F13 Green F13
Alpha Crop13 Green Crop13

In the 13th frame, one can see an imperfection on the right side of Mario's face. This effect is a result of the h.264 encoding process, since it affects both the alpha channel and green screen encoded images. The trouble is that the green screen encoding and decoding process makes the degradation significantly worse. Changing the background color to blue makes the effects easier to see.

Alpha F13 Green F13
Alpha Crop13 Green Crop13

In the alpha channel encoded version, the right side of Mario's face is a little more blue, but the result is not too bad. In the green screen version, the right side of Mario's face becomes a very hard edge. In addition, the hand shape is almost completely lost by the green screen encoding process. Too many of the pixels in the hand are considered fully transparent by the filter. The hand also suffers from some green tint left behind by the filter.

Other Filter Examples

Seeing the results generated from some other more complex examples can also be instructive. This example shows a girl standing in front of a green screen. Once passed through the filter, the image shows quite a bit of green spill on the jacket and skin areas.

Green Filtered
Green Filtered

Displaying the output of the filter over a red background shows how significantly the edges of the jacket, the hair, and the skin area can be affected by the filter logic.

Over Red
Red

The final filter output example is a color wheel with a variable alpha channel. The alpha channel value decreases in a starburst pattern from the center outward. This is the worst possible input for a green screen filter, but it is useful because it shows the entire range of the filter response in one image. These images were captured from the output of GPUImage and AVAnimator and then composited over a black background to produce the following images.

Original
Original
GPUImage
GPUImage
AVAnimator
AVAnimator

Complete Example Source Code

The following example iOS app provides a working example that makes use of both AVAnimator and GPUImage to implement the same Mario jumping animation. If you have not heard of GPUImage before, it is an impressive piece of software written by Brad Larson that is focused on hardware filtering of image data using CoreVideo and OpenGL APIs. While GPUImage does produce results with minimized runtime resource usage (since filters are implemented directly in OpenGL), the specific issue being focused on here is the quality of the results generated on iOS hardware.

The Mario clip contains 16 images at 960x640 resolution. This animation can be displayed at a 1:1 pixel ratio on iPhone 4 series hardware in landscape mode. On other iOS devices the animation would be scaled to fit the width of the screen while retaining the same aspect ratio.

An interested developer can find the original lossless input data along with the decoded lossless output data encoded as Quicktime movies in the MarioJump/MEDIA directory in the following repo. The MarioOriginal_960_640.mov file contains the native resolution 32BPP lossless video. Be sure to clone the repo and open the files locally or download the zip file, since web based viewers cannot be counted on to display Quicktime data properly. Also note that the encoded h.264 video (encoded with ffmpeg+x264 and CRF 20) can be found in the directory MarioJump/MarioJump.

The AVAnimator implementation of this example is rather simple, but the GPUImage implementation turned out to be significantly more difficult than expected. The initial implementation attempted to make use of GPUImageChromaKeyFilter, but that filter by itself does not work properly at runtime in the current implementation of GPUImage, which made it quite difficult to actually capture the processed output. With Dr. Larson's help, a workaround for this iOS view composition issue was created using GPUImageChromaKeyBlendFilter and a second empty blending image. See the GPUIMAGEWORKAROUND define in common.h; if this symbol is commented out, the previous buggy GPUImageChromaKeyFilter effects can be seen. It is unknown if this workaround reduces GPUImage performance significantly.
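
For reference, here is a minimal sketch of that workaround pipeline, written in Swift. The movieURL, transparentImage, and gpuImageView values are assumed to be created elsewhere, and the exact Swift bridging of the GPUImage Objective-C API may differ slightly from what is shown here:

    import GPUImage

    // Workaround: use GPUImageChromaKeyBlendFilter with an empty (fully
    // transparent) blend image instead of GPUImageChromaKeyFilter alone.
    let movie = GPUImageMovie(url: movieURL)
    let blendImage = GPUImagePicture(image: transparentImage)
    let filter = GPUImageChromaKeyBlendFilter()
    filter.setColorToReplaceRed(0.0, green: 1.0, blue: 0.0)
    filter.thresholdSensitivity = 0.4

    movie.addTarget(filter)        // first input: the green screen movie
    blendImage.addTarget(filter)   // second input: the empty blend image
    filter.addTarget(gpuImageView) // render the keyed result into the view

    blendImage.processImage()
    movie.startProcessing()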

While GPUImage provides a way to render movie frames at the encoded framerate, there did not seem to be a way to define the video framerate at runtime. As a result, the original movie was encoded at 2 FPS so that each GPUImage decoded frame could be seen clearly at a rate that would match the playback rate dynamically defined for the AVAnimator example. It would have been nice to add a "slow motion" toggle switch so that the same animation could be viewed at both slow and regular speed, but just displaying the slow motion video was acceptable for this trivial example.

Quality Conclusions

The bottom line with respect to quality is that the green screen encoding and decoding process will always result in lower quality output compared to the alpha channel process. The exact amount of the difference in quality depends on the colors that appear in the video content to be encoded. The Mario video was selected because the large animated character with simple colors should have provided an ideal contrast to a green background. Unfortunately, the green screen results were less than ideal and there was significant quality degradation in certain frames of the video.

One might object that specific filter settings could be tuned on a clip by clip basis to optimize filter performance. In this example, the background color was set to pure green and the thresholdSensitivity value was set to the default 0.4 value found in GPUImage examples. The problem is that one sensitivity setting is not going to be able to account for problems where the green screen filter leaves green pixels behind while at the same time marking too many pixels as fully transparent. The alpha channel encoding process does not suffer from this basic limitation because the color of a specific pixel is unrelated to the alpha channel value for that pixel. As a result, the alpha channel process encodes partially transparent pixels with less color shift, since the foreground pixel color is not mixed with green. The color wheel example shows how significantly the filter approach can degrade results in the worst case.

The girl image shows that the green screen process can significantly reduce fine detail. Note how the hair appears significantly clipped. In addition, certain areas that should be in the foreground, like the sleeve edge and neck, are incorrectly removed by the filter. The most effective way to deal with these kinds of video quality issues is to hire a skilled video editor. A video editor already knows how to use existing desktop video editing packages to create production quality video. A video editor is able to define garbage mattes and fine tune green screen filter parameters to account for all sorts of video quality issues. A video editor can then deliver a Quicktime file (Animation codec) containing lossless 32BPP video with full alpha channel support. An alpha channel video can be imported directly by AVAnimator to produce the split color and alpha channel videos.

In actual practice, the green screen approach is just not worth the trouble. The visual imperfections and glitches tend to not show up until the video content has been integrated at least to the point of a working iOS app prototype. Generally, one has to actually run the app on real iOS hardware to see what the output of the OpenGL shaders actually looks like. If there are imperfections in the original video, then these problems can be hard to detect when one has to wait until the video is in the app before results can be reviewed.