Adaptive Lossless Prediction

Jan 20, 2017

A developer interested in graphics compression should take a look at the paper "Adaptive Lossless Prediction Based Image Compression" by R. Jovanovic and R. Lorentz. This research describes a matrix minimization approach where, for each pixel in the image, the prediction with the smallest error is selected in either the horizontal or the vertical direction.

Thing is, there is a huge problem in the academic research community. Research ideas may be interesting, but research tends to produce only a paper for publication. To evaluate an idea, a developer needs access to an implementation, so that the effectiveness of the approach can be measured along with the CPU and memory usage. The rough sketch of an idea provided by a research paper is simply not good enough.

Open Source software is the dominant method of software distribution and development in modern times. The research community needs to catch up. Currently, most research is published without access to source code. Even in the rare cases where source code is provided, the implementation is typically Matlab code, which can realistically be run only by other researchers with a university-wide Matlab license. This is unacceptable.

I decided to put my money where my mouth is, implement a C++ version of the approach described in the ALP paper, and publish it on github. The matrix minimization approach seemed promising, and I wanted to see how CPU efficient I could make the implementation with templates and modern C++11 STL containers. The source code is available under a BSD license.

Grayscale vs RGB

Before getting into the implementation details, a quick diversion related to grayscale images used in research papers is required. Often, research approaches are developed using only grayscale image data. Grayscale is one-dimensional data, meaning it contains a single value in the range [0, 255] for each pixel in an image. Color images are three-dimensional data where each pixel is represented by a three-dimensional (R G B) vector value. The original ALP research used a distance function based on the delta between grayscale values, but this approach was not useful for color images. The pixel distance function used in this implementation is the sum of the absolute values of the differences between the pixel components:

  dist = abs(dR) + abs(dG) + abs(dB)
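
In C++ this is just a few lines. The sketch below is illustrative (the struct and function names here are mine, not the ones from the repo), but it matches the distance calculation described above:

  #include <cstdlib>
  #include <cstdint>

  struct Pixel {
    uint8_t R, G, B;
  };

  // Sum of absolute component differences between two pixels.
  // Components are widened to int so the subtraction cannot wrap.
  static inline int pixelDistance(const Pixel & p1, const Pixel & p2) {
    int dR = (int)p1.R - (int)p2.R;
    int dG = (int)p1.G - (int)p2.G;
    int dB = (int)p1.B - (int)p2.B;
    return std::abs(dR) + std::abs(dG) + std::abs(dB);
  }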

WaitList Data Structure

In section 4 of the original paper, the WaitList structure is described briefly. The key to this data structure is the use of a doubly linked list, so that a new coordinate can be pushed and popped with O(1) access time. A quick and dirty first implementation made use of a C++ multiset, but the multiset demonstrated absolutely horrible performance due to this access issue. The revised implementation made use of a C++ list (a doubly linked list) as described in the original paper. The performance of the list-based implementation was much better than the multiset, but there was still a big performance problem: the vast majority of the execution time was spent in new/delete for the list nodes. There had to be a better way.

I briefly considered using a stack-based C++ allocator; a good implementation by Howard Hinnant can be found here. Howard's stack-based allocator actually worked well for vector and multiset, but the performance with the C++ list container did not improve.

Using statically defined segments of memory was really the only way to improve performance, so I created a custom statically defined doubly linked list implementation that avoids calling new/delete when adding or removing list nodes. The memory for all the nodes is defined up front, so insert and remove operations simply update next and prev refs. In addition, each node value is mapped to a vector slot, so that all node values for a given priority are stored in a single vector. The entire implementation is abstracted into the templated header StaticPrioStack.hpp. Performance improved significantly with this statically defined priority stack, and it represents a significant improvement over the implementation described in the original paper.
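
The actual implementation lives in StaticPrioStack.hpp; the sketch below is only a simplified illustration of the core trick (names and layout are mine). All the nodes live in one preallocated pool, and the links are plain integer indexes, so push and pop never allocate:

  #include <cstdint>
  #include <vector>
  #include <cassert>

  // Doubly linked list over a preallocated node pool. Links are int32_t
  // indexes into the pool instead of heap pointers, so push/pop never
  // touch new/delete.
  class StaticList {
  public:
    explicit StaticList(int32_t capacity)
      : nodes(capacity), head(-1), freeHead(0) {
      // Chain every node into the free list up front.
      for (int32_t i = 0; i < capacity; i++) {
        nodes[i].next = (i + 1 < capacity) ? (i + 1) : -1;
      }
    }

    // O(1): grab a node from the free list and link it in at the head.
    void push(int32_t value) {
      assert(freeHead != -1 && "node pool exhausted");
      int32_t n = freeHead;
      freeHead = nodes[n].next;
      nodes[n].value = value;
      nodes[n].prev = -1;
      nodes[n].next = head;
      if (head != -1) { nodes[head].prev = n; }
      head = n;
    }

    // O(1): unlink the head node and return it to the free list.
    int32_t pop() {
      assert(head != -1 && "list is empty");
      int32_t n = head;
      head = nodes[n].next;
      if (head != -1) { nodes[head].prev = -1; }
      nodes[n].next = freeHead;
      freeHead = n;
      return nodes[n].value;
    }

  private:
    struct Node { int32_t value; int32_t prev; int32_t next; };
    std::vector<Node> nodes;
    int32_t head;     // first live node, or -1
    int32_t freeHead; // first free node, or -1
  };

The real header additionally buckets node values by priority into per-priority vectors, but the allocation-free push/pop shown above is where the speedup comes from.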

Prediction

The distance function mentioned earlier, the sum of the absolute values of the (R G B) component deltas, is used to find the minimum priority based on the pixel neighborhood. Minimizing the distance function has the effect of processing the largest edge pixels last. A simple prediction model using either the horizontal or the vertical direction was implemented first, but the results were very poor when tested with color images. While 1D prediction in either the horizontal or vertical direction was useful for grayscale pixels, the same approach did not work with 3D (R G B) pixels. The problem was that simply choosing either the horizontal direction or the vertical direction was not optimal for all 3 color components. It might have been possible to split the image into 3 color planes and then run the algorithm on each plane, but that would be 3 times slower, and overall performance was already an issue. A new approach was needed.

The solution to the prediction problem was implemented using the MED predictor from JPEG-LS, along with a bit of directional selection logic that depends on which pixels in the local neighborhood have been processed already. Each component of each pixel in the neighborhood is passed through the MED predictor, and a prediction error is stored for each pixel. This approach provided much better results for the 3D pixel data while still running the minimization logic just once.
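
The MED (median edge detector) predictor from JPEG-LS is small enough to show in full. It runs once per (R G B) component, where a is the left neighbor, b is the neighbor above, and c is the upper-left neighbor (the function name here is mine):

  #include <algorithm>

  // MED predictor from JPEG-LS: picks min(a,b) near one kind of edge,
  // max(a,b) near the other, and the planar estimate a + b - c in
  // smooth regions.
  static inline int medPredict(int a, int b, int c) {
    if (c >= std::max(a, b)) {
      return std::min(a, b);
    } else if (c <= std::min(a, b)) {
      return std::max(a, b);
    }
    return a + b - c;
  }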

Performance

The vast majority of the execution time is spent calculating the sum over all the nearby neighbors for the purpose of gradient estimation. This area of the code has been significantly optimized, and I am quite sure that additional performance improvement would be difficult. All the details can be found in the module ColortableIter.hpp. This module contains quite a few functions and classes, but it is well commented and should be reasonably straightforward to understand. The implementation uses only int math, with fast int approximations for operations like divide by 3. The comparison of execution times that follows shows the number of seconds needed to process a very large 4000x2000 image on a desktop system with the simple MED predictor vs ALP prediction:

  MED : 0.21 s
  ALP : 1.45 s

The ALP approach is about 7 times slower than a simple MED predictor. Overall, this is not bad, since the ALP approach is doing significantly more work as compared to MED. As long as the compression results offer a significant win, this additional CPU time is acceptable.
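
As an example of the kind of int approximation mentioned above, a divide by 3 of a small non-negative value can be done with a multiply and a shift instead of a hardware divide. This is a standard trick, and the sketch below only illustrates the idea; the actual constant used in ColortableIter.hpp may differ:

  #include <cstdint>
  #include <cassert>

  // Approximate n / 3 as (n * 21846) >> 16. Since 21846/65536 is just
  // above 1/3, the result matches integer n / 3 exactly for
  // 0 <= n <= 32767, which easily covers a sum of 8-bit components.
  static inline uint32_t fastDiv3(uint32_t n) {
    assert(n <= 32767);
    return (n * 21846u) >> 16;
  }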

Compression

The real benefit of the ALP approach comes from the implicit reordering of pixels as they are predicted. A simple MED predictor starts with an implicit assumption that pixels are to be iterated row by row and column by column. The ALP approach instead chooses the next pixel to be processed based on the minimum gradient estimate of the neighbor pixels. This lets ALP deliver superior compression results because many repeated pixel delta values end up grouped together.
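
In outline, the core loop looks something like the following sketch. A std::priority_queue stands in here for the StaticPrioStack described earlier, and the per pixel prediction and gradient estimation steps are reduced to placeholder comments:

  #include <queue>
  #include <vector>
  #include <utility>
  #include <functional>

  // Sketch of the min gradient iteration order (illustrative only).
  void alpIterate(int width, int height) {
    typedef std::pair<int,int> Entry; // (gradient estimate, pixel offset)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > wait;
    std::vector<bool> visited(width * height, false);

    wait.push(Entry(0, 0)); // seed with the top-left pixel
    while (!wait.empty()) {
      Entry e = wait.top();
      wait.pop();
      int offset = e.second;
      if (visited[offset]) continue; // skip stale duplicate entries
      visited[offset] = true;

      // ... MED-predict this pixel and record the residual here ...

      // Push unvisited 4-neighbors keyed by their gradient estimates.
      int x = offset % width;
      int y = offset / width;
      static const int dx[4] = {1, -1, 0, 0};
      static const int dy[4] = {0, 0, 1, -1};
      for (int i = 0; i < 4; i++) {
        int nx = x + dx[i];
        int ny = y + dy[i];
        if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
        int noff = ny * width + nx;
        if (!visited[noff]) {
          int est = 0; // placeholder: distance sum over processed neighbors
          wait.push(Entry(est, noff));
        }
      }
    }
  }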

A real world example will help to clarify this point. The following is the famous Lena image, used extensively in the graphics community. It is a useful example because it contains image data with a lot of edges and a large number of different colors.

    Lena Image

The iteration order is defined in terms of the min prediction values from pixel to pixel. To fully understand how iteration works, just watch it happen as an animation and see how similar regions are iterated. The video linked here shows the iteration progression by filling in red pixels as the matrix minimization process runs one pixel at a time. Note that this H264 video will need to be downloaded and played on your desktop due to browser compatibility issues.

When the Lena image is presented in iteration order, one can see that very similar pixels are grouped together in row by row and column by column order. The output looks like the following:

    Lena Image

Comparing the prediction residuals between the MED and ALP approaches provides some hard numbers. The following are the mean squared errors generated from the prediction residuals with MED and ALP:

  MED MSE : 203.9
  ALP MSE : 213.6

The mean squared error is just slightly larger with the ALP process, but one must examine the size of the compressed output to get a real idea of how prediction and pixel reordering can produce a smaller result. The following shows the compressed size of the original Lena image as a PNG (compressed with zlib) as compared to the 3-byte prediction residuals compressed with Brotli (a modern zlib-like compression library from Google):

  PNG (zlib)   474 kB
  MED (brotli) 484 kB
  ALP (brotli) 454 kB

The pixel reordering and prediction logic of the ALP process saves 30 kB as compared to the output of the simple MED predictor, and 20 kB as compared to the PNG. Keep in mind that the ALP process presented here is not a full image codec; it is just a predictor. Additional compression improvements would be possible using contexts and error modeling as in the JPEG-LS codec.
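
For reference, the brotli side of the comparison above is just the library's one-shot encode call. A minimal sketch (assuming the residuals have already been flattened to 3 bytes per pixel in iteration order, with error handling kept to a minimum) looks like this:

  #include <brotli/encode.h>
  #include <cstdint>
  #include <vector>

  // Compress a buffer of serialized prediction residuals with Brotli.
  std::vector<uint8_t> compressResiduals(const std::vector<uint8_t> & residuals) {
    size_t outSize = BrotliEncoderMaxCompressedSize(residuals.size());
    std::vector<uint8_t> out(outSize);
    BROTLI_BOOL ok = BrotliEncoderCompress(
        BROTLI_DEFAULT_QUALITY, BROTLI_DEFAULT_WINDOW, BROTLI_MODE_GENERIC,
        residuals.size(), residuals.data(),
        &outSize, out.data());
    out.resize((ok == BROTLI_TRUE) ? outSize : 0);
    return out;
  }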

Conclusions

The ALP process is a real improvement over simple row-by-row processing with a predictor like MED. The results demonstrated here on the standard Lena test image show that 3 color (R G B) images can be iterated reasonably quickly, and that the min matrix iteration reordering results in real compressed file size savings. Additional work would be required to turn the ALP process into a full image codec, but the process shows promise.

The C++ source code provided here is a useful learning resource and a good first step in the process of using academic research to create actually useful tools for people. There is no reason that academic research cannot be applied to real problems using Open Source software.