Grayscale Image Compression with zpaq

May 11, 2015

Unless you are a big fan of data compression algorithms, it is likely that you have never heard of zpaq. That is a shame, because zpaq is really impressive technology. Have you seen the show Silicon Valley? Data compression is one of the key elements of the show, which even has a website, piedpiper.com, that looks legit enough that someone might try to invest in it! Well, zpaq is the real deal: it is a general purpose, prediction based compression technology that is able to significantly reduce the stored size of many different kinds of data.

The focus of this post is lossless 2D grayscale image compression, a specific kind of data compression that deals with image data organized into rows and columns of pixels. Lossless means that image data is stored without loss (like PNG), as opposed to a lossy image storage format (like JPEG). Each pixel in a grayscale image is represented by an integer in the range 0 to 255.

    Sun Image


Start out by assuming that the (X,Y) coordinate for the top left corner is (0,0). The values at (0,0) (0,1) (1,0) and (1,1) would be (200 202 200 203). How could these values be compressed?

Compression works by finding "regions" of data where a specific property is very much alike from one number to the next. In this case, the four values (200 202 200 203) are all very near the value 200. So, a very simplified approach to compression would be to subtract 200 from each value so that the resulting values (0 2 0 3) are all a lot smaller. Even a very simplified form of storage would use fewer bits of data to store these 4 numbers because each value in the range (0,7) can be represented by 3 bits while each value in the range (0, 255) requires 8 bits.
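The subtract-a-base idea can be checked with a few lines of Python, using the four pixel values from the example above:

```python
# Toy range reduction: subtract a common base so the residuals fit in fewer bits.
pixels = [200, 202, 200, 203]
base = 200
residuals = [p - base for p in pixels]
print(residuals)  # [0, 2, 0, 3]

# Any value in 0..7 fits in 3 bits, versus 8 bits for the full 0..255 range.
assert all(0 <= r <= 7 for r in residuals)
print((7).bit_length(), (255).bit_length())  # 3 8
```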

So, how does zpaq compression work? Well, it is a lot more complex than just finding an average or floor value for a region as described above. For 2D image data, the most effective approach is to generate predictions on a pixel-by-pixel basis. See the Wikipedia article on JPEG-LS for background info that describes how the lossless JPEG-LS standard implements prediction and residual coding. The basic idea is that the best prediction of a pixel's value can be generated by looking at the pixel directly to the left and the pixel directly above. In the following image, the pixel to the left is labeled L while the pixel directly above is labeled U.

    Sun Image

A 3 neighbor causal prediction approach would make use of the values (L, U, UL) while a 4 neighbor approach would also include UR. The zpaq based grayscale compression approach described here will make use of the (L, UL, U, UR) neighbor pixels in order to predict the next pixel, indicated by X. The zpaq library provides software that implements prediction and residual coding, but you, the developer, need to provide the logic that generates predictions based on the input values.
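To make the 3 neighbor idea concrete, here is the "median edge detector" predictor that the JPEG-LS standard builds on, written as a small Python function. This is an illustration of causal neighbor prediction in general, not the zpaq code developed below:

```python
def med_predict(L, U, UL):
    """JPEG-LS style median edge detector: predict a pixel from its
    three causal neighbors L (left), U (up), and UL (up-left)."""
    if UL >= max(L, U):
        return min(L, U)   # horizontal or vertical edge above/left
    if UL <= min(L, U):
        return max(L, U)
    return L + U - UL      # smooth region: planar prediction

# In a smooth region near 200, the prediction stays near 200:
print(med_predict(200, 202, 201))  # 201
```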

How to get started?

Well, that is the hard part. To be blunt, zpaq is really hard to work with. You have to dive in and bang your head against the wall to learn how things work. Learning zpaq can be very time consuming because there are so many moving parts.

The zpaq library and the zpaq and zpaqd command line utility programs are open source and are not covered by patents. It is critically important that new technology be made available to the public as open source software that a developer can adapt and change as needs change. The history of image compression technology is a sad wasteland of technology that never caught on. While the crazy patent system in the U.S. is partly to blame, the simple fact that researchers do not routinely publish software that implements research approaches is a serious problem. One can easily find hundreds of research papers on image compression, but finding actual working software is exceptionally difficult.

Matt Mahoney originally created paq and released source code that implemented the technology. The resulting collaboration and improvements to the original paq software evolved into zpaq, which is currently among the most effective compression technologies available.

The zpaq command line utility is an archiver like zip. Here is a quick example of how to use zpaq on the command line.

> zpaq add test.zpaq timing.csv
zpaq v7.05 journaling archiver, compiled May  5 2015
Adding 0.002184 MB in 1 files -method 14 -threads 4 at 2015-05-10 03:36:47.
+ timing.csv 2184
[1..1] 2196 -method 14,188,0
1 +added, 0 -removed.
0.000000 + (0.002184 -> 0.002184 -> 0.001853) = 0.001853 MB

> rm timing.csv
> zpaq extract test.zpaq timing.csv
zpaq v7.05 journaling archiver, compiled May  5 2015
test.zpaq: 1 versions, 1 files, 1 fragments, 0.001853 MB
Extracting 0.002184 MB in 1 files -threads 4
[1..1] -> 2196
> timing.csv
0.009 seconds (all OK)

The zpaq command line archiver can be used to extract any previously created archive, but to encode with a zpaql config the zpaqd command line utility should be used. In order to use zpaqd one would also need a zpaql config file that defines how to compress a certain type of file. For example, download bmp_j4c.zip and extract the bmp_j4.cfg and colorpre.cfg files in order to compress a .bmp file via the following commands:

> convert kodim23.png kodim23.bmp
> zpaqd c bmp_j4.cfg kodim23.bmp.zpaq kodim23.bmp
Appending kodim23.bmp.zpaq at 0
kodim23.bmp.zpaq 1179786 -> 285573 (0 errors)
7.29 seconds

The output above shows some flat out amazing compression made possible by the BMP zpaql config file. The original PNG was about 557 kB while the zpaq compressed BMP is only 285 kB.

Simple Grayscale

At this point, the most simple 2D grayscale compression using zpaq will be presented. A zpaql config file tells zpaqd how to compress a specific kind of input file. Creating an effective zpaql config file is really difficult and a lot of trial and error will be required. Starting simple and improving things step by step is really the only way to make progress.

For testing purposes, a 2 x 2 grayscale image is created with the hex grayscale values 0x65 0x8c 0x6e 0x67. This test image can be downloaded as a PGM file here. The following ffmpeg command can be used to strip all headers from the PGM and write only the grayscale bytes to a .yuv file:

> ffmpeg -i gray2x2.pgm -vcodec rawvideo gray2x2.yuv
> hexdump -C gray2x2.yuv
00000000  65 8c 6e 67    |e.ng|

So now gray2x2.yuv has been created as a file that is 4 bytes long. Each byte in the file is a grayscale pixel value and for the sake of simplicity there is no header before the grayscale bytes.
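ffmpeg does the job here, but the header stripping is simple enough to sketch in Python for reference. The function below (strip_pgm_header is an illustrative name, not part of any tool mentioned in this post) assumes a binary PGM (P5) with no comment lines in the header:

```python
def strip_pgm_header(data: bytes) -> bytes:
    """Return the raw pixel bytes of a binary PGM (P5), assuming a simple
    header of four whitespace-separated tokens: magic, width, height, maxval."""
    tokens = []
    i = 0
    while len(tokens) < 4:
        while data[i:i+1].isspace():      # skip whitespace between tokens
            i += 1
        start = i
        while not data[i:i+1].isspace():  # consume one token
            i += 1
        tokens.append(data[start:i])
    return data[i+1:]  # skip the single whitespace byte after maxval

# The 2x2 test image from the text: pixels 0x65 0x8c 0x6e 0x67
pgm = b"P5\n2 2\n255\n" + bytes([0x65, 0x8C, 0x6E, 0x67])
print(strip_pgm_header(pgm).hex())  # 658c6e67
```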

Here is the most simple zpaql config file that will process these grayscale bytes. Note that any text inside parens is treated as a comment. Also note here that L indicates the pixel to the left of the predicted pixel. U and the other neighbor pixels are not used in this simplified config.

> cat gray1.cfg
( simple grayscale zpaql configuration file )

(
UL U UR
L  X
)
  
(12 bit history buffer = 4096 grayscale bytes
 or 2 lines of width 2048)
  
comp 5 12 0 0 2 (hh hm ph pm n)
  0 cm 16 255 (L)
  1 mix 11 0 1 16 255
  
hcomp
  c++ *c=a b=c (save in rotating buffer M)
  
  ( d[0] = hash based on L )
  a=0
  hash
  d=0
  *d=a
  
  ( d[1] = mix hash )
  d++
  *d= 0
  
  halt
end

The zpaql configuration presented above has a single context model component and a mix that takes just the one input. This will not provide great compression, but the example is a useful introduction to zpaql. This config can be run with zpaqd as follows:

> zpaqd c gray1.cfg gray2x2_gray1.zpaq gray2x2.yuv
Appending gray2x2_gray1.zpaq at 0
gray2x2.yuv 4 -> 96
gray2x2_gray1.zpaq 4 -> 97 (0 errors)
Memory utilization:
0 cm 16 255: 40/65536 (0.06%)
1 mix 11 0 1 16 255: 0/2048 (0.00%)

Looking at the output of zpaqd shows that the zpaql config file was read and the prediction/compression engine was initialized and run without error. The engine uses a hash buffer that is 2^5 words in length and a history buffer that is 2^12 bytes in length. These sizes are the first two values after comp in the config (comp 5 12). This information is also in the ZPAQ Standard Format PDF linked above.

A zpaql config file is written in a low level assembly like language with 4 single word sized registers a, b, c, d and instructions that are like C operators. In addition to the registers the hash buffer and history buffer are implicitly defined. C code that represents how these buffers would be allocated might look like:

// d[i] for hash (word) buffer
// b[i] or c[i] for history (byte) buffer
uint32_t H[32];
uint8_t  M[4096];

A very simple zpaql example of writing a word containing the value zero to the first word in the hash buffer would look like the following:

( zpaql for C code: H[0] = 0 )
  a=0
  d=0
  *d=a

In the simplified config one will note the hash pseudo instruction is inline replaced with code that does the following:


  a = (a + *b + 512) * 773

The hash instruction will read a byte from the input buffer M and then add and multiply and finally store the result back into register a. Since b=c has already set the b register to the offset at the start of the history buffer, the result of the hash instruction is to read the grayscale value to the left of the pixel to be predicted and generate a 32 bit hash based on that value. This hash value is then written into the first word of the hash buffer. The hash for the second word in the hash buffer, which corresponds to the mix hash, is set to zero just to keep things simple.

While the code looks correct, one must actually inspect the results of executing the zpaql instructions to verify that the code is free of bugs. Luckily, there is an hcomp execution mode available as a zpaqd command that makes this possible. Each argument is an input byte, and in this case each is prefixed with an x. The x prefix means that each input is treated as a hex value and hex values are displayed for the registers.

> zpaqd t gray1.cfg h x65 x8c x6e x67
...
(output contains register and memory dump after
  each of the 4 input bytes is processed)
...

The output shows register values before and after each zpaql instruction is executed. Finally, after all instructions have been run for a specific byte of input, the contents of the H and M buffers are displayed. For example, after all 4 input bytes have been processed the history buffer dump looks like the following:


M (size 4096) = (rows of all 0 omitted)
  0: 00 65 8C 6E  67 00 00 00  00 00 00 00  00 00 00 00

Note that the input hex values x65 x8c x6e x67 are offset by 1 in the resulting history buffer. The offset is because the zpaql config started with c++, which incremented the pointer into the history buffer so that a zero byte appears in M[0]. There is no specific reason a zpaql config would have to start with c++ or have a leading zero in the history buffer.

A quick look at the hash array output after the very first byte has been processed should verify that the hash instruction and associated zpaql logic that generates a hash based on the L neighbor is working properly.


H (size 32) = (rows of all 0 omitted)
  0: 00073AF9 00000000 00000000 00000000

The hex hash value 0x73AF9 was indeed generated by reading the value 0x65 from b and then writing the contents of register a to the first position in the hash buffer. The second hash value is also properly set to zero.
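The arithmetic is easy to check outside of zpaqd. A few lines of Python reproducing the hash expression confirm the traced value (truncation to a 32 bit word is assumed here, matching the word sized registers):

```python
def zpaql_hash(a: int, mb: int) -> int:
    """The inline expansion of the hash instruction: a = (a + *b + 512) * 773,
    truncated to a 32-bit word."""
    return ((a + mb + 512) * 773) & 0xFFFFFFFF

# First input byte 0x65 hashed with register a starting at 0:
print(hex(zpaql_hash(0, 0x65)))  # 0x73af9
```

This matches the 0x00073AF9 value seen in the H buffer dump.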

Constructing a zpaql module is quite difficult since the code must be written and then debugged to be sure the logic is correct before one can even evaluate whether specific component and mixer configurations result in good compression. The zpaqd trace command described above at least makes it possible to debug instructions line by line. It is also important to develop small chunks of code and debug them with different test cases using trace before integrating that code into larger, more complex models.

Simple Model Results

Running the simple model on real input is the next logical step. This example will make use of the famous Lena image. Download lena512.pgm, strip the header with ffmpeg, and then compress with the simple zpaq model:

> ffmpeg -i lena512.pgm -vcodec rawvideo lena512.yuv
...
> zpaqd c gray1.cfg lena512_yuv_gray1.zpaq lena512.yuv
Appending lena512_yuv_gray1.zpaq at 0
lena512.yuv 262144 -> 165202
lena512_yuv_gray1.zpaq 262144 -> 165203 (0 errors)
Memory utilization:
0 cm 16 255: 20374/65536 (31.09%)
1 mix 11 0 1 16 255: 216/2048 (10.55%)

The original grayscale bytes without any compression are 262 kB in size. An optimized PNG of this Lena image is about 150 kB, so compression for even this very simple model is not too bad at 165 kB.

Add 2 da Mix

A second context model will now be added into the mix, so that the mixer actually has something to do. The new context model will be the same type, but instead of hashing on the L pixel the LL value to the left of L will be used. When making modifications to a config file a good practice is to copy the config file and then increment the number at the end of the filename like so:

> cp gray1.cfg gray2.cfg
(add component and zpaql hash code)
> cat gray2.cfg
( simple grayscale zpaql configuration file )

(
UL U UR
L  X
)

(12 bit history buffer = 4096 grayscale bytes or 2 lines of width 2048)

comp 5 12 0 0 3 (hh hm ph pm n)
  0 cm 16 255 (L)
  1 cm 16 255 (LL)
  2 mix 11 0 2 16 255

hcomp
  c++ *c=a b=c (save in rotating buffer M)

  ( d[0] = hash based on L )
  a=0
  hash
  d=0
  *d=a

  ( d[1] = hash based on LL )
  a=0
  b--
  hash
  d++
  *d=a

  ( d[2] = mix hash )
  d++
  *d= 0

  halt

end

It can be very useful to run a diff at this point to see what changed between the two configs:

> diff -u gray1.cfg gray2.cfg
--- gray1.cfg   2015-05-09 22:53:44.000000000 -0700
+++ gray2.cfg   2015-05-10 01:24:04.000000000 -0700
@@ -7,9 +7,10 @@
  
(12 bit history buffer = 4096 grayscale bytes or 2 lines of width 2048)
  
-comp 5 12 0 0 2 (hh hm ph pm n)
+comp 5 12 0 0 3 (hh hm ph pm n)
0 cm 16 255 (L)
-  1 mix 11 0 1 16 255
+  1 cm 16 255 (LL)
+  2 mix 11 0 2 16 255
...

Adding the new context model required updating the total component count (n), updating the input range to the mixer, and also adding zpaql code that would set a hash value for the second context model.

The zpaql code that writes the second hash buffer word generates a hash based on the second to last grayscale pixel value by subtracting one from the register b before doing the hash operation for d[1]. With this additional context now in the mix, the compression results are:

> zpaqd c gray2.cfg lena512_yuv_gray2.zpaq lena512.yuv
Appending lena512_yuv_gray2.zpaq at 0
lena512.yuv 262144 -> 163614
lena512_yuv_gray2.zpaq 262144 -> 163615 (0 errors)
Memory utilization:
0 cm 16 255: 20374/65536 (31.09%)
1 cm 16 255: 26526/65536 (40.48%)
2 mix 11 0 2 16 255: 432/4096 (10.55%)

So, that is a slight improvement in that the compressed file is now 163 kB instead of 165 kB. The addition of a second context model really should have improved the results more than that.

The root of the problem here is that adding a context model for LL is not too effective because the LL neighbor is getting farther away from the pixel being predicted. The UL or U neighbors would both be better predictors than LL, but to be able to access those pixels a couple of tricky problems have to be solved. The zpaql code has to be able to determine the width of the image in order to be able to look backwards in the history buffer to find the UL or U neighbors.

Into the 2nd Dimension

One dimensional prediction with one or two pixels is not effective because it is not taking advantage of the 2D nature of images. For a specific image pixel, the best predictors are the immediate neighbor pixels. So, the next task is to figure out how to access 2D image data so that the UL or U pixels can be read in zpaql code.
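Once the width is known, reading the 2D neighbors out of a flat one dimensional buffer is just index arithmetic. The sketch below (causal_neighbors is an illustrative helper, not zpaql code) shows the lookups the zpaql code has to perform on a row-major buffer; handling of the left and right edge columns is ignored for brevity:

```python
def causal_neighbors(buf, width, i):
    """Return (L, UL, U, UR) for the pixel at flat index i in a row-major
    grayscale buffer, treating out-of-bounds indexes as 0."""
    def at(j):
        return buf[j] if 0 <= j < len(buf) else 0
    return (at(i - 1),          # L:  one byte back
            at(i - width - 1),  # UL: one row and one column back
            at(i - width),      # U:  one row back
            at(i - width + 1))  # UR: one row back, one column forward

# 3x2 image (width 3); neighbors of the middle pixel in row 1 (index 4):
img = [10, 20, 30,
       40, 50, 60]
print(causal_neighbors(img, 3, 4))  # (40, 10, 20, 30)
```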

This task sounds straightforward but it ends up being really difficult to implement properly. Previously, the headers before the image pixels were being stripped off and only the grayscale pixel bytes were processed. Stripping off the headers makes it impossible to read the image width, so the headers have to be left intact. The problem with leaving the headers in the data to be processed is that one then has to write very complex zpaql code to parse the headers.

In addition, there is a nasty little issue that is not at all obvious: zpaql code can fail to parse known constant offsets in headers when executed while building an archive, as opposed to running as part of a zpaqd trace invocation. Instead of building zpaql code to parse very complex headers, a better approach will be described here. The header parsing logic looks for a known 4 byte marker, 0x42414245 aka "BABE", in the header. A pair of 16 bit little endian width and height values appears after this marker, and then the grayscale pixel data follows in the stream. Since this set of 4 marker bytes could appear at byte 0 or 1 or some later offset, this approach completely avoids the problem of a fixed file offset getting messed up in the case that a pcomp section emits 1 or more padding bytes when building the real zpaq archive.

A whole bunch of details will be skipped over here, and the results that actually do the parsing in zpaql code are presented along with a script to convert an existing grayscale image into this custom .babe format. In addition, the zpaql config file is getting large, so it is included as a link instead of inline.

Here are a couple of different implementations that will convert an existing .pgm file to a .babe file that can be parsed by this zpaql config.

> python encode_babe.py lena512.pgm lena512.babe
lena512.pgm lena512.babe
encoding "lena512.pgm" 512 x 512
wrote "lena512.babe" as 262152 bytes
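The core of such an encoder is tiny. The sketch below writes the .babe layout exactly as described above (4 marker bytes, then 16 bit little endian width and height, then raw pixels); it is an illustration of the format, not the actual encode_babe.py script:

```python
import struct

def encode_babe(width: int, height: int, pixels: bytes) -> bytes:
    """Pack grayscale pixels into the .babe layout: 'BABE' marker bytes,
    little-endian uint16 width and height, then the raw pixel data."""
    assert len(pixels) == width * height
    return b"BABE" + struct.pack("<HH", width, height) + pixels

# 512x512 grayscale: 4 + 4 header bytes + 262144 pixels = 262152 bytes,
# matching the encoder output shown above.
out = encode_babe(512, 512, bytes(512 * 512))
print(len(out))  # 262152
```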

The following link is the zpaql config file that parses a .babe file using a context model for the L neighbor and a second context model for the U neighbor.

Encoding lena512.babe with this config generates much better results:

> zpaqd c gray3.cfg lena512_babe_gray3.zpaq lena512.babe
Appending lena512_babe_gray3.zpaq at 0
lena512.babe 262153 -> 146304
lena512_babe_gray3.zpaq 262153 -> 146305 (0 errors)
Memory utilization:
0 cm 16 255: 20388/65536 (31.11%)
1 cm 16 255: 16314/65536 (24.89%)
2 mix 11 0 2 16 255: 440/4096 (10.74%)

This simple 2 context config produces a compressed file that is 146 kB, about 20 kB smaller than the 165 kB result that uses only one context. These results indicate that mixing L and U contexts does significantly better than the previously attempted approach of mixing L and LL. The results are also already better than the best PNG encoded size of 150 kB.

Hash it, hash it good!

With only 2 contexts zpaq is already producing good compression results. At this point, one might be tempted to add additional contexts so that the mix has more inputs to work with. But there is a downside: each additional context adds significant CPU time to the encode/decode process.

Zpaq is symmetric, meaning that it takes the same amount of time to decode an image as it does to encode it. Each of the 8 bits of every grayscale byte value is predicted one at a time and that can be costly in terms of CPU usage. The bit prediction and update steps are the most expensive part of the encode/decode process. For the moment, further improvements to the mixer logic can improve compression results without significantly increasing the CPU usage.

In the previous config, the mixer context was hard coded to zero for every pixel. The effect of this was to use the same mixer configuration "slot" for every pixel. The zpaq mixer combines its inputs to predict each bit and then adjusts its weights based on how well each input predicted that bit. Using the same mixer "slot" for every pixel is not an awful approach, but it does not take advantage of the differences between noisy edge regions and very smooth regions.

A very simple threshold operation on horizontal and vertical pixel deltas can be implemented to detect smooth regions:


dH = abs(U - UL)
dV = abs(L - UL)
C = 16
if (dH + dV) < C
  H[2] = 0
else
  H[2] = 255

The pseudo code above was implemented as zpaql code. The implementation of the mixer logic is a little complex because zpaql does not provide an abs() function, so that logic had to be implemented along with detection of negative numbers.
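In a language with abs() the same selection logic is trivial. The Python below mirrors the pseudo code and shows the context value the mixer slot would be selected by:

```python
def mixer_context(L, U, UL, threshold=16):
    """Select a mixer 'slot' by thresholding local gradients:
    0 for smooth regions, 255 for noisy/edge regions."""
    dH = abs(U - UL)  # horizontal delta
    dV = abs(L - UL)  # vertical delta
    return 0 if dH + dV < threshold else 255

print(mixer_context(200, 202, 201))  # 0   (smooth region)
print(mixer_context(40, 200, 100))   # 255 (edge region)
```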

Encoding lena512.babe with this improved smooth region mixer logic:

> zpaqd c gray4.cfg lena512_babe_gray4.zpaq lena512.babe
Appending lena512_babe_gray4.zpaq at 0
lena512.babe 262152 -> 144401
lena512_babe_gray4.zpaq 262152 -> 144402 (0 errors)
Memory utilization:
0 cm 16 255: 20386/65536 (31.11%)
1 cm 16 255: 16310/65536 (24.89%)
2 mix 11 0 2 16 255: 859/4096 (20.97%)

This simple mixer modification produced a compressed file that is 144 kB as opposed to 146 kB with the mixer context hard coded to zero. That is a useful improvement in the compression ratio without a significant increase in complexity.

Add a constant, why not?

The previous config reworked the mixer logic to improve the compression. Another very simple and cheap modification is to add a constant input to the mixer. A constant input is basically free and it can improve performance a little bit.

Encoding lena512.babe with the additional constant input to the mixer:

> zpaqd c gray5.cfg lena512_babe_gray5.zpaq lena512.babe
Appending lena512_babe_gray5.zpaq at 0
lena512.babe 262152 -> 143930
lena512_babe_gray5.zpaq 262152 -> 143931 (0 errors)
Memory utilization:
0 cm 16 255: 20386/65536 (31.11%)
1 cm 16 255: 16310/65536 (24.89%)
2 const 160
3 mix 11 0 3 16 255: 1300/6144 (21.16%)

The constant shaved about 500 bytes off the final compressed file size. Why does this work? I don't know. I found this constant in the BMP config and it seems to work well. Compression is kind of a black art and you just have to try things to see what is useful.

Best compression results?

The results presented so far beat the best available PNG compression by 6 kB. Config 5 uses a fairly simple set of components that do not require an excessive amount of CPU to decode. CPU usage is a concern with zpaq because it increases quite a bit as more components are added. This effect can be seen by encoding the same image with the BMP model modified to skip G-R colorspace conversion:

> convert lena512.pgm lena512.bmp
> zpaqd c bmp.cfg lena512_bmp.zpaq lena512.bmp
Appending lena512_bmp.zpaq at 0
lena512.bmp 786554 -> 133727
lena512_bmp.zpaq 786554 -> 133728 (0 errors)

The BMP config is able to squeeze another 10 kB out of the same image. That is impressive but there is a real cost in terms of CPU usage which can be seen when decoding the image:


> mkdir tmp
> cd tmp
> zpaq extract ../lena512_bmp.zpaq
zpaq v7.05 journaling archiver, compiled May 10 2015
../lena512_bmp.zpaq: 1 versions, 1 files
Extracting 0.786554 MB in 1 files
[1..1] -> 786554
> lena512.bmp
4.244 seconds (all OK)


> zpaq extract ../lena512_babe_gray5.zpaq
zpaq v7.05 journaling archiver, compiled May 10 2015
../lena512_babe_gray5.zpaq: 1 versions, 1 files
Extracting 0.262152 MB in 1 files
[1..1] -> 262152
> lena512.babe
0.161 seconds (all OK)

This is the most extreme comparison possible, but it shows that saving an additional 10 kB for this image took about 26 times more CPU time using the BMP model. Depending on your application this kind of trade off may or may not be acceptable. These timing numbers were also generated on a desktop machine, so decompression on embedded CPUs will be significantly slower.

Additional Compression

The compression configs presented so far are intended to illustrate how zpaql works and provide good results while also keeping the CPU usage at a bare minimum. For example, config 5 would be useful for very large images where CPU usage is a real concern. This final config 6 is based on the approaches presented already except that 4 cm models and 3 icm models are used. This final config trades more CPU usage for better compression results on a wide variety of grayscale images.

Encoding lena512.babe with config 6:

> zpaqd c gray6.cfg lena512_babe_gray6.zpaq lena512.babe
Appending lena512_babe_gray6.zpaq at 0
lena512.babe 262152 -> 138320
lena512_babe_gray6.zpaq 262152 -> 138321 (0 errors)
Memory utilization:
0 cm 16 255: 20386/65536 (31.11%)
1 cm 16 255: 22688/65536 (34.62%)
2 cm 16 255: 16310/65536 (24.89%)
3 cm 16 255: 21564/65536 (32.90%)
4 icm 10: 18202/65536 (27.77%)
5 icm 9: 18761/32768 (57.25%)
6 icm 8: 12282/16384 (74.96%)
7 const 160
8 mix 11 0 8 16 255: 3478/16384 (21.23%)

With config 6 the final compressed file size is a very impressive 138 kB. That is not quite as small as the BMP config results, but the CPU usage is significantly less.

> zpaq extract ../lena512_babe_gray6.zpaq
zpaq v7.05 journaling archiver, compiled May 10 2015
../lena512_babe_gray6.zpaq: 1 versions, 1 files
Extracting 0.262152 MB in 1 files
[1..1] -> 262152
> lena512.babe
0.252 seconds (all OK)

On desktop hardware config 6 takes about 1/10 of a second longer than config 5 to decompress, but it provides additional compression saving 6 kB of space.

Final Comparison

This post covered a wide array of topics related to data compression and 2D image compression. Code written in zpaql was provided as an introduction to the zpaql language. Six working zpaql configs were presented and the compression efficiency and CPU usage of different approaches were compared. A final comparison is presented here of the grayscale Lena image encoded with different well known lossless image compression methods.

  • 150 kB PNG : PNGCrush -brute
  • 150 kB felics
  • 149 kB sfalic
  • 139 kB JPEG-LS
  • 138 kB zpaq : gray6 module
  • 135 kB calic
  • 134 kB zpaq : BMP module
  • 128 kB mrp : with -o option

While calic does well on this specific test image, in many cases the gray6 module generates the best available grayscale compression results with realistic CPU usage. The zpaq-BMP and mrp compression approaches are unrealistic as they are very computationally expensive.

Finally, note that the source code for each zpaql config presented here is released under the terms of the 2 clause BSD license.