C++ impl of GVM clustering

Aug 18, 2015

GVM is a fast clustering algorithm originally written by Tom Gibara. The original Java implementation can be found Here.

Unless you are a big data geek then it is unlikely that you have worked with a clustering algorithm before. That is a shame because clustering is really powerful. Clustering is also really slow! For example, try running K-means clustering on the 148279 unique pixels in Lenna.png. Processing will take hours to run. Basically, most clustering approaches are so slow as to make them unusable in image processing. Readers interested in detailed background on clustering should read this Clustering PDF.

The focus of this post is a very fast clustering implementation called GVM which has now been ported to C++. The original GVM implementation was written in Java and it is very useful, but I needed a C++ implementation in order to incorporate pixel sorting into an existing non-Java program. The full C++-11 source code and examples of GVM usage can be found on github.

What is pixel sorting? The best way to explain is to just show some examples. This example image is serrano.png, it is a well known example image in the graphics community.

    Sun Image

The serrano image is sparse meaning that unique pixels are not densely packed in terms of the 3D colorspace. If one were to extract all the pixels and order them by int order that would look like the following:

    Sun Image

There is clearly some sort of relationship between the pixels in int sorted order, but the usefulness is not directly obvious. Clustering pixel values as a means to sort by color would be more useful.

Clustering as a 3D sort method

The really tricky part of sorting by RGB pixel color is that the RGB values represent a colorspace in 3 dimensions. Sorting the pixels is a rearrangement in 2D in a way that captures the 3D relationships between color components. The GvmCpp example project contains an XCode example project that is able to read a PNG image and generate a cluster sorted output. For the serrano image, the pixels broken into clusters would look like the following:

    Sun Image

The image above shows serrano pixels clustered into N = 256 clusters. Each cluster contains the other pixels closest to the cluster centers sorted from darker to lighter pixels. These cluster sorted pixels can be combined together to form a final sorted output like the following:

    Sun Image

The sorted order basically flattens out each cluster so that the final output image is generally sorted from dark to light. Pixel colors nearer to each other in 3D space are near each other in the 2D sorted order. Note for example how the light blues appear in the same line in the output image.

Lenna example

This next example shows the Lenna.png image, another well known image in the graphics community. Gvm clusters the pixels in this image in under a second.

    Sun Image

When this image is processed into sorted clusters, the output is the following:

    Sun Image

With clusters combined back into a simple 2D image, the sorted pixels are the following:

    Sun Image

How is color sorting useful?

Sorting pixels based on clustering in 3D space has a number of applications including image segmentation, data compression, and image filtering. But without a very fast clustering implementation, one would have a tough time even getting started. This GVM in C++ example provides a high performance Open-source implementation with very low memory requirements.