OpenCV 3D Histogram based image segmentation

Feb 27, 2015

Image Segmentation. It is really really hard!

What is image segmentation? Basically, image data is a collection of pixels that represent different colors. Segmentation splits the pixels of an image into groups based on how alike the colors are. The following example shows a grayscale image and how it might be segmented by assigning colors to pixels that are very much alike:

    Gray Head     Color Head

Just looking at the image on the left, it is easy to see how some regions are more alike than others in terms of lightness and darkness. The visual system that has evolved in creatures here on earth is really impressive. A human is able to look at an image and instantly "know" where the region boundaries are located in a 2D sense. But how does one program a computer to do this kind of task?

Well, it turns out that image segmentation is not easy. But the very impressive OpenCV library provides some great tools that at least make it possible to get started. This post describes an example implementation of an automated image segmentation approach built using OpenCV and C++.

Why is automated image segmentation important? There are many image segmentation methods one can read about online. One can find all sorts of mathematical proofs in papers, but what about actually useful programs? Well, not so much.

How do you define useful?

Ahh, the really tricky question. Useful to whom? Much of the literature and many of the implementations of image segmentation are focused on one specific kind of segmentation useful for a specific purpose. The problem with that approach is that what is useful for one purpose may not be directly useful for another. The specific problem with many research approaches is that instead of building a general purpose solution, researchers just punt and declare that the user should adjust specific parameters to make the approach useful for a specific image.

How does one define segmentation input parameters? One has to run the segmentation, look at the results, try out different parameters, and then decide on the parameter values that work best for that specific image. But this is a chicken-and-egg problem, and it entirely defeats the purpose of automated image segmentation. The goal is to have the computer examine the image and figure out how to segment it by itself.

To illustrate this point, note how adjusting the input parameters of Felzenszwalb's method results in drastically different segmentations:

Seg1 Seg2 Seg3


So, what is the perfect solution? Well, there really is no one perfect solution. What is provided here is a general purpose method of automated segmentation via region merging. Segmentation in this case is not oversegmentation; the difference between the two is illustrated with this color bars image.


The oversegmented version of this image with segmented regions drawn on top of the image is:


An oversegmented image is the kind of result one would get with approaches like SLIC, Turbopixels, and QuickShift. This kind of oversegmentation breaks blocks of pixels up into regional groupings in 2D space. What would be more useful is a segmentation that puts those groupings back together to form larger superpixels that contain very alike pixels.

The oversegmented regions image might look like the following. Note that the specific color used to define a region does not matter. These arbitrary colors are known as region tags.


These regions should be merged to produce a tags image like this:


Oversegmentation + Merging

In order to oversegment and then merge, one first needs an oversegmentation solution. I looked at a bunch of them, and none is quite as good as Seeds. The Seeds C++ implementation by Stutz is a best-of-breed implementation that is both Open Source and very fast. In addition to Seeds, another oversegmentation method known as Contour-relaxed Superpixels also provides an Open Source C++ implementation. The Seeds approach is much faster than Contour-relaxed Superpixels, but in many cases the segmentations produced by Contour-relaxed Superpixels are actually better due to more effective "sticking" to edges.

Now for the actual implementation of region merging. The code is implemented in C++ (C++11 required) and is delivered as an Xcode project. The Xcode project contains already-compiled OpenCV binaries for modern OS X machines, so you don't have to go through the trouble of building OpenCV yourself.

The project contains 3 targets: the first builds the Seeds segmentation command line util, the second builds the Contour-relaxed Superpixels command line util, and the final target builds an executable named seedscombine, which is the region merging program. The full source for these 3 pieces of software is included, and it should not be too hard to build them on other systems that support C++11.

Command Line:

Using these programs from the command line is easy. The following example shows the results of segmenting a simple artist-generated image of a hand:


$ seeds FingerNumbers3.png
reading image input.png
auto N superpixels = 1000
Image with contours saved to contours.png ...
Image with only edges saved to edges.png ...
Image with mean colors saved to mean.png ...
Image with cluser ids saved to clusters.png ...

The Seeds util writes a number of output image files, but the important one is clusters.png. This file is the tags output of the Seeds segmentation, and it will be passed as the input to seedscombine. The contours output of Seeds looks like this:


Note how the very large white region is oversegmented into a bunch of small regions. The clusters output image is then fed into the region merge program like this:

$ seedscombine FingerNumbers3.png \
  clusters.png combined_clusters.png
started with 676 superpixels
RGB merge edges count 118 result num superpixels 233
LAB merge edges count 128 result num superpixels 223
RGB fill backproject count 211 result num superpixels 140
RGB merge small count 299 result num superpixels 52
RGB merge edgy count 308 result num superpixels 43
LAB merge count 308 result num superpixels 43
ended with 43 superpixels
wrote combined_clusters.png

The output shows that the initial tags input contained 676 regions and that these regions were merged into 43 very alike superpixels. The combined output tags image looks like this:


Using the Contour-relaxed Superpixels segmentation is much like using Seeds, except that the user needs to pass in a target superpixel segmentation size. I find that a 10x10 smoothing window gives good results, but for some very detailed images a 5x5 region sticks to edges better.

$ contourRelaxedSuperpixels FingerNumbers3.png 10 10

OpenCV 3D Histograms!

The actual implementation uses OpenCV and 3D histogram based backprojection to merge alike regions based on rough stats that attempt to adapt to the image. Typically, a 1D or 2D histogram approach is used to search for a known pattern in an image. Here, a full 3D histogram is calculated from the RGB pixels in a region, and then backprojection is used to determine when to merge that region with its neighbor regions. This is all fairly quick, since histogram based backprojection is a fast operation even though many pixels in the image are scanned over and over.

Developers are invited to take a look at the source for all the gory/awesome details. The C++ code contains a general purpose Superpixel class and utility methods to support loading tags images and processing them in various ways using OpenCV. The merge code is all BSD licensed, while the superpixel segmentation sources use other Open Source licenses.

What is it good for?

What would this kind of automated segmentation approach be useful for? Well, here is a specific example of a tricky problem that can be solved automatically in many cases using this approach. Assume that you have this chair image and you want a way to automatically grab the chair pixels out of the image, but without the white background.


Well, you could mess around in Photoshop attempting to determine the exact region selection settings to split the background from the chair. Or you could run it through the region merge logic so that all the background areas are combined into one region. The output tags file might look like:


Note how the alike background pixels have been merged right up to the edge of the chair. This tags image could easily be converted into a mask by selecting the largest region and then writing it out as a mask like the following:


The edge between the chair and the background will generally be good but not absolutely perfect. One could use this mask to set the alpha for the chair, chopping out the chair foreground, which could then be pasted as an alpha channel image over some other image.

Here is the chair on vacation at Phu Quoc Long Beach, VN!


While a skilled Photoshop artist could make that look better, this approach can be implemented in a totally automated way for many images. If, for example, one had 1000 chairs and 1000 beaches, then the cost of having a Photoshop expert do all the processing would really add up. This approach is also significantly better than the "Color To Alpha" feature found in GIMP, as it finds the exact edge and does not modify the pixel values away from the edge region.

Enjoy the Source.