Loading an OpenGL iOS Texture Cache

Aug 23, 2013


If you have done any OpenGL coding on iOS then you have likely been exposed to 2D textures. OpenGL uses textures for many different purposes; for example, a 2D texture containing image data could be mapped onto the faces of a 3D cube. This post will focus on a single use case: a 2D texture containing RGB image data that is rendered into a 2D GLKit based view. The texture used in this example is an iPhone 1x full screen sized image with dimensions 480x320. The texture data is read from a series of 8 PNG image files and displayed one after another, while a label displays the approximate frame rate of the PNG decoding operation. The animation looks like this:

    Cycle Small Animation

An animation could be implemented in a number of different ways on iOS. In this case, OpenGL needs a texture as input to a shader, so the goal of this example is to demonstrate how to load a texture without blocking the main thread and without blocking the OpenGL render logic. This example uses the texture cache API introduced in iOS 5 as part of the CoreVideo framework. A texture cache can be used either to read pixel data out of an OpenGL texture or to write pixel data into an OpenGL texture. There is also a special way to use this texture cache API to pass an existing texture from the Camera API or from the h.264 decoder into OpenGL in a highly optimized way.

What is available?

One can easily find examples of using the texture cache API to read texture data out of OpenGL. A developer might want to do that to implement a screen capture function in the app. It is also not too hard to find example code that shows how to use the optimized loading path to move image data from the camera or from the h.264 decoder into OpenGL. Apple provides RosyWriter and other examples, and the GPUImage framework makes extensive use of this functionality.

What is missing?

What is completely missing is example code that uses the CVOpenGLESTextureCacheCreate() and CVOpenGLESTextureCacheCreateTextureFromImage() APIs to feed data into an OpenGL texture. I searched and searched but could not find a single example of actual working code. That is frustrating, because while these texture cache APIs seem to be really useful, the lack of documentation and examples turns using them into a long process of trial and error. The texture cache APIs should provide a better way to access the same functionality that glTexSubImage2D() provides, but actually using these APIs to update texture data is a lot harder than it looks. On the Simulator, writing to the texture cache memory seems to have no effect, and I had to work around the problem by making another copy of the framebuffer and explicitly uploading it to the GPU with a call to glTexSubImage2D().
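For reference, the write path boils down to a couple of CoreVideo calls. The following is a minimal sketch, not the exact code from the project: it assumes an EAGLContext named context is already current, the variable names are placeholders, and all error handling beyond the first assert is trimmed.

    // Sketch only: create a CoreVideo texture cache and wrap an IOSurface
    // backed CVPixelBuffer as an OpenGL ES texture. Assumes an EAGLContext
    // named context is current; _textureCache and _pixelBuffer are placeholders.
    #import <CoreVideo/CoreVideo.h>
    #import <OpenGLES/ES2/gl.h>
    #import <OpenGLES/ES2/glext.h>

    CVOpenGLESTextureCacheRef _textureCache = NULL;
    CVReturn err = CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL,
                                                context, NULL, &_textureCache);
    NSAssert(err == kCVReturnSuccess, @"CVOpenGLESTextureCacheCreate %d", err);

    // The pixel buffer must be IOSurface backed or the cache cannot share
    // memory with the GPU; kCVPixelBufferIOSurfacePropertiesKey requests that.
    NSDictionary *attrs = @{ (id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
    CVPixelBufferRef _pixelBuffer = NULL;
    err = CVPixelBufferCreate(kCFAllocatorDefault, 480, 320,
                              kCVPixelFormatType_32BGRA,
                              (__bridge CFDictionaryRef)attrs, &_pixelBuffer);

    // Wrap the pixel buffer as a GL_TEXTURE_2D texture that a shader can sample.
    CVOpenGLESTextureRef texture = NULL;
    err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
            _textureCache, _pixelBuffer, NULL,
            GL_TEXTURE_2D, GL_RGBA, 480, 320,
            GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);

    glBindTexture(CVOpenGLESTextureGetTarget(texture),
                  CVOpenGLESTextureGetName(texture));
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

Once the texture is wrapped like this, writing pixels into the CVPixelBuffer memory is supposed to update the texture without a separate upload, which is the whole point of the cache API.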

Okay, enough complaining. Here is the source code for an actual working iOS example. This project is for iOS 5.0 or better and it is a universal app so it will run full screen on either an iPhone or an iPad.

RGBAShaderFromTextureCache: (Xcode Project File)

GCD Threaded Loading:

The tricky part of this implementation is how loading the texture from a PNG image source is done in a background GCD thread. A developer must take care not to block the main thread since UI operations and drawing are done on the main thread. In addition, the OpenGL render cycle must not block waiting on a possibly slow loading operation. To deal with both of these requirements, the logic implements double buffering of the texture load on a secondary GCD thread. The following pseudo-code shows the basic idea:

// Swapping Texture1 and Texture2
main thread:
  display(frame = 0)
    render Texture1
    load Texture2 in thread
  display(frame = 1)
    render Texture2
    load Texture1 in thread

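The pseudo-code above can be sketched with GCD roughly as follows. This is a hedged sketch, not the project's actual code: -renderTexture:, -loadTexture:fromPNG:, the MyTexture type, and the loadQueue serial queue are all hypothetical names standing in for the real logic in ViewController.m.

    // Sketch of the double buffered texture load. The helpers
    // -renderTexture: and -loadTexture:fromPNG: and the serial
    // queue self.loadQueue are hypothetical placeholders. Real code
    // must also guard against rendering a texture whose background
    // load has not finished yet.
    - (void)displayFrame:(NSUInteger)frame {
      // Even frames render texture 1 while texture 2 loads; odd frames swap.
      MyTexture *renderTex = (frame % 2 == 0) ? self.texture1 : self.texture2;
      MyTexture *loadTex   = (frame % 2 == 0) ? self.texture2 : self.texture1;

      // Main thread: draw the texture that is already fully loaded.
      [self renderTexture:renderTex];

      // Background thread: decode the next PNG into the other texture
      // so neither the main thread nor the render loop ever waits on it.
      NSUInteger nextFrame = (frame + 1) % self.numFrames;
      dispatch_async(self.loadQueue, ^{
        [self loadTexture:loadTex fromPNG:self.pngFrames[nextFrame]];
      });
    }

A serial queue is the natural choice here, since only one texture load should be in flight at a time and the loads must not reorder.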
In the background thread, PNG data is loaded as a CGImageRef and then the CoreVideo buffer memory is wrapped as a bitmap context via CGBitmapContextCreate(). The image is then decoded and written directly to the CoreVideo buffer via a call to CGContextDrawImage(). This is the fastest way to decode the image data and write it with CoreGraphics. The code writes the texture memory only once, and that is critical because a large texture takes up a lot of memory. The source code can be found in ViewController.m and OGLESObjects.m in the Xcode project.
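The decode-into-the-buffer step looks roughly like the following sketch. It assumes a 480x320 BGRA pixel buffer like the one described above; the variable names and the "Frame1.png" filename are placeholders, not the exact identifiers from the project.

    // Sketch: decode a PNG directly into CVPixelBuffer memory so the
    // pixels are written exactly once. pixelBuffer and "Frame1.png"
    // are placeholders for the real names in the project.
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    void  *baseAddress  = CVPixelBufferGetBaseAddress(pixelBuffer);
    size_t bytesPerRow  = CVPixelBufferGetBytesPerRow(pixelBuffer);

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    // This bitmap layout matches kCVPixelFormatType_32BGRA:
    // 32 bit little endian with the alpha byte skipped.
    CGContextRef bitmapContext = CGBitmapContextCreate(baseAddress,
        480, 320, 8, bytesPerRow, colorSpace,
        kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Little);

    // CoreGraphics decompresses the PNG and writes the pixels straight
    // into the CoreVideo buffer memory wrapped by the bitmap context.
    CGImageRef image = [UIImage imageNamed:@"Frame1.png"].CGImage;
    CGContextDrawImage(bitmapContext, CGRectMake(0, 0, 480, 320), image);

    CGContextRelease(bitmapContext);
    CGColorSpaceRelease(colorSpace);
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

Because the bitmap context is created over the pixel buffer's own base address, there is no intermediate copy between the decode step and the texture memory.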

How fast is it?

Performance is a complex subject. Faster is always better, but understanding why an approach is fast or slow is the important part. An approach can be IO bound or it can be CPU bound. In this case, the execution time of the code is completely CPU bound. IO is not a factor in this example as there are only 8 frames of animation and all encoded PNG data was read into memory on app startup.

On an iPhone4, the decoding logic runs at about 15->20 FPS (not good). On an iPad2, the decode logic runs at about 45 FPS, but performance can jump to 55 FPS depending on what else is running and how the load is shared between the 2 CPU cores. As the decode logic is CPU bound, it makes sense that the dual core A5 performs so much better compared to the single core Cortex-A8 chip in the iPhone4. The results for this 1x full screen size image show that decoding a PNG on every frame is just too slow to run on slightly older iPhone hardware. Perhaps the iPhone4S or iPhone5 would be able to handle it, but the iPhone4 cannot run at an acceptable frame rate of 30FPS. Older iPhone 3GS devices would likely perform even worse, and iPhone 3G devices cannot run OpenGL ES 2.0 at all. I also tested on a retina iPod Touch and the results were the same as the iPhone4.

I hope this Xcode project provides a good example and starting point for your own experiments. While using an iOS texture cache is faster than invoking glTexSubImage2D(), this example shows that the bottleneck is in decoding the pixel data and writing that data to memory. The bottleneck is not in transferring already written memory to the GPU, since on iOS main memory and GPU memory are one and the same. In many texture cache examples, the developer is making use of the h.264 hardware decoder to decode and write pixels to texture memory. If one is not using the h.264 hardware decoder, performance will be much slower. The texture caches just do not have that big of an impact on performance here, as the bottleneck is in another part of the code.