
MorphCUDA


GPU-accelerated image morphology written in C++/CUDA (100x faster than CPU!)

Binary Mask Morphology in C++ using CUDA

Ridiculously fast morphology and convolutions using an NVIDIA GPU!


Original Author:  Alan Reiner
Date:             04 September, 2010
Email:            [email protected]


This is a BETA version of a complete morphological image processing library using CUDA. CUDA is an NVIDIA programming interface for harnessing your NVIDIA graphics card to do computations (instead of just graphics). For problems that are highly parallelizable (like convolution or morphology), you can get speedups of several orders of magnitude.

For instance, to dilate a 4096x4096 mask with a 15x15 structuring element, a CPU implementation would take anywhere from 5 to 20 seconds. Using this library with an NVIDIA GTX 460, the operation takes less than 0.1 seconds! You can see nearly identical speed improvements with regular convolutions of ints or floats (this code is easily adapted to do regular convolutions).
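
For reference, here is a minimal sketch of what a naive binary-dilation kernel looks like in CUDA. This is NOT the library's own kernel (the real kernels are tiled and otherwise optimized); the function name and the 0/1 mask convention here are illustrative only.

  // Naive binary dilation: a pixel turns on if ANY neighbor under the
  // structuring element (SE) is on.  Mask and SE are flat 0/1 int arrays.
  // NOTE: illustrative sketch, not the library's optimized kernel.
  __global__ void naiveDilateKernel(const int* mask, int* out,
                                    int imgW, int imgH,
                                    const int* se, int seW, int seH)
  {
     int x = blockIdx.x * blockDim.x + threadIdx.x;
     int y = blockIdx.y * blockDim.y + threadIdx.y;
     if (x >= imgW || y >= imgH)
        return;

     int result = 0;
     for (int j = 0; j < seH && !result; j++)
     {
        for (int i = 0; i < seW; i++)
        {
           int nx = x + i - seW/2;
           int ny = y + j - seH/2;
           if (nx < 0 || nx >= imgW || ny < 0 || ny >= imgH)
              continue;
           if (se[j*seW + i] && mask[ny*imgW + nx])
           {
              result = 1;
              break;
           }
        }
     }
     out[y*imgW + x] = result;
  }

It would be launched with a 2-D grid covering the image, e.g. dim3 block(16,16); dim3 grid((imgW+15)/16, (imgH+15)/16). Erosion is the same loop with the "any neighbor on" test inverted to "all neighbors on".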

To use this library, you will need the CUDA 3.1 toolkit, and you might have to move a few libraries into your linker path. And of course, you need a CUDA-enabled graphics card (Fermi recommended; others will probably work, more on that below).

This library is designed for performing a sequence of morph operations on a single image/mask. That is standard practice in morphological processing, and keeping the data on the device between operations saves a lot of time. The saying at my workplace is, "Copying data in and out of the device is expensive; computation is basically free!"

This isn't to say that the library can't be used for isolated morph operations (copy in, morph, copy out)... in fact, it shouldn't have any computational disadvantage compared to other libraries (besides kernel optimization design). However, this library uses objects we call "Morphological Workbenches", which may be non-intuitive at first but actually simplify the process of applying dozens, hundreds, or thousands of operations to a single image/mask.
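
To make that pattern concrete, here is a hypothetical sketch of the copy-in-once / operate-many-times / copy-out-once flow that the workbench objects wrap. None of these names come from the library's actual API; the sketch just reuses the naive kernel shown earlier and ping-pongs between two device buffers. Consult the headers for the real workbench interface.

  // HYPOTHETICAL sketch of the pattern the workbenches wrap; these names
  // are NOT the library's API.  Assumes naiveDilateKernel from the sketch
  // above, and a structuring element already resident on the device (devSe).
  #include <cuda_runtime.h>

  void morphSequenceSketch(int* hostMask, int imgW, int imgH,
                           const int* devSe, int seW, int seH, int numIters)
  {
     size_t nBytes = (size_t)imgW * imgH * sizeof(int);
     int *devA, *devB;
     cudaMalloc((void**)&devA, nBytes);
     cudaMalloc((void**)&devB, nBytes);

     // Expensive:  one copy onto the device...
     cudaMemcpy(devA, hostMask, nBytes, cudaMemcpyHostToDevice);

     dim3 block(16, 16);
     dim3 grid((imgW + 15) / 16, (imgH + 15) / 16);

     // ...then "computation is basically free":  run as many operations
     // as you like, ping-ponging between the two device buffers.
     for (int it = 0; it < numIters; it++)
     {
        naiveDilateKernel<<<grid, block>>>(devA, devB, imgW, imgH,
                                           devSe, seW, seH);
        int* tmp = devA;  devA = devB;  devB = tmp;
     }

     // Expensive again:  one copy back off the device.
     cudaMemcpy(hostMask, devA, nBytes, cudaMemcpyDeviceToHost);
     cudaFree(devA);
     cudaFree(devB);
  }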

I envision this library being useful for GIMP plugins or ATR systems. Just remember, CUDA is NOT open-source, and its licensing will not be favorable to OSS projects.


Supported Hardware:

This code was designed for NVIDIA devices of Compute Capability 2.0+, which includes any NVIDIA GTX 4XX-series (Fermi) card. The code should compile and run on 1.X devices, but it is not optimized for them, so there may be a huge performance hit. Or maybe not. (See the note below about compiling for 1.X GPUs.)

I believe the GeForce 8800 series is the earliest NVIDIA hardware that supports CUDA, at Compute Capability 1.0.

CUDA was created by NVIDIA, for NVIDIA. It will not work on ATI cards. If you want to take up GPU programming on ATI, the only two options I know of are OpenGL and OpenCL. However, the maturity of those programming interfaces is unclear (though at least such programs can run on both NVIDIA and ATI).


Installing and running:

This directory should contain everything you need to compile the image convolution code, besides the CUDA 3.1 toolkit, which can be downloaded from the following website:

  http://developer.nvidia.com/object/cuda_3_1_downloads.html

In addition to installing the toolkit, you might need to add libcutil.a and libshrutil.a to your LD path. In Linux, it is sufficient to copy them to your /usr/local/cuda/lib[64] directory.

I personally work with space-separated text files for images, because they are trivial to read and write from C++. Additionally, I use MATLAB to read/write these files, which makes it easy to create input for the CUDA code and verify the output.
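
For reference, here is a sketch of reading such a file from C++. The layout assumed below (width and height first, then pixel values in row-major order) is my assumption, not necessarily what this code expects, so check the actual I/O routines; on the MATLAB side, dlmread/dlmwrite handle the same kind of whitespace-delimited file.

  // Sketch: read a whitespace-delimited mask into a flat row-major vector.
  // ASSUMPTION: the first two numbers are width and height; verify this
  // against the library's actual I/O routines.
  #include <fstream>
  #include <vector>

  bool readAsciiMask(const char* fname, std::vector<int>& mask,
                     int& imgW, int& imgH)
  {
     std::ifstream is(fname);
     if (!is)
        return false;

     is >> imgW >> imgH;
     mask.assign((size_t)imgW * imgH, 0);
     for (size_t i = 0; i < mask.size(); i++)
        is >> mask[i];

     return !is.fail();
  }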

There is no reason this code won't work on Windows, though I've never tried it.

I strongly urge anyone trying to learn CUDA to download the CUDA SDK. It contains endless examples of every CUDA feature in existence. Once you know what you want to do, you can get the syntax from there, which more often than not is very ugly.


To work with older NVIDIA cards (8800 GT and later, 9XXX series, GTX 2XX series):

Open common/common.mk and, around line 150, uncomment the GENCODE_SM10 line, which enables dual compilation for CUDA Compute Capability 1.X devices. The reason I commented this out is that 1.X devices don't support printf(), which is my main method for debugging. Therefore, leaving this line enabled in the Makefile prevents the code from compiling when I'm attempting to debug.


Planned Updates

  • Need to modify the kernel functions (or invocations) to handle masks with arbitrary dimensions. Currently, it is assumed that the input dimensions are multiples of 32. (In the meantime, a host-side padding workaround is sketched below.)
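
Until that update lands, one host-side workaround is to zero-pad the mask up to the next multiple of 32 in each dimension before copying it to the device. The helper below is hypothetical (not part of the library); note that a zero border is harmless for dilation but will eat into the edge of the mask under erosion, so the fill value may need to change per operation.

  // HYPOTHETICAL host-side helper: zero-pad a mask out to the next multiple
  // of 32 in each dimension so it satisfies the current kernels' assumption.
  #include <vector>

  void padMaskTo32(const std::vector<int>& in, int imgW, int imgH,
                   std::vector<int>& out, int& padW, int& padH)
  {
     padW = ((imgW + 31) / 32) * 32;   // round up to a multiple of 32
     padH = ((imgH + 31) / 32) * 32;

     out.assign((size_t)padW * padH, 0);
     for (int y = 0; y < imgH; y++)
        for (int x = 0; x < imgW; x++)
           out[(size_t)y * padW + x] = in[(size_t)y * imgW + x];
  }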