cs344 ยป

Development without GPU

So you want to run the code on your machine but you don't have a GPU? Or maybe you want to try things out before firing up your AWS instance? Here I show you a way to run the CUDA code without a GPU.

Note: this only works on Linux, maybe there are other alternatives for Mac or Windows.

Ocelot lets you run CUDA programs on NVIDIA GPUs, AMD GPUs and x86-CPUs without recompilation. Here we'll take advantage of the latter to run our code using our CPU.


You'll need to install the following packages:

  • C++ Compiler (GCC)
  • Lex Lexer Generator (Flex)
  • YACC Parser Generator (Bison)
  • SCons

And these libraries:

  • boost_system
  • boost_filesystem
  • boost_serialization
  • GLEW (optional for GL interop)
  • GL (for NVIDIA GPU Devices)

With Arch Linux, this should go something like this:

pacman -S gcc flex bison scons boost glew

On Ubuntu it should be similar (sudo apt-get install flex bison g++ scons libboost-all-dev). If you don't know the name of a package, search for it with 'apt-cache search package_name'.

You should probably install LLVM too, it's not mandatory, but I think it runs faster with LLVM.

pacman -S llvm clang

And of course you'll need to install CUDA and the OpenCL headers. You can do it manually or using your distro's package manager (for ubuntu I belive the package is called nvidia-cuda-toolkit):

pacman -S cuda libcl opencl-nvidia

One last dependency is Hydrazine. Fetch the source code:

svn checkout http://hydrazine.googlecode.com/svn/trunk/ hydrazine

Or if you're like me and prefer Git:

git svn clone -s http://hydrazine.googlecode.com/svn/ hydrazine

And install it like this (you might need to install automake if you don't have it already):

cd hydrazine
automake --add-missing
sudo make install


Now we can finally install Ocelot. This is where it gets a bit messy. Fetch the Ocelot source code:

svn checkout http://gpuocelot.googlecode.com/svn/trunk/ gpuocelot

Or with Git (warning, this will take a while, the whole repo is about 1.9 GB):

git svn clone -s http://gpuocelot.googlecode.com/svn/ gpuocelot

Now go to the ocelot directory:

cd gpuocelot/ocelot

And install Ocelot with:

sudo ./build.py --install


Sadly, the last command probably failed. This is how I fixed the problems.

Hydrazine headers not found

You could fix this adding an include flag. I just added a logical link to the hydrazine code we downloaded previously:

ln -s /path/to/hydrazine/hydrazine

Make sure you link to the second hydrazine directory (inside this directory you'll find directories like implementation and interface). You should do this in the ocelot directory where you're running the build.py script (gpuocelot/ocelot).

LLVM header file not found

For any error that looks like this:

llvm/Target/TargetData.h: No such file or directory

Just edit the source code and replace it with this header:


The LLVM project moved the file.

LLVM IR folder "missing"

Similarly, files referenced by Ocelot from the "IR" package were moved (LLVM 3.2-5 on Arch Linux). If you get an error about LLVM/IR/LLVMContext.h missing, edit the following files:


and replace the includes at the top of each file for LLVM/IR/LLVMContext.h and LLVM/IR/Module.h with LLVM/LLVMContext.h and LLVM/Module.h, respectively.

PTXLexer errors

The next problem I ran into was:

.release_build/ocelot/ptxgrammar.hpp:351:14:error:'PTXLexer' is not a member of 'parser'

Go ahead, open the '.release_build/ocelot/ptxgrammar.hpp' file and just comment line 355:

/* int yyparse (parser::PTXLexer& lexer, parser::PTXParser::State& state); */

That should fix the error.

boost libraries not found

On up-to-date Arch Linux boxes, it will complain about not finding boost libraries 'boost_system-mt', 'boost_filesystem-mt', 'boost_thread-mt'.

I had to edit two files:

  • scripts/build_environment.py
  • SConscript

And just remove the trailing -mt from the library names:

  • boost_system
  • boost_filesystem
  • boost_thread

Finish the installation

After those fixes everything should work.

Whew! That wasn't fun. Hopefully with the help of this guide it won't be too painful.

To finish the installation, run:

sudo ldconfig

And you can check the library was installed correctly running:

OcelotConfig -l

It should return -locelot. If it didn't, check your LD_LIBRARY_PATH. On my machine, Ocelot was installed under /usr/local/lib so I just added this to my LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Here's the link to the installation instructions.

Running the code with Ocelot

We're finally ready enjoy the fruits of our hard work. We need to do two things:

Ocelot configuration file

Add a file called configure.ocelot to your project (in the same directory as our Makefile and student_func.cu files), and copy this:

    ocelot: "ocelot",
    trace: {
        database: "traces/database.trace",
        memoryChecker: {
            enabled: false,
            checkInitialization: false
        raceDetector: {
            enabled: false,
            ignoreIrrelevantWrites: false
        debugger: {
            enabled: false,
            kernelFilter: "",
            alwaysAttach: true
    cuda: {
        implementation: "CudaRuntime",
        tracePath: "trace/CudaAPI.trace"
    executive: {
        devices: [llvm],
        preferredISA: nvidia,
        optimizationLevel: full,
        defaultDeviceID: 0,
        asynchronousKernelLaunch: True,
        port: 2011,
        host: "",
        workerThreadLimit: 8,
        warpSize: 16
    optimizations: {
        subkernelSize: 10000,

You can check this guide for more information about these settings.

Compile with the Ocelot library

And lastly, a small change to our Makefile. Append this to the GCC_OPTS:

GCC_OPTS=-O3 -Wall -Wextra -m64 `OcelotConfig -l`

And change the student target so it uses g++ and not nvcc:

student: compare main.o student_func.o Makefile
    g++ -o hw main.o student_func.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(GCC_OPTS)

I just replaced 'nvcc' with 'g++' and 'NVCC_OPTS' with 'GCC_OPTS'.

make clean

And that's it!

I forked the github repo and added these changes in case you want to take a look.

I found this guide helpful, it might have some additional details for installing things under ubuntu and/or manually.

Note for debian users

I successfully installed ocelot under debian squeeze, following the above steps, except that I needed to download llvm from upstream, as indicated in the above guide for ubuntu.

Other than that, after fixing some includes as indicated (Replacing 'TargetData.h' by 'IR/DataLayout.h', or adding '/IR/' to some includes), it just compiled.

To build the student project, I needed to replace -m64 by -m32 to fit my architecture, and to make the other indicated changes.

Here are my makefile diffs:

$ git diff Makefile
diff --git a/HW1/student/Makefile b/HW1/student/Makefile
index b6df3a4..55480af 100755
--- a/HW1/student/Makefile
+++ b/HW1/student/Makefile
@@ -22,7 +22,8 @@ OPENCV_INCLUDEPATH=/usr/include

 OPENCV_LIBS=-lopencv_core -lopencv_imgproc -lopencv_highgui


 # On Macs the default install locations are below    #
@@ -36,12 +37,12 @@ CUDA_INCLUDEPATH=/usr/local/cuda-5.0/include

-NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64
+NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m32

-GCC_OPTS=-O3 -Wall -Wextra -m64
+GCC_OPTS=-O3 -Wall -Wextra -m32 `OcelotConfig -l` -I /usr/include/i386-linux-gn

 student: compare main.o student_func.o Makefile
-       $(NVCC) -o hw main.o student_func.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) 
+       g++ -o hw main.o student_func.o -L $(OPENCV_LIBPATH) $(OPENCV_LIBS) $(GC

 main.o: main.cpp timer.h utils.h HW1.cpp
        g++ -c main.cpp $(GCC_OPTS) -I $(CUDA_INCLUDEPATH) -I $(OPENCV_LIBPATH)

I'm using cuda toolkit 4.2.

I don't know why, but it was necessary to add /usr/lib/gcc/i486-linux-gnu/4.4 to the PATH for nvcc to work:

export PATH=$PATH:/usr/lib/gcc/i486-linux-gnu/4.4

Eclipse CUDA plugin

This is probably for another entry, but I used this guide to integrate CUDA into Eclipse Indigo.

The plugin is University of Bayreuth's Eclipse Toolchain for CUDA compiler

Good luck!