Course Resources

Lesson Slides
Lesson slides are available at Udacity's CS344 GitHub Lecture Slides.

Problem set files
You can download the problem set files from Udacity's CS344 GitHub Problem Sets.

Reference Books
CUDA by Example: An Introduction to General-Purpose GPU Programming
Professional CUDA C Programming
Programming Massively Parallel Processors: A Hands-on Approach

C Language
C Programming - Wikibook. The examples in this book are helpful. The most relevant chapters for CS344 are Arrays & Strings, Pointers and relationship to arrays, and Memory Management. Lesson 1.

C Programming - Tutorial.

CUDA programming model. Most of this is covered in lecture, but there are additional examples here that illustrate the model. Lessons 1 and 2.

cudaMalloc(). A good explanation of why cudaMalloc (the CUDA function for allocating device memory) takes a pointer to a pointer. This is not explained in lecture. Make sure you have reviewed pointers before reading. Lesson 1.
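
The pointer-to-pointer pattern can be sketched in plain C, without a GPU. This is only an illustration: mock_device_malloc and broken_malloc are hypothetical names, and the mock calls ordinary malloc. What it shares with cudaMalloc is the calling shape, where the return value is a status code and the allocated address is written through the first argument.

```c
#include <stdlib.h>

/* Hypothetical stand-in for cudaMalloc. The return value carries a status
   code, so the new address has to travel back through an out-parameter.
   This mock allocates host memory; it does not touch a GPU. */
int mock_device_malloc(void **out_ptr, size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        return 1;        /* nonzero means failure, like a CUDA error code */
    }
    *out_ptr = p;        /* store the address in the caller's pointer */
    return 0;            /* zero means success, like cudaSuccess */
}

/* Broken variant for contrast (also hypothetical). The pointer is passed
   by value, so the assignment below changes only the local copy and the
   caller's pointer is left untouched. */
int broken_malloc(void *ptr_copy, size_t size)
{
    void *p = malloc(size);
    int failed = (p == NULL);
    ptr_copy = p;        /* lost when the function returns */
    (void)ptr_copy;
    free(p);             /* release it here, since nobody else can */
    return failed;
}
```

A caller would write float *d_in = NULL; followed by mock_device_malloc((void **)&d_in, n * sizeof(float)), which mirrors the usual cudaMalloc((void **)&d_in, bytes) call. Passing d_in itself, as broken_malloc does, leaves d_in equal to NULL.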

Shared memory in CUDA. The matrix multiplication examples here clearly illustrate the use of shared memory. Note figures 9 and 10. Lessons 4 and 5.

Parallel Algorithms
Parallel scan. The entire chapter is very useful. It clarifies some of the material in Lessons 3 and 4 and goes into more detail, which helps when doing the assignments. Note Section 39.2.4, Arrays of Arbitrary Size.
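
As a rough companion to that chapter, here is a sequential C sketch of the usual way to scan an array of arbitrary size: scan fixed-size blocks with the up-sweep/down-sweep (Blelloch) pattern, scan the per-block totals, then add each block's offset back. The names BLOCK, scan_block, and exclusive_scan are assumptions made for this example, and the loops run serially where a GPU would use one thread per element.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK 4   /* hypothetical block size; must be a power of two */

/* In-place exclusive scan of exactly BLOCK elements using the
   up-sweep / down-sweep pattern. Returns the sum of the block. */
static int scan_block(int *a)
{
    /* Up-sweep: build partial sums in a balanced-tree pattern. */
    for (size_t stride = 1; stride < BLOCK; stride *= 2) {
        for (size_t i = 2 * stride - 1; i < BLOCK; i += 2 * stride) {
            a[i] += a[i - stride];
        }
    }
    int total = a[BLOCK - 1];
    a[BLOCK - 1] = 0;    /* identity element for addition */

    /* Down-sweep: push partial sums back down the tree. */
    for (size_t stride = BLOCK / 2; stride >= 1; stride /= 2) {
        for (size_t i = 2 * stride - 1; i < BLOCK; i += 2 * stride) {
            int left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
    }
    return total;
}

/* Exclusive prefix sum of in[0..n) into out[0..n) for any n.
   Returns 0 on success, 1 if the scratch allocation fails. */
int exclusive_scan(const int *in, int *out, size_t n)
{
    if (n == 0) {
        return 0;
    }
    size_t nblocks = (n + BLOCK - 1) / BLOCK;
    int *sums = malloc(nblocks * sizeof *sums);
    if (sums == NULL) {
        return 1;
    }

    /* Step 1: scan each block independently, padding the last with 0. */
    for (size_t b = 0; b < nblocks; b++) {
        int tile[BLOCK];
        for (size_t j = 0; j < BLOCK; j++) {
            size_t idx = b * BLOCK + j;
            tile[j] = (idx < n) ? in[idx] : 0;
        }
        sums[b] = scan_block(tile);
        for (size_t j = 0; j < BLOCK; j++) {
            size_t idx = b * BLOCK + j;
            if (idx < n) {
                out[idx] = tile[j];
            }
        }
    }

    /* Step 2: exclusive scan of the block totals (serial in this sketch). */
    int running = 0;
    for (size_t b = 0; b < nblocks; b++) {
        int s = sums[b];
        sums[b] = running;
        running += s;
    }

    /* Step 3: add each block's offset to its elements. */
    for (size_t i = 0; i < n; i++) {
        out[i] += sums[i / BLOCK];
    }

    free(sums);
    return 0;
}
```

For the ten-element input 3, 1, 7, 0, 4, 1, 6, 3, 2, 5 this produces 0, 3, 4, 11, 11, 15, 16, 22, 25, 27. The length is not a multiple of BLOCK, so the last block is padded with zeros.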

Sorting algorithms. Gives an overview of implementing algorithms on the GPU and, most importantly, outlines an efficient implementation of radix sort. Lesson 4 and Problem Set 4.
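
For orientation before reading that outline, the following is a minimal sequential sketch of a least-significant-bit radix sort in C that handles one bit per pass. It is not the chapter's implementation. The function name radix_sort_u32 is an assumption, and the counting and scatter loops stand in for the scan and scatter steps a GPU version would run in parallel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sorts keys[0..n) in ascending order, one bit per pass, starting from
   the least significant bit. Each pass is a stable split: keys whose
   current bit is 0 keep their relative order and go first, followed by
   keys whose bit is 1. Returns 0 on success, 1 if allocation fails. */
int radix_sort_u32(uint32_t *keys, size_t n)
{
    if (n < 2) {
        return 0;
    }
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return 1;
    }

    uint32_t *src = keys;
    uint32_t *dst = tmp;
    for (unsigned bit = 0; bit < 32; bit++) {
        /* Count keys with a 0 in this bit; that count is where the
           1-keys start in the output. */
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++) {
            if (((src[i] >> bit) & 1u) == 0) {
                zeros++;
            }
        }

        /* Scatter: running offsets play the role of an exclusive scan
           over the bit predicate. */
        size_t next_zero = 0;
        size_t next_one = zeros;
        for (size_t i = 0; i < n; i++) {
            if (((src[i] >> bit) & 1u) == 0) {
                dst[next_zero++] = src[i];
            } else {
                dst[next_one++] = src[i];
            }
        }

        /* Ping-pong the buffers for the next pass. */
        uint32_t *swap = src;
        src = dst;
        dst = swap;
    }

    /* 32 passes is an even number of swaps, so the result is in keys. */
    free(tmp);
    return 0;
}
```

Stability of each pass is what makes the approach work: ordering established by lower bits survives the later passes on higher bits.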

N-body simulation. Physics many-body simulation! Provides some of the background for the N-body simulation example in Lesson 6.1.