Intro to Parallel Programming

Thank you for signing up for the course! We look forward to working with you and hearing your feedback in our forums.

Need help getting started?


Course Resources

Reading Materials

  1. CUDA by Example: An Introduction to General-Purpose GPU Programming
  2. Professional CUDA C Programming
  3. Programming Massively Parallel Processors: A Hands-on Approach

Coding Materials

  1. C Programming - Wikibook
    The examples in this book are helpful. The most relevant chapters for CS344 are Arrays & Strings, Pointers and relationship to arrays, and Memory Management. Relevant to Lesson 1.
  2. C Programming - Tutorial
  3. CUDA programming model: Most of this is covered in lecture, but the additional examples here help illustrate the model. Lessons 1 and 2.
  4. cudaMalloc(): A good explanation of why cudaMalloc (the CUDA function for memory allocation) takes a pointer to a pointer. This is not explained in lecture. Make sure you have reviewed pointers before reading. Lesson 1.
  5. Shared memory: The matrix multiplication examples here clearly illustrate the use of shared memory. Note figures 9 and 10. Lessons 4 and 5.
  6. Parallel scan: The entire chapter is very useful. It clarifies some of the material in Lessons 3 and 4 and adds detail that helps with the assignments. Note Section 39.2.4, Arrays of arbitrary size.
  7. Sorting algorithms: Gives an overview of implementing algorithms on the GPU. Most importantly, it outlines an efficient implementation of radix sort. Lesson 4 and Problem Set 4.
  8. N-body simulation: Physics many-body simulation! Provides some of the background for the N-body simulation example in Lesson 6.1.
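
The pointer-to-pointer point in the cudaMalloc entry can be previewed on the host with plain C. cudaMalloc has the signature cudaError_t cudaMalloc(void **devPtr, size_t size): the return value carries a status code, so the allocated address has to come back through an out-parameter. The sketch below is only an analogy under that assumption. It uses malloc instead of the GPU, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for cudaMalloc: the return value is reserved for a
   status code, so the new address is written through the out-parameter. */
int alloc_by_pointer_to_pointer(int **out, size_t count)
{
    assert(out != NULL);
    *out = malloc(count * sizeof **out);  /* updates the caller's pointer */
    return (*out != NULL) ? 0 : 1;
}

/* Broken variant for comparison: `p` is a copy of the caller's pointer,
   so the assignment is invisible to the caller once the function returns. */
int alloc_by_value(int *p, size_t count)
{
    p = malloc(count * sizeof *p);
    int status = (p != NULL) ? 0 : 1;
    free(p);  /* released here only so the sketch does not leak */
    return status;
}
```

Calling alloc_by_value(a, 4) with a == NULL leaves a unchanged, while alloc_by_pointer_to_pointer(&a, 4) leaves a pointing at usable memory that the caller later releases with free. The same reasoning is why CUDA code passes the address of a device pointer to cudaMalloc.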

Technical Support: CUDA Setup

  1. Amazon Elastic Compute Cloud (Amazon EC2) device query
  2. CUDA programming on a local computer
  3. Setup of Development Environment
  4. Troubleshooting - Incompatibility with GCC 4.7

Course Syllabus

The course is released one lesson per week. There are 7 weeks of lectures and 6 problem sets, and the problem sets are built around image processing applications. In total: 7 Lessons, 6 Problem Sets, 1 Final Exam, 3 Interviews.
Detailed Syllabus, Lesson Slides, Chinese Subtitles

  1. Lesson 1: Introduction and the GPU Programming Model
    Project 1: Converting Photos from Color to Greyscale (for that classy touch!)
    English Subtitle: Bill_Dally_Interview - 36.2KB
    English Subtitle: The_GPU_Programming_Model - 93.6KB

  2. Lesson 2: GPU Hardware and Parallel Communication Patterns
    Project 2: Gaussian filter for smooth blur (miracle product for removing wrinkles!)
    English Subtitle: GPU Hardware and Parallel Communication Patterns

  3. Lesson 3: Fundamental Parallel Algorithms 1 (Reduce, Scan, Histogram)
    Project 3: HDR Tonemapping (because your TV doesn't really have a 10,000:1 contrast ratio)
    English Subtitle: Fundamental GPU Algorithms (Reduce, Scan, Histogram)

  4. Lesson 4: Fundamental Parallel Algorithms 2 (Applications of Sort and Scan)
    Project 4: Red Eye Removal using Template Matching (soothing relief for those bright red eyes)
    English Subtitle: Fundamental GPU Algorithms (Applications of Sort and Scan)
    English Subtitle: Ian Buck Interview

  5. Lesson 5: Optimizing GPU Programs
    Project 5: Accelerating Histograms (when fast isn't fast enough)
    English Subtitle: Optimizing GPU Programs

  6. Lesson 6: Parallel Computing Patterns
    Project 6: Seamless Image Compositing using Poisson Blending (or, who put the polar bear in the swimming pool?)
    English Subtitle: Parallel Computing Patterns Part A
    English Subtitle: Parallel Computing Patterns Part B

  7. Lesson 7: The Frontiers and Future of GPU Computing
    English Subtitle: Additional Parallel Computing Topics
    English Subtitle: Dynamic Parallelism
    English Subtitle: Stephen Jones Interview
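
Scan, introduced in Lesson 3 and applied in Lesson 4, is the primitive students most often want to see written out before starting the problem sets. The following is a minimal host-side sketch in C of one common formulation, the work-efficient up-sweep/down-sweep exclusive scan. It is not the course's reference code. The function name is hypothetical, it runs sequentially, and it assumes the input length is a power of two. The iterations of each inner loop are independent of one another, so on a GPU each iteration would be handled by a separate thread.

```c
#include <assert.h>
#include <stddef.h>

/* In-place exclusive sum scan using the up-sweep/down-sweep pattern.
   Assumption: n is a power of two (arbitrary sizes need padding or a
   multi-block scheme, which this sketch does not cover). */
void exclusive_scan(int *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);

    /* Up-sweep (reduce): build partial sums in a balanced tree. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    /* Clear the root, then down-sweep: push prefix sums back down. */
    data[n - 1] = 0;
    for (size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            int left = data[i - stride];
            data[i - stride] = data[i];  /* left child takes parent's prefix */
            data[i] += left;             /* right child adds left subtree sum */
        }
    }
}
```

For example, scanning {3, 1, 7, 0, 4, 1, 6, 3} in place yields {0, 3, 4, 11, 11, 15, 16, 22}: each output element is the sum of all inputs before it.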