1. Introduction – parallel computing – more speed or parallelism – languages and models – sequential vs parallel – concurrent, parallel, distributed – parallel hardware architecture – modifications to the von Neumann model
2. Evolution of GPU – GPGPU – introduction to data parallelism – CUDA program structure – vector addition kernel – device global memory and data transfer
3. CUDA thread organization – mapping threads to multi-dimensional data – assigning resources to blocks – synchronization and transparent scalability – thread scheduling and
4. Memory access efficiency – CUDA device memory types – performance considerations – global memory bandwidth – instruction mix and thread granularity – floating point considerations
5. Parallel programming patterns – convolution – prefix sum – sparse matrix-vector multiplication – application case studies – strategies for solving problems using parallel programming
• David B. Kirk, Wen-mei W. Hwu, Programming Massively Parallel Processors, 2nd Edition, Morgan Kaufmann, 2012.
• Peter Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann, 2011.
• Shane Cook, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, Morgan Kaufmann, 2012.
• Jason Sanders, Edward Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional, 2010.