1. Introduction - parallel computing - more speed or parallelism - languages and models - sequential vs parallel - concurrent, parallel, distributed - parallel hardware architecture - modifications to the von Neumann Model
2. Evolution of GPU - GPGPU - introduction to data parallelism - CUDA program structure - vector addition kernel - device global memory and data transfer
3. CUDA thread organization - mapping threads to multi-dimensional data - assigning resources to blocks - synchronization and transparent scalability - thread scheduling and latency tolerance
4. Memory access efficiency - CUDA device memory types - performance considerations - global memory bandwidth - instruction mix and thread granularity - floating-point considerations
5. Parallel programming patterns - convolution - prefix sum - sparse matrix and vector multiplication - application case studies - strategies for solving problems using
1. David B. Kirk, Wen-mei W. Hwu, Programming Massively Parallel Processors, 2nd Edition, Morgan Kaufmann, 2012.
2. Peter Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann, 2011.
3. Shane Cook, CUDA Programming: A Developer's Guide to Parallel Computing with GPUs, Morgan Kaufmann, 2012.
4. Jason Sanders, Edward Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley