19-475-0502 PARALLEL COMPUTING FOR DATA SCIENCE
Core/Elective: Elective
Semester: 5
Credits: 4
Course Description
This course focuses on parallel data structures, algorithms, software tools, and
applications in data science. Examples are drawn not only from the classic "n observations, p
variables" matrix format but also from time series, network graph models, and other
structures common in data science. With a main focus on GPU-based computation, the examples
illustrate the range of issues encountered in parallel programming.
Course Objectives
To gain a working knowledge of parallel programming with data sets
To develop the programming skills required for parallel computing
To understand the advanced data structures required for efficient data processing
Course Content
Module I
Parallel computing: languages and models for parallelism
- sequential vs parallel: concurrent, parallel, distributed
- parallel hardware architecture - modifications to
the von Neumann model - evolution of the GPU - GPGPU - introduction
to data parallelism - CUDA program structure - vector
addition kernel - device global memory and data transfer
Module II
CUDA thread organization - mapping threads to multi-dimensional
data - assigning resources to blocks - synchronization
and transparent scalability - thread scheduling and
latency tolerance - memory access efficiency - CUDA device
memory types - performance considerations - global memory
bandwidth - instruction mix and thread granularity - floating-point
considerations
Module III
Parallel programming patterns: convolution - prefix
sum - sparse matrix and vector multiplication - application
case studies - strategies for solving problems using
parallel programming
Module IV
Parallel patterns: merge sort, sequential and parallel
approaches, co-rank function implementation, basic parallel
merge kernel - graph search: sequential BFS, parallel
BFS, optimizations
Module V
CUDA dynamic parallelism: example for dynamic parallelism,
memory data visibility, configurations and memory management,
synchronization, streams and events
REFERENCES
1. David B. Kirk, Wen-mei W. Hwu; Programming Massively Parallel
Processors, 3 ed, Morgan Kaufmann, 2016
2. Peter Pacheco; An Introduction to Parallel Programming,
Morgan Kaufmann, 2011
3. Norman Matloff; Parallel Computing for Data Science,
1 ed, CRC Press, 2015