19-475-0201 FOUNDATIONS OF DATA SCIENCE
Core/Elective: Core | Semester: 2 | Credits: 4
Course Description

While traditional areas of computer science remain highly important, researchers of the future will increasingly be involved in using computers to understand and extract usable information from massive data arising in applications, not just in making computers useful on specific, well-defined problems. This course introduces the statistics and computer science concepts required to master data science as a subject.

Course Objectives

To introduce the mathematical foundations needed to deal with high-dimensional data
To introduce concepts such as random graphs, random walks, and Markov chains
To understand the basic underpinnings of machine learning algorithms

Course Content

Module I
High-dimensional space: Law of large numbers, geometry of high dimensions, properties of the unit ball, Gaussians in high dimensions, random projection and the Johnson-Lindenstrauss Lemma, separating Gaussians – Singular Value Decomposition: Power method to compute SVD, singular vectors and eigenvectors, applications of SVD
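As an illustration of the power method named in this module, the following is a minimal sketch and not part of the official course material. The function name `top_singular_value`, the fixed all-ones starting vector, and the iteration count are assumptions chosen for readability; the sketch also assumes the starting vector is not orthogonal to the top right singular vector.

```python
import math

def top_singular_value(A, iterations=100):
    """Estimate the largest singular value of A (a list of rows) and the
    matching right singular vector by power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    # Assumption: a fixed all-ones start; a random start is more usual.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, so w = (A^T A) v
        w = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # A maps v to zero; nothing to amplify
        v = [x / norm for x in w]
    # The singular value is the length of A v for the converged unit vector v.
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, v

sigma, v = top_singular_value([[3.0, 0.0], [0.0, 1.0]])
```

Convergence slows when the two largest singular values are close, which is one reason practical SVD routines use more refined methods.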

Module II
Random Graphs: G(n,p) model, phase transitions, giant component, branching process, cycles and full connectivity – Growth models of Random Graphs: Growth models with and without preferential attachment, small world graphs

Module III
Random walks and Markov chains: Stationary distribution, MCMC, Gibbs sampling, areas and volumes, convergence of random walks, random walks in Euclidean space, web as a Markov chain
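As an illustration of the stationary distribution of a Markov chain, the following is a minimal sketch and not part of the official course material. The function name `stationary_distribution` and the two-state transition matrix are hypothetical examples, and the sketch assumes the chain is connected and aperiodic so that repeated stepping converges.

```python
def stationary_distribution(P, iterations=200):
    """Approximate the stationary distribution of a row-stochastic
    transition matrix P by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n  # start the walk from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
```

For this chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives the exact answer (5/6, 1/6), which the iteration approaches.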

Module IV
Learning and VC dimension: Linear Separators, the Perceptron Algorithm, and Margins, Nonlinear Separators, Support Vector Machines, and Kernels, Strong and Weak Learning – Boosting – Vapnik-Chervonenkis dimension: Examples of Set Systems, The Shatter Function, The VC Theorem, Simple Learning
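As an illustration of the Perceptron Algorithm listed in this module, the following is a minimal sketch and not part of the official course material. The function name `perceptron`, the epoch limit, and the four sample points are assumptions; labels are taken to be +1 or -1, and the data is assumed to be linearly separable.

```python
def perceptron(points, labels, max_epochs=1000):
    """Find weights w and bias b with y * (w.x + b) > 0 for every example,
    assuming such a linear separator exists."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): move toward the example.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # every example is on the correct side
    return w, b

# Hypothetical, linearly separable toy data.
points = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, -1, -1]
w, b = perceptron(points, labels)
```

On data that is not separable the loop simply stops after `max_epochs` passes, so the returned separator may still misclassify some points.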

Module V
Algorithms for Massive Data Problems: Locality-Sensitive Hashing – shingling of documents, min-hashing, distance measures, nearest neighbors, frequent itemsets – LSH families for distance measures, applications of LSH – Challenges when sampling from massive data – Frequency Moments of Data Streams, Counting Frequent Elements, Matrix Algorithms Using Sampling, Sketch of a Large Matrix, Sketches of Documents
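As an illustration of shingling and min-hashing, the following is a minimal sketch and not part of the official course material. The function names, the shingle length `k=3`, the use of `zlib.crc32` to map shingles to integers, and the hash family (a * x + b) mod p with a fixed seed are all assumptions made to keep the example deterministic.

```python
import random
import zlib

P = (1 << 61) - 1  # a Mersenne prime used as the modulus of the hash family

def shingles(text, k=3):
    """Return the set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=100, seed=0):
    """For each hash function, keep the minimum hash value over the set."""
    if not shingle_set:
        raise ValueError("cannot min-hash an empty shingle set")
    rng = random.Random(seed)  # same seed gives the same hash functions
    coeffs = [(rng.randrange(1, P), rng.randrange(0, P))
              for _ in range(num_hashes)]
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % P for x in ids) for a, b in coeffs]

def estimated_jaccard(sig1, sig2):
    """Fraction of positions where two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

doc_a = shingles("foundations of data science")
doc_b = shingles("foundations of data mining")
similarity = estimated_jaccard(minhash_signature(doc_a),
                               minhash_signature(doc_b))
```

The fraction of agreeing positions estimates the Jaccard similarity of the two shingle sets, and the estimate tightens as `num_hashes` grows.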

REFERENCES

1. Avrim Blum, John Hopcroft, Ravindran Kannan; Foundations of Data Science, 2018
https://www.cs.cornell.edu/jeh/book.pdf
2. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman; Mining of Massive Datasets, 2e, Cambridge University Press, 2016
3. Charu C. Aggarwal; Data Streams: Models and Algorithms, 1e, Springer, 2007
4. Michael I. Jordan et al.; Frontiers in Massive Data Analysis, 1e, National Academies Press, 2013
5. Nathan Marz & James Warren, Big Data: Principles and best practices of scalable realtime data systems, Manning Publications, 2015


Copyright © 2009-20 Department of Computer Science, CUSAT
Cochin University of Science & Technology
Cochin-682022, Kerala, India