19-475-0101 PROBABILITY AND STATISTICS FOR DATA SCIENCE
Core/Elective: Core | Semester: 1 | Credits: 4
Course Description

This course introduces fundamental concepts in probability and statistics from a data-science perspective. The aim is to familiarize students with the probabilistic models and statistical methods that are widely used in data analysis.

Course Objectives

To introduce the concepts of probability and statistics to data scientists
To get a clear understanding of statistical inference procedures in estimation and testing
To understand the connection between statistical theory and statistical practice

Course Content

Module I
Probability theory: probability spaces, conditional probability, independence – Random variables: discrete and continuous random variables, functions of random variables, generating random variables – Multivariate random variables: joint distributions, independence, generating multivariate random variables, rejection sampling – Expectation: mean, variance and covariance, conditional expectation
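
As an illustration of the rejection sampling topic listed above, here is a minimal sketch. It is not part of the official syllabus. The target density f(x) = 6x(1 - x) on [0, 1], the uniform proposal, the bound of 1.5 and the names `target_pdf` and `rejection_sample` are all assumptions chosen for the example.

```python
import random


def target_pdf(x):
    # Assumed target density: Beta(2, 2), f(x) = 6x(1 - x) on [0, 1].
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, bound=1.5, seed=0):
    # Proposal is Uniform(0, 1) with density 1, so we need
    # bound >= max f(x) = 1.5 (attained at x = 0.5).
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform variate for the accept/reject step
        if u * bound <= target_pdf(x):
            samples.append(x)  # accept with probability f(x) / bound
    return samples


samples = rejection_sample(10_000)
sample_mean = sum(samples) / len(samples)
print(f"sample mean: {sample_mean:.3f}")  # should be close to the true mean 0.5
```

With this bound, roughly two out of three candidates are accepted on average, since the acceptance rate equals 1 / bound.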

Module II
Random processes: definition, mean and autocovariance functions, iid sequences, Gaussian and Poisson processes, random walk – Convergence of random processes: types of convergence, law of large numbers, central limit theorem, Monte Carlo simulation – Markov chains: recurrence, periodicity, convergence, Markov chain Monte Carlo – Gibbs sampling, EM algorithm, variational inference

Module III
Descriptive statistics: histogram, sample mean and variance, order statistics, sample covariance, sample covariance matrix – Frequentist statistics: sampling, mean square error, consistency, confidence intervals, parametric and non-parametric model estimation

Module IV
Bayesian statistics: Bayesian parametric models, conjugate prior, Bayesian estimators – Hypothesis testing: testing framework, parametric testing, permutation test, multiple testing – Mixture models: Gaussian mixture models, multinomial mixture models

Module V
Linear regression: linear models, least-squares estimation, interval estimation in simple linear regression, overfitting – Multiple linear regression models: estimation of model parameters, MLE – Nonlinear regression: nonlinear least squares, transformation to linear model – Generalized linear models: logistic regression models, Poisson regression
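
As an illustration of the least-squares estimation topic listed above, here is a minimal sketch for simple linear regression. It is not part of the official syllabus. The function name `fit_simple_linear_regression` and the toy data are assumptions made for the example.

```python
def fit_simple_linear_regression(xs, ys):
    # Closed-form least-squares estimates for the model y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```

Because the toy data lie exactly on a line, the estimates recover the generating intercept and slope. With noisy data they would only approximate them.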

REFERENCES

1. Michael Mitzenmacher and Eli Upfal; Probability and Computing, 2ed, Cambridge University Press, 2017
2. Alan Agresti, Christine A. Franklin and Bernhard Klingenberg; Statistics: The Art and Science of Learning from Data, 4ed, Pearson, 2017
3. Sheldon M Ross; A First Course in Probability, 10ed, Pearson, 2018
4. Robert V Hogg, Joseph W McKean and Allen T Craig; Introduction to Mathematical Statistics, 8ed, Pearson, 2018
5. Douglas C Montgomery, Elizabeth A Peck and G Geoffrey Vining; Introduction to Linear Regression Analysis, 5ed, Wiley-Blackwell, 2012

Online Resources:
Course notes of Carlos Fernandez-Granda, DS-GA 1002: Probability and Statistics for Data Science
https://cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17/index.html


Copyright © 2009-20 Department of Computer Science, CUSAT
Cochin University of Science & Technology
Cochin-682022, Kerala, India