Programme Structure: CSE (2019)

19-475-0101 PROBABILITY AND STATISTICS FOR DATA SCIENCE

Core/Elective: Core
Semester: 1
Credits: 4

Course Description
This course introduces fundamental concepts in probability and statistics from a data-science perspective. The aim is to become familiar with probabilistic models and statistical methods that are widely used in data analysis.

Course Objectives
- To introduce the concepts of probability and statistics to data scientists
- To get a clear understanding of statistical inference procedures in estimation and testing
- To understand the connection between statistical theory and statistical practice

Course Content

Module I
Probability theory: probability spaces, conditional probability, independence
Random variables: discrete and continuous random variables, functions of random variables, generating random variables
Multivariate random variables: joint distributions, independence, generating multivariate random variables, rejection sampling
Expectation: mean, variance and covariance, conditional expectation

Module II
Random processes: definition, mean and autocovariance functions, iid sequences, Gaussian and Poisson processes, random walk
Convergence of random processes: types of convergence, law of large numbers, central limit theorem, Monte Carlo simulation
Markov chains: recurrence, periodicity, convergence, Markov chain Monte Carlo, Gibbs sampling, EM algorithm, variational inference

Module III
Descriptive statistics: histogram, sample mean and variance, order statistics, sample covariance, sample covariance matrix
Frequentist statistics: sampling, mean square error, consistency, confidence intervals, parametric and non-parametric model estimation

Module IV
Bayesian statistics: Bayesian parametric models, conjugate prior, Bayesian estimators
Hypothesis testing: testing framework, parametric testing, permutation test, multiple testing
Mixture models: Gaussian mixture models, multinomial mixture models

Module V
Linear regression: linear models, least-squares estimation, interval estimation in simple linear regression, overfitting
Multiple linear regression models: estimation of model parameters, MLE
Nonlinear regression: nonlinear least squares, transformation to linear model
Generalized linear models: logistic regression models, Poisson regression

References
1. Michael Mitzenmacher and Eli Upfal; Probability and Computing, 2nd ed., Cambridge University Press, 2017
2. Alan Agresti, Christine A. Franklin and Bernhard Klingenberg; Statistics: The Art and Science of Learning from Data, 4th ed., Pearson, 2017
3. Sheldon M. Ross; A First Course in Probability, 10th ed., Pearson, 2018
4. Robert V. Hogg, Joseph W. McKean and Allen T. Craig; Introduction to Mathematical Statistics, 8th ed., Pearson, 2018
5. Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining; Introduction to Linear Regression Analysis, 5th ed., Wiley-Blackwell, 2012

Online Resources
Course notes of Carlos Fernandez-Granda, DS-GA 1002: Probability and Statistics for Data Science
https://cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17/index.html

Department of Computer Science, Cochin University of Science & Technology, Cochin-682022, Kerala, India