

Programme: CSE (2019)
Core/Elective: Elective | Semester: 5 | Credits: 4
Course Description

This course aims to introduce the concepts of reinforcement learning and to impart an understanding of how
reinforcement learning -- along with supervised and unsupervised learning -- forms a building block of
modern artificial intelligence.

Course Objectives

To gain a solid understanding of reinforcement learning concepts and where they fit in the machine
learning landscape.
To develop the ability to take a machine learning problem, decide when it is appropriate to model it as a
reinforcement learning problem, and determine how to do so.

Course Content

Module I
Introduction to reinforcement learning: Examples of reinforcement learning, Elements of reinforcement learning -- Tabular and Approximate Solution Methods: Multi-armed Bandits, Action-value Methods, Incremental Implementation, Upper-Confidence-Bound Action Selection, Gradient Bandit Algorithms -- Associative Search
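To illustrate the Module I topics of action-value methods and incremental implementation, here is a minimal sketch of an epsilon-greedy multi-armed bandit. The Gaussian reward distributions, arm means, and parameter values are invented for illustration and are not part of the syllabus:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection with incremental sample-average
    value estimates: Q_{n+1} = Q_n + (1/n)(R_n - Q_n)."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k          # action-value estimates
    N = [0] * k            # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit greedily
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental update
    return Q

estimates = run_bandit([0.2, 0.8, 0.5])
```

With enough pulls, the estimate for the best arm approaches its true mean, and the greedy choice settles on it.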

Module II
Finite Markov Decision Processes -- The Agent-Environment Interface -- Goals and Rewards -- Returns and Episodes -- Policies and Value Functions -- Optimality of Policies and Value Functions -- Optimality and Approximation -- Dynamic Programming -- Policy Evaluation -- Policy Improvement -- Policy Iteration -- Value Iteration -- Asynchronous Dynamic Programming -- Generalized Policy Iteration
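As a concrete instance of the Module II material on dynamic programming, the following sketch runs value iteration on a toy 1-D gridworld (the chain MDP, reward, and discount factor are invented for illustration):

```python
def value_iteration(n_states=5, gamma=0.9, theta=1e-8):
    """Value iteration on a toy chain: states 0..n-1, deterministic
    left/right actions, reward +1 on reaching the rightmost (terminal)
    state. Sweeps update V in place until the largest change < theta."""
    V = [0.0] * n_states
    goal = n_states - 1
    while True:
        delta = 0.0
        for s in range(goal):                # goal state is terminal
            best = -float("inf")
            for step in (-1, 1):             # actions: left, right
                s2 = min(max(s + step, 0), goal)
                r = 1.0 if s2 == goal else 0.0
                cont = 0.0 if s2 == goal else V[s2]
                best = max(best, r + gamma * cont)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

The optimal values decay geometrically with distance from the goal, V(s) = gamma^(goal-1-s), which makes the result easy to check by hand.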

Module III
Monte Carlo Methods -- Monte Carlo Prediction -- Estimation of Action Values -- Monte Carlo Control -- Control without Exploring Starts -- Off-policy Prediction via Importance Sampling -- Incremental Implementation -- Off-policy Monte Carlo Control -- Temporal-Difference Learning -- TD Prediction -- Advantages of TD Methods -- Optimality of TD(0) -- Sarsa and Q-learning -- Expected Sarsa -- Maximization Bias and Double Learning -- Special Cases
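The Q-learning topic in Module III can be sketched in a few lines of tabular code. The environment below (a small chain with a +1 terminal reward) and all parameter values are invented for illustration:

```python
import random

def q_learning(n_states=5, episodes=2000, alpha=0.2, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: update rule
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with epsilon-greedy behaviour. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                a = max((0, 1), key=lambda i: Q[s][i])    # greedy
            s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s2 == goal else 0.0
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])         # TD update
            s = s2
    return Q
```

After training, the greedy policy derived from Q moves right from every state, as expected for this environment.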

Module IV
n-step Bootstrapping -- n-step TD Prediction -- n-step Sarsa -- n-step Off-policy Learning -- Off-policy Learning without Importance Sampling -- Planning and Learning with Tabular Methods -- Models and Planning -- Dyna -- Prioritized Sweeping -- Expected vs. Sample Updates -- Trajectory Sampling -- Real-time Dynamic Programming -- Heuristic Search -- Rollout Algorithms -- Monte Carlo Tree Search
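The Dyna topic in Module IV combines direct reinforcement learning with planning from a learned model. A minimal Dyna-Q sketch on the same invented toy chain (all parameters are illustrative, not from the syllabus):

```python
import random

def dyna_q(n_states=5, episodes=30, planning_steps=10, alpha=0.5,
           gamma=0.9, epsilon=0.1, seed=0):
    """Dyna-Q: each real transition does a Q-learning update, records
    the transition in a deterministic model, then replays random
    remembered transitions for 'planning_steps' simulated updates."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}                                    # (s, a) -> (r, s')

    def update(s, a, r, s2):
        target = r if s2 == goal else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q[s][i])
            s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s2 == goal else 0.0
            update(s, a, r, s2)                   # direct RL step
            model[(s, a)] = (r, s2)               # model learning
            for _ in range(planning_steps):       # planning from model
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q
```

Because planning reuses stored transitions, far fewer real episodes are needed here than in plain Q-learning to reach the same greedy policy.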

Module V
Approximate Solution Methods -- On-policy Prediction with Approximation -- Value-function Approximation -- Prediction Objective -- Stochastic-gradient and Semi-gradient Methods -- Linear Methods -- Feature Construction for Linear Methods -- Manual Selection of Step-Size Parameters -- Nonlinear Function Approximation using ANN -- Least-Squares TD -- Memory-based Function Approximation -- Kernel-based Function Approximation -- On-policy Control with Approximation -- Episodic Semi-gradient Control -- Semi-gradient n-step Sarsa -- Average Reward -- Deprecating the Discounted Setting -- Differential Semi-gradient n-step Sarsa
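Module V's semi-gradient methods with linear function approximation can be sketched with semi-gradient TD(0) for prediction. The toy chain, fixed policy, and one-hot features below are invented for illustration (one-hot features make the linear case exact, which keeps the result checkable):

```python
def semi_gradient_td0(episodes=200, alpha=0.05, gamma=0.9):
    """Semi-gradient TD(0) with a linear value function v(s,w) = w . x(s),
    evaluating the fixed always-go-right policy on a 5-state chain with
    a +1 reward on entering the terminal rightmost state."""
    n, goal = 5, 4
    w = [0.0] * n                                      # weight vector

    def x(s):                                          # one-hot features
        return [1.0 if i == s else 0.0 for i in range(n)]

    def v(s):                                          # linear value
        return sum(wi * xi for wi, xi in zip(w, x(s)))

    for _ in range(episodes):
        s = 0
        while s != goal:
            s2 = s + 1                                 # policy: go right
            r = 1.0 if s2 == goal else 0.0
            target = r + gamma * (0.0 if s2 == goal else v(s2))
            delta = target - v(s)                      # TD error
            feats = x(s)                               # grad of v w.r.t. w
            for i in range(n):
                w[i] += alpha * delta * feats[i]       # semi-gradient step
            s = s2
    return w
```

The "semi" in semi-gradient refers to the target r + gamma * v(s') being treated as a constant: only v(s) is differentiated with respect to w.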


References

1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
2. Marco Wiering and Martijn van Otterlo (Editors); Reinforcement Learning: State-of-the-Art, Springer, 2012.
3. Csaba Szepesvari; Algorithms for Reinforcement Learning, Morgan & Claypool Publishers, 2010.

Copyright © 2009-21 Department of Computer Science, CUSAT
Designed, Hosted, and Maintained by the Department of Computer Science
Cochin University of Science & Technology
Cochin-682022, Kerala, India
E-mail: csdir@cusat.ac.in
Phone: +91-484-2577126
Fax: +91-484-2576368