

Programme: CSE (2019)
Core/Elective: Elective | Semester: 5 | Credits: 4
Course Description

This course aims to introduce the concepts of reinforcement learning and to impart an understanding of how
reinforcement learning -- along with supervised and unsupervised learning -- forms a building block of
modern artificial intelligence.

Course Objectives

To gain a solid understanding of reinforcement learning concepts and where they fit in the machine
learning landscape.
To develop the ability to take a machine learning problem, decide when it is appropriate to model it as a
reinforcement learning problem, and determine how to do so.

Course Content

Module I
Introduction to reinforcement learning: Examples of reinforcement learning, Elements of reinforcement learning -- Tabular and Approximate Solution Methods: Multi-armed Bandits, Action-value Methods, Incremental Implementation, Upper-Confidence-Bound Action Selection, Gradient Bandit Algorithms -- Associative Search
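To illustrate the Module I topics of action-value methods and incremental implementation, here is a minimal sketch of an epsilon-greedy multi-armed bandit. The Gaussian reward distributions, arm means, and parameter values are invented for illustration and are not part of the syllabus:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection with incremental sample-average
    value estimates: Q_{n+1} = Q_n + (1/n)(R_n - Q_n)."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k          # action-value estimates
    N = [0] * k            # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit greedily
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental update
    return Q

estimates = run_bandit([0.2, 0.8, 0.5])
```

With enough pulls, the estimate for the best arm approaches its true mean, and the greedy choice settles on it.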

Module II
Finite Markov Decision Processes -- The Agent-Environment Interface -- Goals and Rewards -- Returns and Episodes -- Policies and Value Functions -- Optimality of Policies and Value Functions -- Optimality and Approximation -- Dynamic Programming -- Policy Evaluation -- Policy Improvement -- Policy Iteration -- Value Iteration -- Asynchronous Dynamic Programming -- Generalized Policy Iteration
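As a concrete instance of the Module II material on dynamic programming, the following sketch runs value iteration on a toy 1-D gridworld (the chain MDP, reward, and discount factor are invented for illustration):

```python
def value_iteration(n_states=5, gamma=0.9, theta=1e-8):
    """Value iteration on a toy chain: states 0..n-1, deterministic
    left/right actions, reward +1 on reaching the rightmost (terminal)
    state. Sweeps update V in place until the largest change < theta."""
    V = [0.0] * n_states
    goal = n_states - 1
    while True:
        delta = 0.0
        for s in range(goal):                # goal state is terminal
            best = -float("inf")
            for step in (-1, 1):             # actions: left, right
                s2 = min(max(s + step, 0), goal)
                r = 1.0 if s2 == goal else 0.0
                cont = 0.0 if s2 == goal else V[s2]
                best = max(best, r + gamma * cont)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

The optimal values decay geometrically with distance from the goal, V(s) = gamma^(goal-1-s), which makes the result easy to check by hand.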

Module III
Monte Carlo Methods -- Monte Carlo Prediction -- Estimation of Action Values -- Monte Carlo Control -- Control without Exploring Starts -- Off-policy Prediction via Importance Sampling -- Incremental Implementation -- Off-policy Monte Carlo Control -- Temporal-Difference Learning -- TD Prediction -- Advantages of TD Methods -- Optimality of TD(0) -- Sarsa and Q-learning -- Expected Sarsa -- Maximization Bias and Double Learning -- Special Cases
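The Q-learning topic in Module III can be sketched in a few lines of tabular code. The environment below (a small chain with a +1 terminal reward) and all parameter values are invented for illustration:

```python
import random

def q_learning(n_states=5, episodes=2000, alpha=0.2, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: update rule
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with epsilon-greedy behaviour. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                a = max((0, 1), key=lambda i: Q[s][i])    # greedy
            s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s2 == goal else 0.0
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])         # TD update
            s = s2
    return Q
```

After training, the greedy policy derived from Q moves right from every state, as expected for this environment.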

Module IV
n-step Bootstrapping -- n-step TD Prediction -- n-step Sarsa -- n-step Off-policy Learning -- Off-policy Learning without Importance Sampling -- Planning and Learning with Tabular Methods -- Models and Planning -- Dyna -- Prioritized Sweeping -- Expected vs. Sample Updates -- Trajectory Sampling -- Real-time Dynamic Programming -- Heuristic Search -- Rollout Algorithms -- Monte Carlo Tree Search
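The Dyna topic in Module IV combines direct reinforcement learning with planning from a learned model. A minimal Dyna-Q sketch on the same invented toy chain (all parameters are illustrative, not from the syllabus):

```python
import random

def dyna_q(n_states=5, episodes=30, planning_steps=10, alpha=0.5,
           gamma=0.9, epsilon=0.1, seed=0):
    """Dyna-Q: each real transition does a Q-learning update, records
    the transition in a deterministic model, then replays random
    remembered transitions for 'planning_steps' simulated updates."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}                                    # (s, a) -> (r, s')

    def update(s, a, r, s2):
        target = r if s2 == goal else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q[s][i])
            s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s2 == goal else 0.0
            update(s, a, r, s2)                   # direct RL step
            model[(s, a)] = (r, s2)               # model learning
            for _ in range(planning_steps):       # planning from model
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q
```

Because planning reuses stored transitions, far fewer real episodes are needed here than in plain Q-learning to reach the same greedy policy.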

Module V
Approximate Solution Methods -- On-policy Prediction with Approximation -- Value-function Approximation -- Prediction Objective -- Stochastic-gradient and Semi-gradient Methods -- Linear Methods -- Feature Construction for Linear Methods -- Manual Selection of Step-Size Parameters -- Nonlinear Function Approximation using ANN -- Least-Squares TD -- Memory-based Function Approximation -- Kernel-based Function Approximation -- On-policy Control with Approximation -- Episodic Semi-gradient Control -- Semi-gradient n-step Sarsa -- Average Reward -- Deprecating the Discounted Setting -- Differential Semi-gradient n-step Sarsa
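Module V's semi-gradient methods with linear function approximation can be sketched with semi-gradient TD(0) for prediction. The toy chain, fixed policy, and one-hot features below are invented for illustration (one-hot features make the linear case exact, which keeps the result checkable):

```python
def semi_gradient_td0(episodes=200, alpha=0.05, gamma=0.9):
    """Semi-gradient TD(0) with a linear value function v(s,w) = w . x(s),
    evaluating the fixed always-go-right policy on a 5-state chain with
    a +1 reward on entering the terminal rightmost state."""
    n, goal = 5, 4
    w = [0.0] * n                                      # weight vector

    def x(s):                                          # one-hot features
        return [1.0 if i == s else 0.0 for i in range(n)]

    def v(s):                                          # linear value
        return sum(wi * xi for wi, xi in zip(w, x(s)))

    for _ in range(episodes):
        s = 0
        while s != goal:
            s2 = s + 1                                 # policy: go right
            r = 1.0 if s2 == goal else 0.0
            target = r + gamma * (0.0 if s2 == goal else v(s2))
            delta = target - v(s)                      # TD error
            feats = x(s)                               # grad of v w.r.t. w
            for i in range(n):
                w[i] += alpha * delta * feats[i]       # semi-gradient step
            s = s2
    return w
```

The "semi" in semi-gradient refers to the target r + gamma * v(s') being treated as a constant: only v(s) is differentiated with respect to w.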


References

1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
2. Marco Wiering and Martijn van Otterlo (Editors); Reinforcement Learning: State-of-the-Art, Springer, 2012.
3. Csaba Szepesvari; Algorithms for Reinforcement Learning, Morgan & Claypool Publishers, 2010.

Copyright © 2009-21 Department of Computer Science, CUSAT
Designed, Hosted, and Maintained by the Department of Computer Science
Cochin University of Science & Technology
Cochin-682022, Kerala, India
E-mail: csdir@cusat.ac.in
Phone: +91-484-2577126
Fax: +91-484-2576368