All contents are based on the book *Reinforcement Learning: An Introduction* (second edition) by Richard S. Sutton and Andrew G. Barto.
As the authors said:
> we retain a focus on core, online learning algorithms.
The structure of the book is shown below:
- Chapters 2-8 treat as much of reinforcement learning as possible without going beyond the tabular case, for which exact solutions can be found.
- Chapters 9-13 are devoted to extending these ideas to function approximation.
- The third part of the book has large new chapters on reinforcement learning’s relationships to psychology (Chapter 14) and neuroscience (Chapter 15), as well as an updated case-studies chapter including Atari game playing, Watson’s wagering strategy, and the Go playing programs AlphaGo and AlphaGo Zero (Chapter 16).
I think the story of Harry Klopf, Sutton, and Barto is quite motivating.
Reinforcement learning is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
Trial-and-error search and delayed reward are the two most important distinguishing features of reinforcement learning, which stands as a third paradigm of machine learning alongside supervised and unsupervised learning. One of the challenges that arises in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation. Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment: where many approaches study isolated subproblems, reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. The authors write:
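To make the exploration-exploitation trade-off concrete, here is a minimal sketch of ε-greedy action selection on a multi-armed bandit, in the spirit of the book's Chapter 2. The bandit, its reward distribution, and all names below are my own illustrative assumptions, not code from the book.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon try a random arm (explore); otherwise pick the best-looking arm (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

# Sample-average learning on a hypothetical 3-armed bandit.
true_means = [0.2, 0.5, 0.8]   # hidden reward means, unknown to the agent
q = [0.0] * 3                  # action-value estimates
n = [0] * 3                    # pull counts
for _ in range(1000):
    a = epsilon_greedy(q)
    r = random.gauss(true_means[a], 1.0)   # noisy reward for the chosen arm
    n[a] += 1
    q[a] += (r - q[a]) / n[a]              # incremental mean: Q += (R - Q) / N
print([round(v, 2) for v in q])            # estimates should approach true_means
```

With ε = 0.1 the agent mostly exploits its current estimates but keeps sampling every arm, so the estimates converge toward the true means instead of locking onto an early lucky arm.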
> One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.
- policy. A policy defines the learning agent’s way of behaving at a given time.
- reward. A reward signal defines the goal of a reinforcement learning problem. Reward vs. value: whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run.
- value function. Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
- model (optional). This is something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the environment will behave. Model-based vs. model-free methods: modern reinforcement learning spans the spectrum from low-level, trial-and-error learning to high-level, deliberative planning. (A toy sketch combining these four elements follows this list.)
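As a rough illustration of how these four elements fit together, here is a toy tabular sketch. The chain environment, its numbers, and every name below are my own illustrative assumptions, not the book's code.

```python
# Toy 4-state chain: states 0..3, the goal is to reach state 3.
gamma = 0.9  # discount rate (assumed for illustration)

def policy(state):
    """Policy: the agent's way of behaving - here, always move right."""
    return "right"

def reward(state, action, next_state):
    """Reward signal: immediate feedback defining the goal."""
    return 1.0 if next_state == 3 else 0.0

def model(state, action):
    """Model: mimics the environment by predicting the next state (enables planning)."""
    return min(state + 1, 3) if action == "right" else max(state - 1, 0)

# Value function: long-run return from each state under the policy,
# computed here by backing up from the goal state.
value = {3: 0.0}
for s in [2, 1, 0]:
    nxt = model(s, policy(s))
    value[s] = reward(s, policy(s), nxt) + gamma * value[nxt]
print(value)   # {3: 0.0, 2: 1.0, 1: 0.9, 0: 0.81}
```

Note how the reward is purely immediate, while the value of state 0 already reflects the discounted reward three steps ahead, which is exactly the reward vs. value distinction above.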
In this book, the problem of designing the state representation is not taken up; the state signal is assumed to be given.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 1.0 Generic License.