This repository contains three high-quality reinforcement learning course projects.
Lunar Lander: my deep Q-learning model averages 280+ points on the Lunar Lander problem, the highest score among those we could find online or reported on the class discussion board. My paper-like report is here.
Correlated-Q: replicates the results in Correlated-Q Learning. In addition, we demo how the equilibrium evolves. For the derivation of the linear programming dual, please read our paper-like report here.
SuttonMDP: replicates the results in Learning to Predict by the Methods of Temporal Differences. The results are not easy to replicate because the paper is vague about the model's parameters. The right parameter setup was found by repeatedly comparing the charts with the theory.
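
For readers new to the SuttonMDP project, the following is a minimal sketch of TD(lambda) on the bounded random walk from Sutton's paper, not the code in this repository. It assumes five non-terminal states with one-hot features, weight updates applied once per sequence, and illustrative values for `lam`, `alpha`, and `n_walks`. All function names are hypothetical, and the parameters are not the ones used for the replication.

```python
import random

N_STATES = 5  # non-terminal states; the walk ends off either edge
# Theoretical probabilities of ending on the right side: 1/6, 2/6, ..., 5/6
TRUE_VALUES = [(i + 1) / 6 for i in range(N_STATES)]


def generate_walk(rng):
    """Return the visited non-terminal state indices and the outcome (0.0 or 1.0)."""
    state = N_STATES // 2  # every walk starts in the middle state
    states = []
    while 0 <= state < N_STATES:
        states.append(state)
        state += rng.choice((-1, 1))
    outcome = 1.0 if state == N_STATES else 0.0
    return states, outcome


def td_lambda_sequence_update(w, states, outcome, lam, alpha):
    """Accumulate TD(lambda) increments over one sequence, then apply them."""
    delta_w = [0.0] * len(w)
    trace = [0.0] * len(w)  # eligibility trace: sum of lam^(t-k) * gradient at step k
    for t, s in enumerate(states):
        trace = [lam * e for e in trace]
        trace[s] += 1.0  # gradient of a one-hot linear prediction
        p_t = w[s]
        # The "next prediction" at the last step is the observed outcome.
        p_next = w[states[t + 1]] if t + 1 < len(states) else outcome
        td_error = p_next - p_t
        for i in range(len(w)):
            delta_w[i] += alpha * td_error * trace[i]
    return [wi + dwi for wi, dwi in zip(w, delta_w)]


def rmse(w):
    """Root-mean-square error against the theoretical values."""
    return (sum((wi - vi) ** 2 for wi, vi in zip(w, TRUE_VALUES)) / len(w)) ** 0.5


def train(lam=0.3, alpha=0.05, n_walks=2000, seed=0):
    """Illustrative training loop; parameter values are assumptions."""
    rng = random.Random(seed)
    w = [0.5] * N_STATES
    for _ in range(n_walks):
        states, outcome = generate_walk(rng)
        w = td_lambda_sequence_update(w, states, outcome, lam, alpha)
    return w


if __name__ == "__main__":
    weights = train()
    print("learned:", [round(x, 3) for x in weights])
    print("rmse vs theory:", round(rmse(weights), 4))
```

Sweeping `lam` and `alpha` and comparing the resulting RMSE against the theoretical values is one way to run the chart-versus-theory check described above.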