Reinforcement Learning

A reinforcement learning bot that plays the Atari video game Pong.

About Pong

Pong is one of the earliest arcade video games, featuring basic two-dimensional graphics in a table tennis sports format. The game was first produced by Atari and released in 1972.


Implementation

  1. The implementation leverages TF-Agents, focusing on selecting an efficient optimiser and an appropriate replay buffer size.
  2. The pong.ipynb notebook implements a deep reinforcement learning algorithm using a Deep Q-Network (DQN) combined with an experience replay algorithm (Mnih et al., 2015).
  3. The implementation works directly with raw pixel observations to learn policies for playing the Atari game Pong; a minimal environment-setup sketch follows this list.
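
The sketch below shows how the Pong environment is typically loaded through TF-Agents with the standard DQN pixel preprocessing. It is an illustration of the approach, not the exact code from pong.ipynb; the Gym environment ID and episode step limit are assumptions.

```python
# Load Atari Pong through TF-Agents with the standard DQN preprocessing
# (grayscale 84x84 frames, frame skipping, stack of 4 frames).
from tf_agents.environments import suite_atari, tf_py_environment
from tf_agents.environments.atari_preprocessing import AtariPreprocessing
from tf_agents.environments.atari_wrappers import FrameStack4

# "PongNoFrameskip-v4" is the usual Gym ID for raw-frame Pong; the ID and the
# step limit here are assumptions, not values read out of pong.ipynb.
env = suite_atari.load(
    "PongNoFrameskip-v4",
    max_episode_steps=27000,  # 108,000 ALE frames, as in Mnih et al. (2015)
    gym_env_wrappers=[AtariPreprocessing, FrameStack4],
)
tf_env = tf_py_environment.TFPyEnvironment(env)  # TensorFlow-friendly wrapper
```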

Application

  1. This demonstration showed how a deep learning model could learn to play 1970s Atari 2600 games from scratch.
  2. The model not only learned to play the games but also matched or exceeded the performance of the best human experts.
  3. The Jupyter notebook implementation uses the TF-Agents RL library for Python, which can simulate various environments, including Atari games.
  4. This library is based on TensorFlow and developed by Google.
  5. The settings in the notebook are similar to those in Mnih et al. (2015), but with improved choices for the optimiser, the loss function, and the replay buffer size; a sketch of the agent construction follows this list.
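
Below is a minimal sketch of how the Q-network and DQN agent are usually wired together with TF-Agents, continuing from the tf_env defined in the previous sketch. The convolutional architecture follows Mnih et al. (2015); the learning rate, target-update period and discount factor are illustrative assumptions rather than the notebook's exact values.

```python
# Build the Q-network and DQN agent (continues from the tf_env defined above).
import numpy as np
import tensorflow as tf
from tf_agents.agents.dqn.dqn_agent import DqnAgent
from tf_agents.networks.q_network import QNetwork
from tf_agents.utils import common

# Scale uint8 pixel observations to [0, 1] before the convolutional stack.
preprocessing_layer = tf.keras.layers.Lambda(
    lambda obs: tf.cast(obs, np.float32) / 255.0)

q_net = QNetwork(
    tf_env.observation_spec(),
    tf_env.action_spec(),
    preprocessing_layers=preprocessing_layer,
    conv_layer_params=[(32, (8, 8), 4), (64, (4, 4), 2), (64, (3, 3), 1)],
    fc_layer_params=[512],
)

# The hyperparameters below are assumptions for illustration only.
agent = DqnAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    td_errors_loss_fn=common.element_wise_squared_loss,  # MSE on TD errors
    target_update_period=2000,
    gamma=0.99,
    train_step_counter=tf.Variable(0),
)
agent.initialize()
```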

Adding the Replay Buffer

  1. I have introduced a new hyperparameter here (REPLAY MEMORY BUFFER SIZE) which has to be carefully tuned.
  2. Since the replay buffer functions as fixed-size circular storage for transitions, its size directly determines how old the data (and the policy that generated it) in the buffer can be.
  3. Because the policy is updated once every four environment steps, the oldest transitions in the buffer lag the current policy by roughly the buffer size divided by four gradient updates.
  4. In this setting, a buffer size of 50,000 seems to work best; a sketch of the buffer setup follows this list.
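
A minimal sketch of how this buffer size plugs into a TF-Agents uniform replay buffer, continuing from the agent sketched above; the only value taken from the text is the 50,000-transition capacity.

```python
# Uniform replay buffer sized to the 50,000 transitions that worked best here.
from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,  # spec of the transitions the agent stores
    batch_size=tf_env.batch_size,       # a single (non-batched) environment here
    max_length=50000,                   # REPLAY MEMORY BUFFER SIZE
)
```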

Optimisers and Loss Functions - Adam with MSE Loss vs. RMSProp with Huber Loss

  1. Configurations compared: a. Adam optimiser, b. MSE loss function, c. Huber loss function with the RMSProp optimiser.
  2. For DQN on the Pong environment, the Adam optimiser with the MSE loss function works much better: it converges nicely after about 400,000 iterations to an average return of approximately 19 while showing reasonably stable behaviour.
  3. The model with RMSProp and Huber loss, on the other hand, takes a long time to learn; even after 1.6 million iterations it had only reached a mean return of approximately 10. The two configurations are sketched below.
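
The two optimiser/loss combinations can be expressed as interchangeable TF-Agents settings, as in the sketch below. The RMSProp hyperparameters mirror common DQN defaults and are assumptions, not values taken from pong.ipynb.

```python
import tensorflow as tf
from tf_agents.utils import common

# Combination that converged fastest here: Adam + element-wise squared (MSE) loss.
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
mse_td_loss = common.element_wise_squared_loss

# Baseline closer to Mnih et al. (2015): RMSProp + element-wise Huber loss.
# These RMSProp settings are common DQN defaults, assumed for illustration.
rmsprop_optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=2.5e-4, rho=0.95, epsilon=1e-5, centered=True)
huber_td_loss = common.element_wise_huber_loss

# Either pair is passed to the agent the same way, e.g.
#   DqnAgent(..., optimizer=adam_optimizer, td_errors_loss_fn=mse_td_loss, ...)
```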

References:

  1. Deep Reinforcement Learning: Pong from Pixels
  2. TensorFlow and deep reinforcement learning, without a PhD (Google I/O '18)
  3. Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.
  4. Luostari, R. (2021). Playing Atari Pong with Deep Q-Network: Implementation using TF-Agents, selecting efficient optimiser, and right replay buffer size.
  5. Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

Packages

A package.txt file listing the required packages is included. The code runs without issues on Ubuntu 20.04 with Python 3.7.9.