tic-tac-toe-agent

This is a quick implementation of a reinforcement learning agent for tic-tac-toe that uses only NumPy. The agent is trained with Monte Carlo control and epsilon-soft policies. Something interesting I found is that increasing epsilon from 0.05 to 0.15 significantly reduces the number of games the agent needs to learn optimal play. With epsilon = 0.15 it plays optimally after 10,000 games, while with epsilon = 0.05 even 100,000 games are not enough.
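For readers new to the technique, here is a minimal sketch of on-policy Monte Carlo control with an epsilon-soft (specifically epsilon-greedy) policy. It is not the code from main.ipynb: the board encoding (a tuple of 9 cells, 0 = empty, 1 = agent, -1 = opponent), the reward scheme (+1 win, 0 draw, -1 loss) and the names `Q`, `N`, `epsilon_soft_probs`, `choose_action` and `mc_update` are all assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict

# Hypothetical encoding (not taken from main.ipynb): a state is a tuple of
# 9 cells (0 = empty, 1 = agent, -1 = opponent), an action is a cell index.
Q = defaultdict(lambda: np.zeros(9))  # action-value estimates Q(s, a)
N = defaultdict(lambda: np.zeros(9))  # visit counts for sample averaging


def epsilon_soft_probs(q_values, legal, epsilon):
    """Epsilon-soft (here epsilon-greedy) distribution over legal moves.

    Each legal move gets epsilon / n; the greedy move additionally gets
    1 - epsilon, so every legal move keeps a nonzero probability.
    """
    legal = np.asarray(legal)
    probs = np.zeros(9)
    probs[legal] = epsilon / len(legal)
    best = legal[np.argmax(q_values[legal])]  # ties go to the first legal move
    probs[best] += 1.0 - epsilon
    return probs


def choose_action(state, legal, epsilon, rng):
    """Sample the agent's move from the epsilon-soft policy."""
    probs = epsilon_soft_probs(Q[state], legal, epsilon)
    return int(rng.choice(9, p=probs))


def mc_update(episode, reward, gamma=1.0):
    """Monte Carlo control update after one finished game.

    episode: the agent's (state, action) pairs in order.
    reward: terminal reward, e.g. +1 win, 0 draw, -1 loss (assumed scheme).
    A tic-tac-toe board never repeats within a game, so every visit is a
    first visit and no duplicate check is needed.
    """
    G = reward
    for state, action in reversed(episode):
        N[state][action] += 1
        # Incremental sample average: Q <- Q + (G - Q) / N
        Q[state][action] += (G - Q[state][action]) / N[state][action]
        G *= gamma  # only the final step carries a reward
```

In this sketch, epsilon is the parameter the README varies between 0.05 and 0.15: it sets how much probability mass is spread over non-greedy moves.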

During training, the agent plays against a random trainer, which just picks a random move.
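A random trainer of this kind can be sketched in a few lines. The board encoding (a length-9 sequence where 0 marks an empty cell) and the function names `legal_moves` and `random_trainer_move` are hypothetical, chosen only for this illustration.

```python
import numpy as np

# Hypothetical board encoding: a length-9 sequence, 0 marks an empty cell.


def legal_moves(board):
    """Indices of the empty cells."""
    return np.flatnonzero(np.asarray(board) == 0)


def random_trainer_move(board, rng):
    """Random trainer: pick one of the empty cells uniformly at random."""
    moves = legal_moves(board)
    if moves.size == 0:
        raise ValueError("no legal moves: the board is full")
    return int(rng.choice(moves))
```

Passing a `numpy.random.Generator` (for example `np.random.default_rng(seed)`) keeps training runs reproducible.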

Using a deterministic policy with exploring starts would probably be better and more efficient. I chose Monte Carlo with epsilon-soft policies for fun.
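For comparison, the exploring-starts alternative could look roughly like this: each episode begins from a random state with a random first action, and afterwards the agent follows a purely greedy (deterministic) policy. This is only a sketch. It assumes the agent plays 1 and moves first, and the names `greedy_action` and `exploring_start` are hypothetical. It also only starts from early positions (0, 2 or 4 random plies), so it approximates exploring starts rather than giving every state-action pair a nonzero start probability.

```python
import numpy as np

# Assumed conventions: the agent plays 1 and moves first, the opponent
# plays -1, and 0 marks an empty cell.


def greedy_action(q_values, legal):
    """Deterministic policy: the highest-valued legal move."""
    legal = np.asarray(legal)
    return int(legal[np.argmax(q_values[legal])])


def exploring_start(rng):
    """Random start state and random first action for MC with exploring starts.

    Plays 0, 2 or 4 random plies so it is the agent's turn again; with at
    most two marks each, no one can have won yet. This reaches only early
    positions, so it approximates exploring starts rather than covering
    every state-action pair.
    """
    board = np.zeros(9, dtype=int)
    player = 1
    for _ in range(2 * int(rng.integers(0, 3))):
        empty = np.flatnonzero(board == 0)
        board[rng.choice(empty)] = player
        player = -player
    first_action = int(rng.choice(np.flatnonzero(board == 0)))
    return tuple(int(c) for c in board), first_action
```

After the random first action, every later agent move would come from `greedy_action`, with the same sample-average update of Q at the end of each game.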

lam206/tic-tac-toe-agent

Folders and files

Latest commit

History

Repository files navigation

tic-tac-toe-agent

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages