Skip to content

lam206/tic-tac-toe-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

tic-tac-toe-agent

This is a quick implementation of a reinforcement learning agent only using numpy. The agent is trained with Monte Carlo control and epsilon-soft policies. Something interesting I found out is that increasing epsilon from 0.05 to 0.15 significantly decreases the number of games the agent needs to play to learn to play optimally. With 0.15 it knows how to play optimally after 10,000 games. With 0.05, even 100,000 games are not enough.

The agent plays during training against a random-trainer, which just picks a random move.

Using a deterministic policy with exploring starts would probably be better and more efficient. I chose to do it with Monte carlo soft-policies for fun.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published