MiniZero currently supports two methods for evaluating program strength: self-evaluation and fight-evaluation.

Self-evaluation evaluates the relative strengths between different iterations in a training session, i.e., it evaluates whether a network model is continuously improving during training.
```bash
tools/quick-run.sh self-eval GAME_TYPE FOLDER [CONF_FILE] [INTERVAL] [GAME_NUM] [OPTION]...
```

- `GAME_TYPE` sets the target game, e.g., `tictactoe`.
- `FOLDER` sets the folder to be evaluated, which should contain the `model/` subfolder.
- `CONF_FILE` sets the config file for evaluation.
- `INTERVAL` sets the iteration interval between each model pair to be evaluated, e.g., `10` indicates to pair the 0th and the 10th models, then the 10th and the 20th models, and so on.
- `GAME_NUM` sets the number of games to play for each model pair, e.g., `100`.
- `OPTION` sets optional arguments, e.g., `-conf_str` sets additional configurations.

For detailed arguments, run `tools/quick-run.sh self-eval -h`.
Sample commands:
```bash
# evaluate a TicTacToe training session using "tictactoe_play.cfg", run 100 games for each model pair: 0th vs 10th, 10th vs 20th, ...
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_play.cfg 10 100

# evaluate a TicTacToe training session using its training config, overriding several settings for evaluation
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_az_1bx256_n50-cb69d4/*.cfg 10 100 -conf_str actor_select_action_by_count=true:actor_use_dirichlet_noise=false:actor_num_simulation=200

# use more threads for faster evaluation
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_play.cfg 10 100 --num_threads 20
```
Note that evaluation is unnecessary for Atari games.
The evaluation results are stored inside `FOLDER`, in a subfolder named `self_eval` by default, which contains the following records:

- `elo.csv` saves the evaluated model strength in Elo rating.
- `elo.png` plots the Elo rating of `elo.csv`.
- `5000_vs_0`, `10000_vs_5000`, and other folders keep game trajectory records for each evaluated model pair.
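For example, the output layout for the sample session above might look roughly like the sketch below; the exact pair folder names depend on the training session and the chosen `INTERVAL`:

```
tictactoe_az_1bx256_n50-cb69d4/
└── self_eval/
    ├── elo.csv          # Elo rating of each evaluated model
    ├── elo.png          # plot of elo.csv
    ├── 5000_vs_0/       # game trajectories: 5000th model vs 0th model
    └── 10000_vs_5000/   # game trajectories: 10000th model vs 5000th model
```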
Fight-evaluation evaluates the relative strengths between the same iterations of two training sessions, i.e., it compares the learning results of two network models.
```bash
tools/quick-run.sh fight-eval GAME_TYPE FOLDER1 FOLDER2 [CONF_FILE1] [CONF_FILE2] [INTERVAL] [GAME_NUM] [OPTION]...
```

- `GAME_TYPE` sets the target game, e.g., `tictactoe`.
- `FOLDER1` and `FOLDER2` set the two folders to be evaluated.
- `CONF_FILE1` and `CONF_FILE2` set the config files for both folders; if `CONF_FILE2` is unspecified, `FOLDER2` will use `CONF_FILE1` for evaluation.
- `INTERVAL` sets the iteration interval between each model pair to be evaluated, e.g., `10` indicates to match the i-th models of both folders, then the (i+10)-th models, and so on.
- `GAME_NUM` sets the number of games to play for each model pair, e.g., `100`.
- `OPTION` sets optional arguments, e.g., `-conf_str` sets additional configurations.

For detailed arguments, run `tools/quick-run.sh fight-eval -h`.
Sample commands:
```bash
# evaluate two training results using "tictactoe_play.cfg" for both programs, run 100 games for each model pair
tools/quick-run.sh fight-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_az_1bx256_n50-731a0f tictactoe_play.cfg 10 100

# evaluate two training results using "tictactoe_cb69d4.cfg" and "tictactoe_731a0f.cfg" for the former and the latter, respectively
tools/quick-run.sh fight-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_az_1bx256_n50-731a0f tictactoe_cb69d4.cfg tictactoe_731a0f.cfg 10 100
```
The evaluation results are stored inside `FOLDER1`, in a subfolder named `[FOLDER1]_vs_[FOLDER2]_eval` by default, which contains the following records:

- `elo.csv` saves the evaluation statistics and strength comparisons of all evaluated model pairs.
- `elo.png` plots the Elo rating comparisons reported in `elo.csv`.
- `0`, `5000`, and other folders keep game trajectory records for each evaluated model pair.
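Assuming the two sample sessions above, the output layout might look roughly as follows; note that, unlike self-evaluation, the pair folders are named by a single iteration number:

```
tictactoe_az_1bx256_n50-cb69d4/
└── tictactoe_az_1bx256_n50-cb69d4_vs_tictactoe_az_1bx256_n50-731a0f_eval/
    ├── elo.csv   # statistics and strength comparisons for all model pairs
    ├── elo.png   # plot of the Elo rating comparisons
    ├── 0/        # game trajectories for the 0th-iteration pair
    └── 5000/     # game trajectories for the 5000th-iteration pair
```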
Note: before running fight-evaluation, it is suggested to first run self-evaluation for `FOLDER1` to generate a baseline strength, which is necessary for the strength comparison.
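Concretely, reusing the sample sessions from above, the recommended order would be:

```bash
# 1. self-evaluation of FOLDER1 to establish its Elo baseline
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_play.cfg 10 100

# 2. fight-evaluation against FOLDER2, which builds on that baseline
tools/quick-run.sh fight-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_az_1bx256_n50-731a0f tictactoe_play.cfg 10 100
```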
Evaluation requires a different configuration from training, e.g., using more simulations and disabling noise so that the best action is always selected:
```
actor_num_simulation=400
actor_select_action_by_count=true
actor_select_action_by_softmax_count=false
actor_use_dirichlet_noise=false
actor_use_gumbel_noise=false
```
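These settings can live in a dedicated evaluation config such as `tictactoe_play.cfg`, or be supplied at run time through `-conf_str` (colon-separated); a sketch reusing the sample session from above:

```bash
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_az_1bx256_n50-cb69d4/*.cfg 10 100 -conf_str actor_num_simulation=400:actor_select_action_by_count=true:actor_select_action_by_softmax_count=false:actor_use_dirichlet_noise=false:actor_use_gumbel_noise=false
```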
In addition, the played games sometimes become too similar to each other. To prevent this, use random rotation (for AlphaZero only) or even add softmax/noise back:
```
actor_use_random_rotation_features=true
actor_select_action_by_count=false
actor_select_action_by_softmax_count=true
```
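As with the settings above, these can likewise be appended via `-conf_str` for a one-off run; a sketch reusing the sample session from above:

```bash
tools/quick-run.sh self-eval tictactoe tictactoe_az_1bx256_n50-cb69d4 tictactoe_play.cfg 10 100 -conf_str actor_use_random_rotation_features=true:actor_select_action_by_count=false:actor_select_action_by_softmax_count=true
```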