forked from Zeta36/chess-alpha-zero
Notes on losses and overfitting
Michael Pang edited this page Dec 29, 2017
- A network that predicts a uniform policy every time incurs a cross-entropy loss of ln(C), where C is the number of categories. Here C = 1968, giving an "upper bound" of ln(1968) ≈ 7.585 on the policy loss.
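The baseline above can be checked directly; a minimal sketch (C = 1968 is the move-space size used here):

```python
import math

# A uniform policy assigns probability 1/C to each of C moves, so its
# cross-entropy against any one-hot target is -ln(1/C) = ln(C).
C = 1968  # size of the move space
print(round(math.log(C), 3))  # → 7.585
```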
- Conversely, if your loss is L, your model is roughly picking uniformly among the top e^L moves.
| Loss | Top x moves |
|---|---|
| 0 | 1 |
| 0.693 | 2 |
| 1.099 | 3 |
| 1.386 | 4 |
| 1.609 | 5 |
| 3.555 | 35 |
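The table above is just L = ln(x); a quick sketch to regenerate it:

```python
import math

# A loss of L corresponds to effectively choosing uniformly among
# the top e^L moves, so each row is (ln(x), x).
for x in (1, 2, 3, 4, 5, 35):
    print(f"loss {math.log(x):.3f} -> top {x} moves")
```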
- Self-play keeps resigning.
- If you assume most "normal" chess games are pretty even until halfway through — so for the first half of each game the outcome is close to a coin flip, the best value prediction is 0, and the expected squared error is about 1 — you get a lower bound of 0.5 on the asymptotic value MSE. (I think) this bound decreases with the average Elo of the players and with the Elo difference between them. If you assume half of GM games are draws too (z = 0, which a prediction of 0 gets exactly right), the lower bound goes down to 0.25.
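The lower-bound arithmetic can be sketched as follows, assuming outcomes z ∈ {−1, 0, +1} and that half of each game's positions are "even" coin flips (the other half assumed perfectly predictable):

```python
def even_position_mse(p_draw):
    # At an even position, z = 0 with probability p_draw, else z = +/-1
    # with equal probability. The best prediction is v = E[z] = 0,
    # so the MSE is E[z^2] = 1 - p_draw.
    return 1.0 - p_draw

# Half of all positions are even; the rest contribute zero error.
print(0.5 * even_position_mse(0.0))  # no draws   → 0.5
print(0.5 * even_position_mse(0.5))  # half draws → 0.25
```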
- AZ had 5000 TPUs running self-play and only 64 running SGD.