
Notes on losses and overfitting

Michael Pang edited this page Dec 29, 2017 · 2 revisions

Policy

  1. A network that always predicts a uniform policy incurs a cross-entropy loss of ln(C), where C is the number of categories (possible moves). Here C = 1968, so the "upper bound" on the loss is ln(1968) ≈ 7.585.
  2. Conversely, if your loss is L, your model is roughly as uncertain as if it were picking uniformly from the top e^L moves (e^L is the perplexity of the policy).
Loss (L)   Top x moves (x ≈ e^L)
0.000      1
0.693      2
1.099      3
1.386      4
1.609      5
3.555      35
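The policy numbers above can be checked with a few lines of arithmetic. This is a minimal Python sketch, not code from any training pipeline: it scores a uniform prediction over 1968 moves with cross-entropy and reproduces the table by mapping k equally likely moves to a loss of ln(k). The helper names `cross_entropy`, `effective_moves` and `loss_for_moves` are hypothetical, chosen for illustration.

```python
import math

# Number of policy categories (possible moves) quoted in the notes.
NUM_MOVES = 1968

def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) in nats for two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# A uniform prediction scored against a one-hot target (the move actually played).
uniform = [1.0 / NUM_MOVES] * NUM_MOVES
one_hot = [0.0] * NUM_MOVES
one_hot[0] = 1.0
uniform_loss = cross_entropy(one_hot, uniform)
print(f"uniform policy loss: {uniform_loss:.3f}")  # ln(1968), about 7.585

def effective_moves(loss):
    """Perplexity e^L: the number of equally likely moves that yields loss L."""
    return math.exp(loss)

def loss_for_moves(k):
    """Inverse mapping: spreading probability evenly over k moves gives loss ln(k)."""
    return math.log(k)

# Reproduce the Loss / Top x moves table.
for k in (1, 2, 3, 4, 5, 35):
    print(f"{loss_for_moves(k):.3f}  {k}")
```

Because every log term of a uniform prediction equals -ln(C), the same 7.585 comes out for any target distribution, including the soft MCTS visit-count targets used in AlphaZero-style training, not only for one-hot targets.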

Value

  1. Self-play keeps resigning.
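  2. If you assume most "normal" chess games are roughly even until halfway through, you get a lower bound of 0.5 on the asymptotic MSE: with outcomes scored as +1/0/-1 and no draws, the best prediction for a first-half position is 0 while the actual result is ±1, so half of all positions carry a squared error of 1 even if every second-half position is predicted perfectly. (I think) this bound decreases as the average Elo of the players rises, and also as the Elo difference between the players grows. If you also assume half of GM games are draws, the lower bound drops to 0.25.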
  3. AlphaZero (AZ) used 5000 TPUs for self-play game generation and only 64 for SGD training.
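The value-loss lower bound in item 2 can be reproduced numerically. This is a minimal Python sketch under the same simplified model, not a measured result: outcomes are scored +1/0/-1, the first `even_fraction` of each game's positions are dead even (wins and losses equally likely), and every later position is predicted perfectly. The function names and the `draw_rate` / `even_fraction` parameters are hypothetical, introduced only for illustration.

```python
def best_constant_mse(outcome_probs):
    """MSE of the best constant prediction (the mean) for a discrete outcome distribution."""
    mean = sum(z * p for z, p in outcome_probs.items())
    return sum(p * (z - mean) ** 2 for z, p in outcome_probs.items())

def value_mse_lower_bound(draw_rate, even_fraction=0.5):
    """Irreducible value-head MSE under the simplified model (hypothetical helper).

    Assumes outcomes z in {+1, 0, -1}, that the first `even_fraction` of every
    game's positions are even (win and loss equally likely), and that all later
    positions are predicted with zero error.
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be in [0, 1]")
    decisive = (1.0 - draw_rate) / 2.0
    # Outcome distribution seen from an even position.
    even_position = {+1: decisive, 0: draw_rate, -1: decisive}
    # Only the even positions contribute error; the rest are assumed perfect.
    return even_fraction * best_constant_mse(even_position)

print(value_mse_lower_bound(0.0))  # no draws
print(value_mse_lower_bound(0.5))  # half the games drawn
```

With no draws this gives 0.5, and with a 50% draw rate it gives 0.25, matching the two bounds in the notes. Raising `draw_rate` or shrinking `even_fraction` (games decided earlier, e.g. with a larger Elo gap) lowers the bound, which is one way to read the remark about player strength.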