Training data generation

Fabian Fichter edited this page Aug 28, 2024 · 25 revisions

Overview

Training data generation code is available for two engines, Fairy-Stockfish and YaneuraOu.

For all variants except Shogi, use the Fairy-Stockfish based training data generator.

Training data generation using Fairy-Stockfish

Training data generator: https://github.com/fairy-stockfish/variant-nnue-tools/releases

Example

Enter these commands in the training data generator, one line at a time:

uci
setoption name Use NNUE value false
setoption name Threads value 8
setoption name Hash value 2048
setoption name UCI_Variant value your_variant
isready
generate_training_data depth 2 count 10000000 random_multi_pv 4 random_multi_pv_diff 100 random_move_count 8 random_move_max_ply 20 write_min_ply 5 eval_limit 10000 set_recommended_uci_options data_format bin output_file_name your_variant.bin
quit

Replace your_variant with the variant you want to train, e.g., xiangqi.
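
The command sequence above can also be sent to the generator from a script. The following is a minimal Python sketch, not part of the official tooling: the helper names build_commands and run_generator are hypothetical, and it assumes the generator reads its commands from standard input and processes them in order, as in the interactive session above.

```python
import subprocess


def build_commands(variant, threads=8, hash_mb=2048, depth=2, count=10_000_000):
    """Return the command lines from the example above for one variant."""
    generate = (
        f"generate_training_data depth {depth} count {count} "
        "random_multi_pv 4 random_multi_pv_diff 100 "
        "random_move_count 8 random_move_max_ply 20 write_min_ply 5 "
        "eval_limit 10000 set_recommended_uci_options data_format bin "
        f"output_file_name {variant}.bin"
    )
    return [
        "uci",
        "setoption name Use NNUE value false",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        f"setoption name UCI_Variant value {variant}",
        "isready",
        generate,
        "quit",
    ]


def run_generator(binary, commands):
    """Feed the commands to the generator on stdin and wait for it to exit.

    Assumption: `binary` is the path to the generator executable you downloaded.
    """
    script = "\n".join(commands) + "\n"
    return subprocess.run([binary], input=script, text=True, check=True)
```

For example, run_generator("./training_data_generator", build_commands("xiangqi")) would replay the session shown above, where the binary path is a placeholder for wherever you unpacked the release.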

If the variant isn't built-in by default, you'll have to load variants.ini first:

setoption name VariantPath value location_of_variants.ini

If you want to use an existing NNUE network for training data generation, change Use NNUE to pure and set EvalFile, for example:

setoption name Use NNUE value pure
setoption name EvalFile value somevariant-1234567890ab.nnue

If the list at https://github.com/fairy-stockfish/fairy-stockfish.github.io/blob/main/nnue.markdown#current-best-nnue-networks already contains a current best NNUE network for your variant, it is recommended to download that network and use it for training data generation.

Settings

  • Since only bin format is supported, you need to specify data_format bin.
  • The count and depth of the training data are the main factors influencing the strength of the resulting NNUE net. Usually at least 100M positions should be used to get decent results. A higher depth generally should be better, but also takes much longer to generate. Depths 4-5 usually already give quite good results.
  • For variants with a low branching factor like losers/antichess, it is recommended to increase the random_multi_pv_diff in order to increase the variety of positions.
  • You can lower/increase the eval_diff_limit (default: 500) to be more/less restrictive in the definition of quiet positions, since this defines the filter threshold for the (absolute) difference between qsearch and static evaluation.
  • You can specify an opening book by adding the book argument, e.g., book startingpositions.epd. This file should contain one FEN/EPD per line, e.g., for Janggi:
rnba1abnr/4k4/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/4K4/RNBA1ABNR w - - 0 1
rbna1abnr/4k4/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/4K4/RNBA1ABNR w - - 0 1
...
  • If you want to train variants with particularly many pieces on the board (like >50) or in the pockets (>=32), you should compile the training data generator with largedata=yes. Also see the data format for technical details on the limits.
  • If you want to stop during generation, simply close the executable; the data generated so far is stored in the .bin file. To continue, run the same command as before. The new data will be appended to the existing file.
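
The opening book format described in the list above (one FEN/EPD per line) can be produced with a few lines of Python. This is a rough sketch under stated assumptions: the helper names rank_widths and write_book are hypothetical, and the width check is only a simple sanity test that does not understand pocket notation or other variant-specific FEN extensions.

```python
from pathlib import Path

# The two Janggi starting positions quoted in the list above.
JANGGI_FENS = [
    "rnba1abnr/4k4/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/4K4/RNBA1ABNR w - - 0 1",
    "rbna1abnr/4k4/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/4K4/RNBA1ABNR w - - 0 1",
]


def rank_widths(fen):
    """Count squares per rank: digit runs are empty squares, letters are pieces."""
    widths = []
    for rank in fen.split()[0].split("/"):
        width, digits = 0, ""
        for ch in rank:
            if ch.isdigit():
                digits += ch  # collect multi-digit empty-square counts
                continue
            if digits:
                width += int(digits)
                digits = ""
            if ch.isalpha():
                width += 1  # other symbols are ignored
        widths.append(width + (int(digits) if digits else 0))
    return widths


def write_book(path, fens):
    """Write one FEN per line and return the matching book argument."""
    for fen in fens:
        if len(set(rank_widths(fen))) != 1:
            raise ValueError(f"ranks differ in width: {fen}")
    Path(path).write_text("\n".join(fens) + "\n")
    return f"book {path}"
```

The returned string, e.g. book startingpositions.epd, is the argument to append to the generate_training_data command.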

Generating data from old HalfKP networks (deprecated)

If you want to use an old HalfKP NNUE network to start generating training data, you can use the old generator code at https://github.com/fairy-stockfish/variant-nnue. However, since the training data format has changed in the meantime, this only works with older versions of the trainer. The latest compatible version should be https://github.com/fairy-stockfish/variant-nnue-pytorch/tree/91c302941acb131fbabb441dd6ced992ec04dfcb. The syntax of the training data generation command is also slightly different. An example is:

gensfen depth 2 loop 100000000 random_multi_pv 4 random_multi_pv_diff 100 random_move_count 8 random_move_maxply 20 write_minply 5 write_maxply 200 eval_limit 10000 set_recommended_uci_options sfen_format bin output_file_name extinction.bin
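
The naming differences can be read off by comparing this command with the generate_training_data example earlier on this page. The Python sketch below only encodes the pairs visible in those two examples; the helper to_old_syntax is hypothetical, and options such as write_maxply that appear in only one of the examples are passed through unchanged rather than guessed at.

```python
# Names that differ between the two example commands on this page.
# Only pairs visible in those examples are listed; anything else passes through.
NEW_TO_OLD = {
    "generate_training_data": "gensfen",
    "count": "loop",
    "random_move_max_ply": "random_move_maxply",
    "write_min_ply": "write_minply",
    "data_format": "sfen_format",
}


def to_old_syntax(command):
    """Rename the tokens of a new-style command to their old-style spelling."""
    return " ".join(NEW_TO_OLD.get(tok, tok) for tok in command.split())
```

Because the replacement is done token by token, it assumes that no option value (such as a file name) happens to equal one of the option names.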

Training data generation using YaneuraOu (for Shogi)

To generate data compatible with this trainer, you need to use the customized YaneuraOu training data generator from https://github.com/ianfab/YaneuraOu/tree/fairy_bin. Make sure to compile the data generator with make gensfen, which requires OpenBLAS, and download an nn.bin. Its syntax differs slightly from that of the Fairy-Stockfish data generator; see the example below.

Example

usi
setoption name Threads value 8
setoption name USI_Hash value 2048
isready
gensfen loop 20000000 depth 1 write_minply 6 random_multi_pv_diff 200 random_multi_pv 4 random_move_count 8 eval_limit 10000 output_file_name shogi.bin
quit