NNUE training
Code: https://github.com/fairy-stockfish/variant-nnue-pytorch
The training data generator prints the required code changes for the training code when you set a variant with setoption name UCI_Variant value yourvariantname. Just check out a new branch of the variant-nnue-pytorch repository with git and apply the changes for the given variant there. Usually you can simply rely on what the training data generator prints, so you won't need to work out the values manually; just copy and paste the printed code fragments into the corresponding places in the code. These are the code fragments that need to be replaced:
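For example, a minimal interactive session with the training data generator might look like the following (the binary name and path are assumptions and depend on how you built the generator; the commands are typed on its UCI console):

./stockfish
setoption name UCI_Variant value yourvariantname

The generator then prints the code fragments for the files listed below, ready to be pasted into the training code.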
- variant.h: The PIECE_COUNT is the maximum number of pieces on the board. The KING_SQUARES needs to be changed to 9 for Xiangqi/Janggi and to 1 for variants without kings. Remember to always recompile the training data loader after updating this file.
#define FILES 8
#define RANKS 8
#define PIECE_TYPES 6
#define PIECE_COUNT 32
#define POCKETS false
#define KING_SQUARES FILES * RANKS
#define DATA_SIZE 512
- variant.py: Similar updates are required here; in addition, initial guesses for the piece values need to be defined. This file defines the architecture of the input layer of the variant NNUE network that will be trained.
RANKS = 8
FILES = 8
SQUARES = RANKS * FILES
KING_SQUARES = RANKS * FILES
PIECE_TYPES = 6
PIECES = 2 * PIECE_TYPES
USE_POCKETS = False
POCKETS = 2 * FILES if USE_POCKETS else 0
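# Initial piece value guesses, keyed by piece type index (for chess: 1 = pawn ... 5 = queen; the king gets no entry)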
PIECE_VALUES = {
1 : 126,
2 : 781,
3 : 825,
4 : 1276,
5 : 2538,
}
If you (optionally) want to continue training from an existing network, you need to first serialize it:
python serialize.py --features='HalfKAv2' somevariantnet.nnue startingpointfortraining.pt
Then, when running the training, you need to specify the serialized network as input to resume from:
python train.py --resume-from-model startingpointfortraining.pt --threads 1 --num-workers 1 --gpus 1 --max_epochs 10 training_data.bin validation_data.bin
The training command works the same as for the official trainer, e.g.,
python train.py --threads 1 --num-workers 1 --gpus 1 --max_epochs 10 training_data.bin validation_data.bin
- --max_epochs: the number of epochs for training. One epoch is 20M positions, so choose the number of epochs according to the amount of training data. E.g., for 200M positions in the training_data.bin file, --max_epochs should be 10 (or slightly above); see the sketch after this list.
- The validation_data.bin is optional. If you don't have it, simply replace it with training_data.bin.
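The epoch arithmetic from the list above, as a minimal Python sketch (the 200M figure is just the example given there; substitute the size of your own data set):

positions_in_training_data = 200_000_000  # total positions in training_data.bin
positions_per_epoch = 20_000_000          # one epoch is 20M positions
max_epochs = -(-positions_in_training_data // positions_per_epoch)  # ceiling division
print(max_epochs)  # 10 for the 200M example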
After training, serialize the resulting checkpoint into a .nnue file:
python serialize.py logs/default/version_0/checkpoints/last.ckpt yourvariant.nnue