NNUE training
- A CUDA capable GPU: https://developer.nvidia.com/cuda-gpus
- CUDA: https://developer.nvidia.com/cuda-downloads
- Training code: https://github.com/fairy-stockfish/variant-nnue-pytorch
- Python
- Cmake and C++ compiler
- Training data (.bin) from the previous step
The training data generator writes the required settings files for the training code when you specify the variant and the target directory with `trainer_config yourvariant /your/target/directory`. Check out a new branch in the variant-nnue-pytorch repository with git and copy the new files for your variant there. Usually you can rely on what the training data generator writes, so you likely won't need to change the values manually, but it is worth double-checking that they are as expected (e.g., board size). These are the files that need to be replaced:
- variant.h: `PIECE_COUNT` is the maximum number of pieces on the board. `KING_SQUARES` needs to be changed to 9 for Xiangqi/Janggi and to 1 for variants without kings. Remember to always recompile the training data loader after updating this file.
#define FILES 8
#define RANKS 8
#define PIECE_TYPES 6
#define PIECE_COUNT 32
#define POCKETS false
#define KING_SQUARES FILES * RANKS
#define DATA_SIZE 512
- variant.py: Similar updates are required here, and in addition to that the initial guesses for piece values need to be defined. This file defines the architecture of the input layer for the variant NNUE network that will be trained.
RANKS = 8
FILES = 8
SQUARES = RANKS * FILES
KING_SQUARES = RANKS * FILES
PIECE_TYPES = 6
PIECES = 2 * PIECE_TYPES
USE_POCKETS = False
POCKETS = 2 * FILES if USE_POCKETS else 0
PIECE_VALUES = {
    1 : 126,
    2 : 781,
    3 : 825,
    4 : 1276,
    5 : 2538,
}
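The `KING_SQUARES` rule described for variant.h (and mirrored in variant.py) can be sketched as a small check to run against the generated files. This is only an illustration: the function name `expected_king_squares` and its `has_king` / `palace` parameters are hypothetical and are not part of variant-nnue-pytorch.

```python
def expected_king_squares(files: int, ranks: int,
                          has_king: bool = True,
                          palace: bool = False) -> int:
    """Return the KING_SQUARES value suggested by this guide.

    Hypothetical helper, not part of the training code:
    - variants without kings use 1,
    - Xiangqi/Janggi (king confined to a 3x3 palace) use 9,
    - otherwise the king may stand on any square of the board.
    """
    if not has_king:
        return 1
    if palace:
        return 9
    return files * ranks


if __name__ == "__main__":
    # Values for the 8x8 example shown in variant.h / variant.py above.
    print(expected_king_squares(8, 8))
```

Comparing this value with the `KING_SQUARES` line in both generated files is a quick way to catch a mismatch before recompiling the data loader.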
If you (optionally) want to continue training from an existing network, you need to first serialize it:
python serialize.py --features='HalfKAv2' somevariantnet.nnue startingpointfortraining.pt
Then, when running the training, you need to specify the serialized network as input to resume from:
python train.py --resume-from-model startingpointfortraining.pt --threads 1 --num-workers 1 --gpus 1 --max_epochs 10 training_data.bin validation_data.bin
Depending on whether you want to continue from an existing network or train from scratch, use the training command with or without `--resume-from-model`. Without it, training starts from scratch:
python train.py --threads 1 --num-workers 1 --gpus 1 --max_epochs 10 training_data.bin validation_data.bin
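The two variants of the training command differ only in the `--resume-from-model` argument, so they can be assembled from one place. The sketch below is a hypothetical wrapper (the function `build_train_command` is not part of the repository); it only uses the flags shown in the commands above and falls back to the training file when no validation file is given.

```python
from typing import List, Optional


def build_train_command(train_data: str,
                        val_data: Optional[str] = None,
                        resume_from: Optional[str] = None,
                        max_epochs: int = 10) -> List[str]:
    """Assemble the train.py argument list used in this guide.

    Hypothetical helper: flag names and values are copied from the
    commands above; nothing else about train.py is assumed.
    """
    cmd = ["python", "train.py"]
    if resume_from is not None:
        # Only present when continuing from a serialized .pt network.
        cmd += ["--resume-from-model", resume_from]
    cmd += ["--threads", "1", "--num-workers", "1", "--gpus", "1",
            "--max_epochs", str(max_epochs)]
    # Without a validation file, reuse the training data.
    cmd += [train_data, val_data if val_data is not None else train_data]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_train_command("training_data.bin",
                                       "validation_data.bin")))
```

The resulting list can be passed to `subprocess.run` from inside the variant-nnue-pytorch checkout.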
- `--max_epochs`: number of epochs for training. One epoch is 20M positions, so choose the number of epochs according to the amount of training data. E.g., for 200M positions in the `training_data.bin` file, `--max_epochs` should be 10 (or slightly above).
- `validation_data.bin` is optional. If you don't have it, simply replace it with `training_data.bin`.
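The epoch rule of thumb is plain arithmetic: divide the number of training positions by the 20M positions per epoch and round up. A minimal sketch, assuming you already know how many positions your data file contains (the helper name `epochs_for` is hypothetical):

```python
POSITIONS_PER_EPOCH = 20_000_000  # epoch size stated in this guide


def epochs_for(num_positions: int,
               positions_per_epoch: int = POSITIONS_PER_EPOCH) -> int:
    """Smallest number of epochs that covers all positions once."""
    if num_positions <= 0 or positions_per_epoch <= 0:
        raise ValueError("position counts must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-num_positions // positions_per_epoch)


if __name__ == "__main__":
    print(epochs_for(200_000_000))  # the 200M example from the text
```

Rounding up matches the "or slightly above" advice: a data set that is not an exact multiple of 20M positions gets one extra epoch.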
To make the trained model usable by the engine, convert it to NNUE format, e.g.:
python serialize.py logs/default/version_0/checkpoints/last.ckpt yourvariant.nnue
Make sure to select the correct checkpoint file from the run and epoch you want to convert. Now you should be able to use the NNUE in the engine.
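To help pick the right checkpoint, the sketch below lists the checkpoint files under the log directory, oldest first. It assumes the `logs/default/version_N/checkpoints/` layout seen in the command above; the helper names are hypothetical and your run may use a different log directory.

```python
from pathlib import Path
from typing import List, Optional


def list_checkpoints(log_root: str = "logs/default") -> List[Path]:
    """Return all *.ckpt files below log_root, sorted by modification time."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    ckpts = root.glob("version_*/checkpoints/*.ckpt")
    return sorted(ckpts, key=lambda p: p.stat().st_mtime)


def newest_checkpoint(log_root: str = "logs/default") -> Optional[Path]:
    """Return the most recently written checkpoint, or None if there is none."""
    ckpts = list_checkpoints(log_root)
    return ckpts[-1] if ckpts else None


if __name__ == "__main__":
    for ckpt in list_checkpoints():
        print(ckpt)
    latest = newest_checkpoint()
    if latest is not None:
        # Same conversion command as above, with the path filled in.
        print(f"python serialize.py {latest} yourvariant.nnue")
```

The most recent file is not necessarily the one you want, so treat the printed command as a starting point and substitute the run and epoch you intend to convert.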