This project implements a Transformer-based model for generating embeddings from MIDI files, focusing on learning meaningful representations of musical pieces.
The `main.py` script serves as the primary entry point for training and evaluating a MIDI embedding model. When a user runs this script, the following happens:
- **Configuration Setup**
  - A configuration dictionary is created with hyperparameters for the model and training process
  - Parameters include sequence length, embedding dimensions, attention heads, model layers, batch size, epochs, learning rate, and dropout rate
- **Dataset Preparation**
  - Loads three datasets: training, validation, and test
  - Uses `MIDIDatasetPresaved` and `MIDIDatasetDynamic` for efficient data handling
  - Tokenizes MIDI files using a pre-trained tokenizer if available; otherwise trains a new one
- **Model Initialization**
  - Creates a `MIDITransformerEncoder` with the specified configuration
- **Model Training**
  - Trains the model using the training dataset
  - Validates performance on the validation dataset
  - Saves the best-performing model checkpoint
- **Model Evaluation**
  - Evaluates the trained model on the test dataset
  - Prints out the test loss and perplexity metrics (see the sketch after this list for how perplexity relates to the loss)
- **Embedding Visualization**
  - Generates and saves an interactive HTML visualization of embeddings for all songs in the MAESTRO-sustain-v2 dataset using t-SNE
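Perplexity is conventionally the exponential of the average per-token cross-entropy loss; assuming `train.py` follows that convention, the relationship looks like the minimal sketch below (variable and function names are illustrative, not taken from the project):

```python
import math

def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Perplexity is exp() of the average per-token cross-entropy loss (in nats)."""
    return math.exp(mean_cross_entropy)

# Example: a test loss of 2.0 nats/token corresponds to a perplexity of about 7.39.
print(perplexity_from_loss(2.0))
```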
To install the dependencies and run the full pipeline:

```shell
pip install -r requirements.txt
python main.py
```
- `transformer.py`: Defines the MIDI Transformer Encoder architecture (an illustrative sketch of this kind of encoder follows this list)
- `dataset.py`: Handles MIDI dataset loading and preprocessing
- `train.py`: Contains training and evaluation functions
- `visualize.py`: Provides embedding visualization functions
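For intuition about how the listed hyperparameters map onto a Transformer encoder, here is a minimal PyTorch sketch of an encoder along these lines. It is an illustration only, not the actual `MIDITransformerEncoder` from `transformer.py`; all class, argument, and value choices below are assumptions.

```python
import torch
import torch.nn as nn

class TinyMIDIEncoder(nn.Module):
    """Illustrative stand-in for the project's MIDITransformerEncoder (not the actual code)."""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_seq_len, dropout):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, embed_dim)
        self.position_embedding = nn.Embedding(max_seq_len, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dropout=dropout, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded MIDI tokens
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_embedding(token_ids) + self.position_embedding(positions)
        x = self.encoder(x)           # (batch, seq_len, embed_dim)
        return x.mean(dim=1)          # mean-pool to one embedding per sequence

model = TinyMIDIEncoder(vocab_size=1000, embed_dim=128, num_heads=4,
                        num_layers=2, max_seq_len=256, dropout=0.1)
dummy_tokens = torch.randint(0, 1000, (8, 256))
print(model(dummy_tokens).shape)      # torch.Size([8, 128])
```

Mean-pooling the encoder outputs is one common way to reduce a token sequence to a single song-level embedding; the project's own pooling strategy may differ.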
Users can modify the configuration dictionary in `main.py` to experiment with different hyperparameters (an illustrative example follows this list), such as:
- Embedding dimensions
- Number of attention heads
- Number of model layers
- Learning rate
- Batch size
- Dropout rate
- Sequence length
- Number of epochs
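As a concrete illustration, such a dictionary might look like the following; the key names and values are assumptions, not the exact configuration shipped in `main.py`:

```python
# Illustrative hyperparameter dictionary; key names and values are assumptions,
# not the exact configuration defined in main.py.
config = {
    "seq_len": 512,         # sequence length (tokens per training example)
    "embed_dim": 256,       # embedding dimension
    "num_heads": 8,         # number of attention heads
    "num_layers": 6,        # number of Transformer encoder layers
    "batch_size": 32,
    "epochs": 20,
    "learning_rate": 3e-4,
    "dropout": 0.1,
}
```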
The script generates an interactive HTML visualization of song embeddings, allowing users to explore how different musical pieces are represented in the embedding space.
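As a rough sketch of what this step involves, the snippet below projects a set of song embeddings to 2-D with t-SNE and writes an interactive HTML scatter plot. It uses scikit-learn and Plotly as stand-ins and random data in place of real embeddings; it is not the project's `visualize.py` implementation.

```python
# Illustrative stand-in for visualize.py, using scikit-learn and Plotly.
import numpy as np
import plotly.express as px
from sklearn.manifold import TSNE

embeddings = np.random.rand(100, 256)          # placeholder: one vector per song
titles = [f"song_{i}" for i in range(100)]     # placeholder song names

points = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
fig = px.scatter(
    x=points[:, 0], y=points[:, 1], hover_name=titles,
    title="t-SNE projection of MIDI song embeddings",
)
fig.write_html("embeddings.html")              # interactive HTML output
```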