SCHmUBERT

An implementation of the absorbing-state discrete diffusion model from https://github.com/samb-t/unleashing-transformers, applied to symbolic music.
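As a rough intuition for how generation works: the forward process progressively replaces tokens with a dedicated mask (absorbing) token, and a transformer is trained to reverse this by predicting the original tokens. Below is a minimal sampling sketch; all names are illustrative, not this repository's actual API, and only the mask id 90 is taken from the GUI legend further down.

import numpy as np

MASK = 90        # mask token id (matches the GUI legend below)
SEQ_LEN = 256    # illustrative sequence length
VOCAB = 91       # illustrative vocabulary size

def sample(denoiser, steps=100, rng=np.random.default_rng(0)):
    """Unconditional sampling: start fully masked, reveal tokens step by step."""
    x = np.full(SEQ_LEN, MASK, dtype=np.int64)
    for t in range(steps, 0, -1):
        logits = denoiser(x)                          # assumed shape (SEQ_LEN, VOCAB)
        logits[:, MASK] = -np.inf                     # never sample the mask token itself
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        cand = np.array([rng.choice(VOCAB, p=p) for p in probs])
        # reveal each still-masked position with probability 1/t, so the
        # sequence is fully unmasked by the time t reaches 1
        reveal = (x == MASK) & (rng.random(SEQ_LEN) < 1.0 / t)
        x[reveal] = cand[reveal]
    return x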

Samples

Samples in MIDI format can be found in the samples folder. You can also explore them in your browser (open the link in a new tab if the page is not found).

Installation

I run my experiments in Python 3.10, with all dependencies managed by Conda.

conda env create -f env.yml

Note that for all experiments, a soundfont file called 'soundfont.sf2' (not included) must be located in the root directory of the project.

Prepare Dataset

I use the Lakh MIDI Dataset to train the models. For loading, preprocessing, and extracting melodies and trios from the MIDI files, I adapted the pipelines Magenta implemented for their MusicVAE. To prepare the dataset, run:

python prepare_data.py --root_dir=/path/to/lmd_full --target data/lakh_trio.npy --mode trio --bars 64
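After preparation, a quick sanity check with numpy is possible; the exact array layout depends on --mode and --bars, so the shape comment below is an assumption:

import numpy as np
data = np.load("data/lakh_trio.npy")
print(data.shape, data.dtype)  # e.g. (num_sequences, sequence_length, num_tracks)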

Train

I use visdom to log the training progress and periodically show samples.
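The visdom server is started with its bundled module (by default it serves its dashboard on http://localhost:8097):

python -m visdom.server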

With visdom running, train the model with, for example:

python train.py --dataset data/lakh_trio.npy --bars 64 --batch_size 64 --tracks trio --model conv_transformer

So far, I have obtained the best results with the conv_transformer model using a single 1D convolutional layer with a width of 4. Pay attention to the steps_per_eval parameter, which defaults to 10000. A single evaluation step is more computationally expensive than 10000 training steps, so you may want to increase this value if you do not need that many evaluations.
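For example, assuming the parameter is passed on the command line like the others, evaluation can be made less frequent with:

python train.py --dataset data/lakh_trio.npy --bars 64 --batch_size 64 --tracks trio --model conv_transformer --steps_per_eval 50000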

Evaluate

To evaluate the framewise self-similarity metric on the samples generated by a model, run:

python evaluate.py --mode unconditional|infilling|self
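The framewise self-similarity metric fits Gaussians to the note statistics of overlapping frames and compares adjacent frames via the overlapping area (OA) of their distributions, in the spirit of Mittal et al. (2021). A minimal numerical sketch over pitches only follows; the frame and hop sizes are assumptions, and evaluate.py may differ in detail:

import numpy as np
from scipy.stats import norm

def overlapping_area(mu1, s1, mu2, s2, n=2048):
    # numerical overlap of two Gaussian pdfs: integrate min(pdf1, pdf2)
    lo = min(mu1 - 4 * s1, mu2 - 4 * s2)
    hi = max(mu1 + 4 * s1, mu2 + 4 * s2)
    grid = np.linspace(lo, hi, n)
    p = np.minimum(norm.pdf(grid, mu1, s1), norm.pdf(grid, mu2, s2))
    return p.sum() * (grid[1] - grid[0])

def framewise_self_similarity(pitches, frame=64, hop=32):
    # Gaussian statistics of each sliding frame, then OA of adjacent frames
    frames = [pitches[i:i + frame] for i in range(0, len(pitches) - frame + 1, hop)]
    stats = [(np.mean(f), np.std(f) + 1e-6) for f in frames]
    return np.array([overlapping_area(*stats[i], *stats[i + 1])
                     for i in range(len(stats) - 1)])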

Sample

For sampling, I hacked together a rudimentary GUI using nicegui.

python sample.py --load_step 140000 --bars 64 --tracks trio --model conv_transformer

The GUI supports:

  • visualizing samples (melody = red, bass = blue, drums = black); the y position indicates pitch height; special pitch values: 0 = pause, 1 = note off, 90 = mask (see the decoding sketch after this list)
  • adjusting the number of sampling steps (slider in the Upload expansion area)
  • diffusing from left to right ('=>') or vice versa ('<=')
  • copying from left to right ('>') or vice versa; only masked positions are overwritten
  • sampling unconditionally (select 'A' in the central toggle to diffuse All samples, a batch of 8, instead of the Selected one)
  • uploading MIDI or MusicXML pieces for conditioning
  • masking whole tracks (LM = Left Melody, RD = Right Drums, ...)
  • masking an area selected with the mouse (mask button at the bottom)
  • playback, with a cursor indicating the exact position in the left and right visualizations
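Based on the legend above, a token stream can be interpreted roughly as follows; this is a sketch, and the mapping of the remaining values onto pitches is an assumption:

def describe_token(token: int) -> str:
    # special values from the GUI legend; everything else is drawn at y = token
    if token == 0:
        return "pause"
    if token == 1:
        return "note off"
    if token == 90:
        return "mask"
    return f"pitch (plotted at height {token})"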

Model Weights

Model weights for the Conv_Transformer EMA model trained on the Lakh MIDI Dataset can be obtained here. Extract the 'logs' folder to the project root and set load_step, model, ... accordingly (250000, conv_transformer, ...).
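With the extracted logs folder in place, sampling with the pretrained weights would then be launched as:

python sample.py --load_step 250000 --bars 64 --tracks trio --model conv_transformer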
