rhythmform_synthetic_data_v1.0.0

This dataset comprises synthetic drum scores created using RhythmForm, which uses a Markov-chain model to decide the "most likely next bar". The data is intended for training a transformer model to convert images or (scanned) PDFs of drum scores into editable, digitised drum scores. The transformer would output Symbolic Music Text (SMT), which is then converted to MusicXML programmatically.
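As an illustration of the "most likely next bar" idea, the following is a minimal sketch of a first-order Markov chain over bars. It is not RhythmForm's implementation: the function names, the bar strings in toy_scores, and the choice of a first-order model are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def train_bar_chain(scores):
    """Count bar-to-bar transitions. Each score is a list of bar strings."""
    transitions = defaultdict(Counter)
    for bars in scores:
        # Pair every bar with the bar that follows it in the same score.
        for current_bar, next_bar in zip(bars, bars[1:]):
            transitions[current_bar][next_bar] += 1
    return transitions


def most_likely_next_bar(transitions, current_bar):
    """Return the most frequent successor, or None if none was observed."""
    followers = transitions.get(current_bar)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def sample_next_bar(transitions, current_bar, rng):
    """Draw a successor at random, weighted by observed transition counts."""
    followers = transitions.get(current_bar)
    if not followers:
        return None
    bars, counts = zip(*followers.items())
    return rng.choices(bars, weights=counts, k=1)[0]


# Hypothetical toy corpus; the bar labels are placeholders, not real SMT.
toy_scores = [
    ["kick snare", "kick snare", "fill"],
    ["kick snare", "fill", "kick snare"],
    ["kick snare", "kick snare", "kick snare"],
]
chain = train_bar_chain(toy_scores)
likely = most_likely_next_bar(chain, "kick snare")
sampled = sample_next_bar(chain, "kick snare", random.Random(0))
```

Taking the single most frequent successor always yields the same continuation, whereas weighted sampling produces varied scores; which of the two RhythmForm uses is not stated here.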

150,000 scores of varying length are included as PDF (1 per score), MusicXML (1 or 2* per score), PNG (1 per page per score) and SMT (1 per page per score) files.
Log files corresponding to 23 data synthesis runs are included.
A dataset.json file is included for training a transformer model on the dataset, as described in https://github.com/DrumScoreAI/RhythmForm
Vocabulary files are included: all_tokens_corpus.smt, full_tokenizer_vocab.json, markov_training_corpus.smt and merged_tokenizer_vocab.json.
A pickled Markov-chain model is included: markov_model.pkl
Total number of files: 700,859
* Depending on a use_repeat_bars condition, either 1 MusicXML file (no repeat bars) or 2 MusicXML files (1 with repeat bars and an equivalent without) are generated per score.
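To check a downloaded copy against the file counts above, a tally by file extension may help. This is a sketch only: the function name is hypothetical, and the directory layout and the exact extension used for MusicXML files are not specified in this record, so the suffixes you see may differ.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Recursively count regular files under root, grouped by lowercase suffix."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without a suffix are grouped under a placeholder key.
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling count_files_by_extension on the extracted dataset directory and summing the values should give the stated total if the copy is complete; the ".pdf" count should match the number of scores, and the ".png" and ".smt" counts should be close to each other, since both are produced once per page (the .smt vocabulary files add a small offset).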

Additional Info

Field	Value
Contact point	d.mckay@epcc.ed.ac.uk
Dataset privacy	Public
Landing Page	https://github.com/DrumScoreAI/RhythmForm