diff --git a/README.md b/README.md
index 770b837c3..43ef462cc 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@ component-id: folk_ngram_analysis
 name: FONN - FOlk N-gram aNalysis
 description: Work-in-progress on pattern extraction and melodic similarity tools, with an associated test corpus of monophonic Irish folk tunes.
 type: Repository
-release-date: 19/05/2022
-release-number: v0.5-dev
+release-date: 15/06/2022
+release-number: v0.6-dev
 work-package:
 - WP3
 licence: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
diff --git a/corpus/README.md b/corpus/README.md
index 411c8f345..02f1bfa0f 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -1,10 +1,10 @@
 ---
 component-id: cre_corpus
 name: Ceol Rince na hÉireann MIDI corpus
-brief-description: A corpus of 1,224 monophonic instrumental Irish traditional dance tunes.
+brief-description: A corpus of 1,195 monophonic instrumental Irish traditional dance tunes.
 type: Corpus
-release-date: 8/12/2021
-release-number: v0.4-dev
+release-date: 15/06/2022
+release-number: v0.6-dev
 work-package:
 - WP3
 licence: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
@@ -19,19 +19,19 @@ credits:
 ---
-## About dataset
+## About the dataset
 **Corpus title:** _Ceol Rince na hÉireann_
-**Source:** [Black, B 2020, _The Bill Black Irish tune archive homepage_, viewed 5 January 2021.][1]
+**Source:** [Black, B 2020, _The Bill Black Irish tune archive homepage_, viewed 5 January 2021.](http://www.capeirish.com/webabc)
-**Contents:** 1,224 traditional Irish dance tunes, each of which is represented as a monophonic MIDI file.
+**Contents:** 1,195 traditional Irish dance tunes, represented in [MIDI](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/MIDI) and [ABC Notation](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/abc).
-Between 1963 and 1999, Irish State publishing companies Oifig an tSolatáthair and An Gúm issued five printed volumes of tunes from the collections of Breadán Breathnach (1912-1985) under the series title _Ceol Rince na hÉireann_ (Dance Music of Ireland, hereafter _CRÉ_). The five volumes of _CRÉ_ contain 1,208 traditional tunes, a subset of Breathnach's more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black, and made freely available online via his [personal website][1]. Addition of alternative tune versions and variation in numbering of unique melodies has resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research work, for example it makes up part of a larger aggregated corpus used in the [_Tunepal_][2] Music Information Retrieval app. We have created a new cleaned and annotated MIDI version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia's [FONN][3] music pattern analysis toolkit.
+Between 1963 and 1999, Irish State publishing companies Oifig an tSoláthair and An Gúm issued five printed volumes of tunes from the collections of Breandán Breathnach (1912-1985) under the series title _Ceol Rince na hÉireann_ (Dance Music of Ireland, hereafter _CRÉ_). The five volumes of _CRÉ_ contain 1,208 traditional tunes, a subset of Breathnach's more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black, and made freely available online via his [personal website](http://www.capeirish.com/webabc). Addition of alternative tune versions and variation in numbering of unique melodies has resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research work; for example, it makes up part of a larger aggregated corpus used in the [_Tunepal_](https://tunepal.org/index.html) Music Information Retrieval app. We have created a new cleaned and annotated version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia's [FONN](https://github.com/polifonia-project/folk_ngram_analysis) music pattern analysis toolkit.
-NOTE: Please see [corpus_stats.ipynb][11] for a Jupyter notebook exploring the corpus data.
+NOTE: Please see [corpus_demo.ipynb](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/corpus_demo.ipynb) for a Jupyter notebook exploring the corpus data.
-Deliverable 3.2 of the Polifonia project will describe the context and research in more detail. It will be published on [Cordis](https://cordis.europa.eu/project/id/101004746/it).
+Deliverable 3.3 of the Polifonia project will describe the context and research in more detail. It will be published on [Cordis](https://cordis.europa.eu/project/id/101004746/it).
 ## About corpus pre-processing methodology
@@ -40,8 +40,7 @@ Bill Black's ABC version of the _CRÉ_ collection has been manually edited and a
 * Removal of alternative tune versions, so that the ABC collection more accurately reflects the original print collection.
 * Removal of non-valid ABC notation characters.
 * Editing of repeat markers to ensure accurate MIDI output.
-* Conversion to MIDI via EasyABC software.
-* Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in the file [roots.csv][4], which is used to derive key-invariant secondary feature sequence data from the MIDI files.
+* Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in [roots.csv](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/roots.csv), which is used to derive key-invariant secondary feature sequence data from the MIDI files.
 ## Description of the data
@@ -49,48 +48,46 @@ Bill Black's ABC version of the _CRÉ_ collection has been manually edited and a
 ```
 corpus/
 -MIDI/
- -1,224 monophonic MIDI files (.mid)
+ -1,195 monophonic MIDI (.mid) files, one representing each tune.
+ -abc/
+ -1 ABC Notation corpus file (.abc) containing scores for all 1,195 tunes.
 -roots.csv
 -README.md
 -LICENSE.md
 ```
-Each melody in the corpus is represented as a monophonic MIDI file, named per the melody title. There are 1,224 files in total, stored in the [./MIDI][4] directory.
+- The ```corpus``` directory contains roots.csv, this README.md, and a LICENSE.md file.
-The [corpus][6] root directory contains a [roots.csv][5] file, this readme, and a LICENSE.md file.
-Roots.csv holds two columns with one row per each MIDI file in the corpus:
-'title': MIDI file title
-'root': expert-assigned root note of each melody, represented as a [chromatic pitch class][7] (i.e.: An integer value from C=0 through B=11).
+- Roots.csv holds two columns, with one row per MIDI file in the corpus:
+ - 'title': MIDI file name (tune title)
+ - 'root': expert-assigned root note of each melody, represented as a [chromatic pitch class](https://en.wikipedia.org/wiki/Pitch_class) (i.e. an integer value from C=0 through B=11).
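
Review note on the roots.csv description in this hunk: the sketch below illustrates how a `title`/`root` table of this shape could be used to make a pitch sequence key-invariant. It is a minimal illustration, not FONN's actual implementation. It assumes the CSV has a header row with the column names `title` and `root` given in the README, and that key invariance means expressing each MIDI note number as a pitch class relative to the root. The function names, tune titles, and pitch values are hypothetical.

```python
import csv
import io

# Chromatic pitch class names, C=0 through B=11, as described in the README.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def load_roots(csv_file):
    """Map each tune title to its root pitch class (0-11).

    Assumes a header row with 'title' and 'root' columns.
    """
    reader = csv.DictReader(csv_file)
    return {row["title"]: int(row["root"]) for row in reader}


def to_key_invariant(midi_pitches, root):
    """Express MIDI note numbers as pitch classes relative to the root."""
    return [(pitch - root) % 12 for pitch in midi_pitches]


# Hypothetical sample data standing in for roots.csv.
sample = io.StringIO("title,root\nExampleTuneInD,2\nExampleTuneInG,7\n")
roots = load_roots(sample)

melody = [62, 66, 69, 74]  # MIDI note numbers: D4, F#4, A4, D5
relative = to_key_invariant(melody, roots["ExampleTuneInD"])
print(NOTE_NAMES[roots["ExampleTuneInD"]], relative)  # prints: D [0, 4, 7, 0]
```

With this representation, the same melodic figure transposed to another key (for example G major with root 7) yields the same relative sequence, which is the property the README attributes to the key-invariant secondary features.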

-To extract feature sequence data from the MIDI corpus, please download the corpus data and run [setup_corpus.main()][9] from folk_ngram_analysis component. Please see [folk_ngram_analysis readme][8] for further information.
+- To convert the corpus from ABC Notation to MIDI format, please download the corpus data and run the FONN [abc_ingest.py](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/abc_ingest.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
+- To extract feature sequence data from the MIDI corpus, please download the corpus data and run the FONN [setup_corpus.py](https://github.com/danDiamo/music_pattern_analysis/blob/master/setup_corpus.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
+
-## Online repository link
-https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus
+## Attribution
-## Authors
+If you use the code in this repository, please cite this software as follows:
+```
+@software{diamond_fonn_2022,
+ address = {Galway, Ireland},
+ title = {{FONN} - {FOlk} {N}-gram {aNalysis}},
+ shorttitle = {{FONN}},
+ url = {https://github.com/polifonia-project/folk_ngram_analysis},
+ publisher = {National University of Ireland, Galway},
+ author = {Diamond, Danny and Shahid, Abdul and McDermott, James},
+ year = {2022},
+}
+```
-* Danny Diamond
-* Dr. Abdul Shahid Khattak
-* Dr. James McDermott
-* Dr Mathieu d'Aquin
+## License
+
+This work is licensed under CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
-## License
-This project is licensed under the MIT License - see [LICENSE.md][10] file for details
-
-[1]: http://www.capeirish.com/webabc
-[2]: https://tunepal.org/index.html
-[3]: https://github.com/polifonia-project/folk_ngram_analysis
-[4]: https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/MIDI
-[5]: https://github.com/danDiamo/music_pattern_analysis/blob/master/corpus/roots.csv
-[6]: https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus
-[7]: https://en.wikipedia.org/wiki/Pitch_class
-[8]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md
-[9]: https://github.com/danDiamo/music_pattern_analysis/blob/master/setup_corpus/setup_corpus.py
-[10]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/license.md
-[11]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/corpus_stats.ipynb
diff --git a/root_note_detection/README.md b/root_note_detection/README.md
index df4667362..fe3c3eac3 100644
--- a/root_note_detection/README.md
+++ b/root_note_detection/README.md
@@ -9,7 +9,7 @@ type: Repository
 release-date: 20/05/2022
-release-number: v0.5-dev
+release-number: v0.6-dev
 work-package:
 - WP3