From bf0c025a150926171d861d7a620f69bfafc82673 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:39:03 -0500
Subject: [PATCH 1/8] Update README.md
---
corpus/README.md | 54 +++++++++++++++++-------------------------------
1 file changed, 19 insertions(+), 35 deletions(-)
diff --git a/corpus/README.md b/corpus/README.md
index 411c8f345..f84c163e2 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -23,15 +23,15 @@ credits:
**Corpus title:** _Ceol Rince na hÉireann_
-**Source:** [Black, B 2020, _The Bill Black Irish tune archive homepage_, viewed 5 January 2021.][1]
+**Source:** [Black, B 2020, _The Bill Black Irish tune archive homepage_, viewed 5 January 2021.](http://www.capeirish.com/webabc)
-**Contents:** 1,224 traditional Irish dance tunes, each of which is represented as a monophonic MIDI file.
+**Contents:** 1,195 traditional Irish dance tunes, represented in [MIDI](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/MIDI) and [ABC Notation](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/abc).
-Between 1963 and 1999, Irish State publishing companies Oifig an tSolatáthair and An Gúm issued five printed volumes of tunes from the collections of Breadán Breathnach (1912-1985) under the series title _Ceol Rince na hÉireann_ (Dance Music of Ireland, hereafter _CRÉ_). The five volumes of _CRÉ_ contain 1,208 traditional tunes, a subset of Breathnach's more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black, and made freely available online via his [personal website][1]. Addition of alternative tune versions and variation in numbering of unique melodies has resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research work, for example it makes up part of a larger aggregated corpus used in the [_Tunepal_][2] Music Information Retrieval app. We have created a new cleaned and annotated MIDI version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia's [FONN][3] music pattern analysis toolkit.
+Between 1963 and 1999, Irish State publishing companies Oifig an tSolatáthair and An Gúm issued five printed volumes of tunes from the collections of Breandán Breathnach (1912-1985) under the series title _Ceol Rince na hÉireann_ (Dance Music of Ireland, hereafter _CRÉ_). The five volumes of _CRÉ_ contain 1,208 traditional tunes, a subset of Breathnach's more extensive personal collection of 5,000+ melodies. The collection has been transcribed into ABC notation by American traditional music researcher Bill Black, and made freely available online via his [personal website](http://www.capeirish.com/webabc). Addition of alternative tune versions and variation in numbering of unique melodies has resulted in a total of 1,224 tunes in the Bill Black ABC corpus. This resource has been used in previous research work; for example, it makes up part of a larger aggregated corpus used in the [_Tunepal_](https://tunepal.org/index.html) Music Information Retrieval app. We have created a new cleaned and annotated version of the corpus, from which feature sequence data can be extracted and analysed via Polifonia's [FONN](https://github.com/polifonia-project/folk_ngram_analysis) music pattern analysis toolkit.
-NOTE: Please see [corpus_stats.ipynb][11] for a Jupyter notebook exploring the corpus data.
+NOTE: Please see [corpus_demo.ipynb](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/corpus_demo.ipynb) for a Jupyter notebook exploring the corpus data.
-Deliverable 3.2 of the Polifonia project will describe the context and research in more detail. It will be published on [Cordis](https://cordis.europa.eu/project/id/101004746/it).
+Deliverable 3.3 of the Polifonia project will describe the context and research in more detail. It will be published on [Cordis](https://cordis.europa.eu/project/id/101004746/it).
## About corpus pre-processing methodology
@@ -40,8 +40,7 @@ Bill Black's ABC version of the _CRÉ_ collection has been manually edited and a
* Removal of alternative tune versions, so that the ABC collection more accurately reflects the original print collection.
* Removal of non-valid ABC notation characters.
* Editing of repeat markers to ensure accurate MIDI output.
-* Conversion to MIDI via EasyABC software.
-* Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in the file [roots.csv][4], which is used to derive key-invariant secondary feature sequence data from the MIDI files.
+* Manual assignment of root note (as chromatic pitch class) for every piece of music in the corpus. This data is stored in [roots.csv](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/roots.csv), which is used to derive key-invariant secondary feature sequence data from the MIDI files.
## Description of the data
@@ -49,48 +48,33 @@ Bill Black's ABC version of the _CRÉ_ collection has been manually edited and a
```
corpus/
-MIDI/
- -1,224 monophonic MIDI files (.mid)
+ -1,195 monophonic MIDI (.mid) files, one representing each tune.
+ -abc/
+ -1 ABC Notation corpus file (.abc) containing scores for all 1,195 tunes.
-roots.csv
-README.md
-LICENSE.md
```
-Each melody in the corpus is represented as a monophonic MIDI file, named per the melody title. There are 1,224 files in total, stored in the [./MIDI][4] directory.
+- The [corpus](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus) root directory contains a [roots.csv](https://github.com/danDiamo/music_pattern_analysis/blob/master/corpus/roots.csv) file, this README.md, and a LICENSE.md file.
-The [corpus][6] root directory contains a [roots.csv][5] file, this readme, and a LICENSE.md file.
-Roots.csv holds two columns with one row per each MIDI file in the corpus:
-'title': MIDI file title
-'root': expert-assigned root note of each melody, represented as a [chromatic pitch class][7] (i.e.: An integer value from C=0 through B=11).
+- Roots.csv holds two columns with one row per each MIDI file in the corpus:
+ -'title': MIDI file title
+ -'root': expert-assigned root note of each melody, represented as a [chromatic pitch class](https://en.wikipedia.org/wiki/Pitch_class) (i.e.: An integer value from C=0 through B=11).
-To extract feature sequence data from the MIDI corpus, please download the corpus data and run [setup_corpus.main()][9] from folk_ngram_analysis component. Please see [folk_ngram_analysis readme][8] for further information.
+- To convert the corpus from ABC Notation to MIDI format, please download the corpus data and run the FONN [abc_ingest.py](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/abc_ingest.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
+- To extract feature sequence data from the MIDI corpus, please download the corpus data and run the FONN [setup_corpus.py](https://github.com/danDiamo/music_pattern_analysis/blob/master/setup_corpus.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
-## Online repository link
-https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus
-## Authors
+## License
+This project is licensed under the MIT License - see [LICENSE.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/license.md) file for details
+
-* Danny Diamond
-* Dr. Abdul Shahid Khattak
-* Dr. James McDermott
-* Dr Mathieu d'Aquin
+## Attribution
-## License
-This project is licensed under the MIT License - see [LICENSE.md][10] file for details
-
-[1]: http://www.capeirish.com/webabc
-[2]: https://tunepal.org/index.html
-[3]: https://github.com/polifonia-project/folk_ngram_analysis
-[4]: https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus/MIDI
-[5]: https://github.com/danDiamo/music_pattern_analysis/blob/master/corpus/roots.csv
-[6]: https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus
-[7]: https://en.wikipedia.org/wiki/Pitch_class
-[8]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md
-[9]: https://github.com/danDiamo/music_pattern_analysis/blob/master/setup_corpus/setup_corpus.py
-[10]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/license.md
-[11]: https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/corpus_stats.ipynb
From 8a95a733831c94c9b8bea413931252d5fcb0e5c2 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:41:08 -0500
Subject: [PATCH 2/8] Update README.md
---
corpus/README.md | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/corpus/README.md b/corpus/README.md
index f84c163e2..6764f50e2 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -19,7 +19,7 @@ credits:
---
-## About dataset
+## About the dataset
**Corpus title:** _Ceol Rince na hÉireann_
@@ -69,12 +69,25 @@ corpus/
- To convert the corpus from ABC Notation to MIDI format, please download the corpus data and run the FONN [abc_ingest.py](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/abc_ingest.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
- To extract feature sequence data from the MIDI corpus, please download the corpus data and run the FONN [setup_corpus.py](https://github.com/danDiamo/music_pattern_analysis/blob/master/setup_corpus.py) script. Please see [FONN README.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/README.md) for further information.
+
+## Attribution
+
+If you use the code in this repository, please cite this software as follows:
+```
+@software{diamond_fonn_2022,
+ address = {Galway, Ireland},
+ title = {{FONN} - {FOlk} {N}-gram {aNalysis}},
+ shorttitle = {{FONN}},
+ url = {https://github.com/polifonia-project/folk_ngram_analysis},
+ publisher = {National University of Ireland, Galway},
+ author = {Diamond, Danny and Shahid, Abdul and McDermott, James},
+ year = {2022},
+}
+```
## License
-This project is licensed under the MIT License - see [LICENSE.md](https://github.com/polifonia-project/folk_ngram_analysis/blob/master/corpus/license.md) file for details
-
-## Attribution
+This work is licensed under CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
From 271105a8dd8c8982f1949cfdc4383c04ebbdda99 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:42:26 -0500
Subject: [PATCH 3/8] Update README.md
---
corpus/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/corpus/README.md b/corpus/README.md
index 6764f50e2..89b3f28d7 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -57,7 +57,7 @@ corpus/
```
-- The [corpus](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus) root directory contains a [roots.csv](https://github.com/danDiamo/music_pattern_analysis/blob/master/corpus/roots.csv) file, this README.md, and a LICENSE.md file.
+- The [corpus](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus) root directory contains roots.csv, this README.md, and a LICENSE.md file.
- Roots.csv holds two columns with one row per each MIDI file in the corpus:
-'title': MIDI file title
From 2f883044b095e1b2cb6d27d85e358b0b88a67068 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:43:15 -0500
Subject: [PATCH 4/8] Update README.md
---
corpus/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/corpus/README.md b/corpus/README.md
index 89b3f28d7..7e8c98f5b 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -57,7 +57,7 @@ corpus/
```
-- The [corpus](https://github.com/polifonia-project/folk_ngram_analysis/tree/master/corpus) root directory contains roots.csv, this README.md, and a LICENSE.md file.
+- ```corpus``` directory contains roots.csv, this README.md, and a LICENSE.md file.
- Roots.csv holds two columns with one row per each MIDI file in the corpus:
-'title': MIDI file title
From 9ad68995b188de32b980caa2d2110621e099f6d5 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:43:57 -0500
Subject: [PATCH 5/8] Update README.md
---
corpus/README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/corpus/README.md b/corpus/README.md
index 7e8c98f5b..e86bca756 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -60,8 +60,8 @@ corpus/
- ```corpus``` directory contains roots.csv, this README.md, and a LICENSE.md file.
- Roots.csv holds two columns with one row per each MIDI file in the corpus:
- -'title': MIDI file title
- -'root': expert-assigned root note of each melody, represented as a [chromatic pitch class](https://en.wikipedia.org/wiki/Pitch_class) (i.e.: An integer value from C=0 through B=11).
+ - 'title': MIDI file name (tune title)
+ - 'root': expert-assigned root note of each melody, represented as a [chromatic pitch class](https://en.wikipedia.org/wiki/Pitch_class) (i.e.: An integer value from C=0 through B=11).
From 29f8326db4ab169a4543951b98f71a7394a7c25d Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:51:40 -0500
Subject: [PATCH 6/8] Update README.md
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 770b837c3..43ef462cc 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@ component-id: folk_ngram_analysis
name: FONN - FOlk N-gram aNalysis
description: Work-in-progress on pattern extraction and melodic similarity tools, with an associated test corpus of monophonic Irish folk tunes.
type: Repository
-release-date: 19/05/2022
-release-number: v0.5-dev
+release-date: 15/06/2022
+release-number: v0.6-dev
work-package:
- WP3
licence: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
From e9a3295314c5758ee5adf01832fb6886c4441a80 Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:52:01 -0500
Subject: [PATCH 7/8] Update README.md
---
root_note_detection/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/root_note_detection/README.md b/root_note_detection/README.md
index df4667362..fe3c3eac3 100644
--- a/root_note_detection/README.md
+++ b/root_note_detection/README.md
@@ -9,7 +9,7 @@ type: Repository
release-date: 20/05/2022
-release-number: v0.5-dev
+release-number: v0.6-dev
work-package:
- WP3
From ae619330898190aef6df80203e117b225dd7754e Mon Sep 17 00:00:00 2001
From: Danny Diamond <78231894+danDiamo@users.noreply.github.com>
Date: Wed, 15 Jun 2022 21:52:51 -0500
Subject: [PATCH 8/8] Update README.md
---
corpus/README.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/corpus/README.md b/corpus/README.md
index e86bca756..02f1bfa0f 100644
--- a/corpus/README.md
+++ b/corpus/README.md
@@ -1,10 +1,10 @@
---
component-id: cre_corpus
name: Ceol Rince na hÉireann MIDI corpus
-brief-description: A corpus of 1,224 monophonic instrumental Irish traditional dance tunes.
+brief-description: A corpus of 1,195 monophonic instrumental Irish traditional dance tunes.
type: Corpus
-release-date: 8/12/2021
-release-number: v0.4-dev
+release-date: 15/06/2022
+release-number: v0.6-dev
work-package:
- WP3
licence: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
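
Note on the `roots.csv` format described in the corpus README changes in this series: the file holds a `title` column and a `root` column, where the root is a chromatic pitch class (an integer from C=0 through B=11) used to derive key-invariant feature data. The following is a minimal sketch of how such a file might be consumed. It is not FONN's actual implementation; the function names `load_roots` and `relative_pitch_classes`, the sample tune titles, and the sample pitches are all hypothetical and chosen only for illustration.

```python
import csv
import io


def load_roots(csv_text):
    """Parse roots.csv-style text into a {title: root} dictionary.

    Assumes the two columns described in the README: 'title' and 'root'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["title"]: int(row["root"]) for row in reader}


def relative_pitch_classes(midi_pitches, root):
    """Map MIDI note numbers to pitch classes relative to the root (0 = root).

    This is one simple way to obtain a key-invariant sequence; it is an
    assumption made for this sketch, not necessarily what FONN computes.
    """
    return [(pitch - root) % 12 for pitch in midi_pitches]


# Hypothetical sample data in the roots.csv layout (D = 2, G = 7).
sample_csv = "title,root\nExample Tune in D,2\nExample Tune in G,7\n"
roots = load_roots(sample_csv)

# The same melodic shape written in two different keys (hypothetical pitches).
phrase_in_d = [62, 66, 69, 74]  # D4, F#4, A4, D5
phrase_in_g = [67, 71, 74, 79]  # G4, B4, D5, G5

relative_d = relative_pitch_classes(phrase_in_d, roots["Example Tune in D"])
relative_g = relative_pitch_classes(phrase_in_g, roots["Example Tune in G"])

print(relative_d)
print(relative_g)
```

Both phrases reduce to the same root-relative sequence, which is the sense in which a manually assigned root makes derived features independent of the key a tune is notated in.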