Phonological Edit Distance

This uses the mighty power of Phonological Corpus Tools to calculate the average phonological edit distance of all items in a set. This can be used as a proxy measure of phonological dissimilarity.

Dependencies:

To run this you must have Python 3 and Phonological Corpus Tools (PCT).

Note that you must use PCT version 1.1.0 and supply your own feature matrix. Later versions of PCT crash with this code. Feel free to message me if you encounter difficulties.


About:

The (Levenshtein) edit distance is the minimum number of operations (i.e. add, delete, replace) needed to change one string into another. For example, 'bat'->'pat' has an edit distance of 1. But some changes may be more phonologically different than others. For example, 'bat'->'rat' differs in more phonological features than 'bat'->'pat'. The phonological edit distance takes the Levenshtein edit distance and weights each replacement by the difference in phonological features between the two segments. More info can be found here
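To make the weighting idea concrete, here is a minimal sketch in plain Python. It is not how PCT implements it: the `FEATURES` table is a made-up toy feature matrix, and the cost scheme (a replacement costs the fraction of features that differ, an add or delete costs 1) is an assumption chosen for illustration.

```python
# Toy feature matrix: HYPOTHETICAL values for illustration only.
# Feature order: (voice, sonorant, continuant, labial, syllabic)
FEATURES = {
    "b": ("+", "-", "-", "+", "-"),
    "p": ("-", "-", "-", "+", "-"),
    "r": ("+", "+", "+", "-", "-"),
    "a": ("+", "+", "+", "-", "+"),
    "t": ("-", "-", "-", "-", "-"),
}


def substitution_cost(seg1, seg2):
    """Fraction of features on which two segments differ (0.0 to 1.0)."""
    f1, f2 = FEATURES[seg1], FEATURES[seg2]
    differing = sum(x != y for x, y in zip(f1, f2))
    return differing / len(f1)


def phono_edit_distance(word1, word2, indel_cost=1.0):
    """Levenshtein distance with feature-weighted substitutions.

    Assumption: insertions and deletions cost `indel_cost`; this is a
    choice made for the sketch, not taken from PCT.
    """
    # prev[j] holds the distance between word1[:i-1] and word2[:j].
    prev = [j * indel_cost for j in range(len(word2) + 1)]
    for i, s1 in enumerate(word1, start=1):
        curr = [i * indel_cost]
        for j, s2 in enumerate(word2, start=1):
            curr.append(min(
                prev[j] + indel_cost,                   # delete s1
                curr[j - 1] + indel_cost,               # insert s2
                prev[j - 1] + substitution_cost(s1, s2)  # replace s1 with s2
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(phono_edit_distance("bat", "pat"))
    print(phono_edit_distance("bat", "rat"))
```

With this toy matrix, 'bat'->'rat' comes out larger than 'bat'->'pat', even though both have a plain Levenshtein distance of 1.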

How to use:

Right now there's no pretty input or output methods because I'm lazy, but if you feel like adding them in let me know. With that out of the way...
phonoEditDistanceWITHINsubjects.py compares a set of words to itself, and phonoEditDistanceBETWEENsubjects.py compares a set of words to another set of words.
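The difference between the two comparisons can be sketched roughly as follows. This is not the code from the scripts: the function names `mean_within` and `mean_between` are hypothetical, plain Levenshtein distance stands in for the phonological edit distance from PCT, and whether the real scripts skip self-pairs is an assumption.

```python
from itertools import combinations, product
from statistics import mean


def levenshtein(a, b):
    """Plain (unweighted) edit distance, used here as a stand-in metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def mean_within(words, dist=levenshtein):
    """Average distance over every unordered pair of distinct items in one set."""
    return mean(dist(a, b) for a, b in combinations(words, 2))


def mean_between(set_a, set_b, dist=levenshtein):
    """Average distance over every (item from A, item from B) pair."""
    return mean(dist(a, b) for a, b in product(set_a, set_b))


if __name__ == "__main__":
    print(mean_within(["bat", "pat", "rat"]))
    print(mean_between(["bat"], ["pat", "cats"]))
```

Passing a different `dist` function (for example, a feature-weighted one) changes the metric without touching the averaging logic.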

Using phonoEditDistanceWITHINsubjects.py:

  • Replace "myCorpus.csv" with the corpus of your choice. (Note: make sure it's formatted properly)
  • Open a terminal window and type sudo python3 phonoEditDistanceWITHINsubjects.py. (Note: may not need sudo, or to specify python3 if it's the only version you have installed).

Using phonoEditDistanceBETWEENsubjects.py:

  • Replace "corpusA.csv" and "corpusB.csv" with the corpora you wish to compare.
  • Open a terminal window and type sudo python3 phonoEditDistanceBETWEENsubjects.py. (Note: may not need sudo, or to specify python3 if it's the only version you have installed).