Chamorro Lexicon Expander is a Python project that expands a Chamorro-English dictionary by generating all possible affixed variations of Chamorro root words. The tool automates the creation of word forms by applying common Chamorro prefixes, suffixes, and infixes according to linguistic rules. The goal is to enable a more comprehensive representation of Chamorro vocabulary for language learners and dictionary development, and to provide a labelled dataset for use in other machine learning projects. (In Progress)
Important Note: This project is an experiment in applying affixes to words algorithmically, according to linguistic rules, as a way to speed up the creation of word lists with known affixes. It is also meant to produce a training set for future machine learning projects, such as training a model to predict the root word (lemma) of a given Chamorro word. The generated words may follow linguistic rules accurately, but they may or may not reflect actual, natural speech patterns in Chamorro. Always verify word usage against a reliable corpus and/or with native speakers.
Schyuler Lujan
- Provides a dataset of Chamorro words, definitions, and part of speech tags
- Transforms words with a rules-based approach that applies prefixes, suffixes, and infixes according to linguistic rules
- Exports output into a CSV file that has the new word, original word, and original word definition
- Generates an expanded dataset of affixed words labelled with their root words and their root word definitions
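
The features above can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`add_prefix`, `add_infix`, `expand`, `to_csv`), the tiny affix inventory, the column names, and the sample entries are all assumptions made for illustration. The infix rule used here (insert before the first vowel of the root) is a simplification, and real Chamorro affixation also involves sound changes that this sketch ignores.

```python
import csv
import io

# Assumption: a simplified vowel set used only to locate the infix position.
VOWELS = set("aeiouåAEIOUÅ")


def add_prefix(root, prefix):
    """Attach a prefix by plain concatenation (no sound changes applied)."""
    return prefix + root


def add_infix(root, infix):
    """Insert an infix immediately before the first vowel of the root."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[:i] + infix + root[i:]
    return infix + root  # no vowel found: fall back to prefixing


# Illustrative affix inventory: (affix, function that applies it).
AFFIXES = [
    ("ma", add_prefix),
    ("um", add_infix),
    ("in", add_infix),
]

COLUMNS = ["new_word", "root_word", "root_definition"]


def expand(entries):
    """Build one labelled row per (root, affix) combination."""
    rows = []
    for root, definition in entries:
        for affix, apply_affix in AFFIXES:
            rows.append({
                "new_word": apply_affix(root, affix),
                "root_word": root,
                "root_definition": definition,
            })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [("saga", "to stay"), ("taitai", "to read")]
    print(to_csv(expand(sample)))
```

Each output row keeps the generated form next to its root and the root's definition, which is what makes the dataset usable as labelled training data for lemma prediction. As noted above, forms generated this way still need to be checked against a corpus or with native speakers.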