A Python script that parses an Aklanon dictionary and converts it into several useful formats.
This parser reads an Aklanon dictionary, taken from the book A Study of the Aklanon Dialect (Vol. 2) and encoded in Excel format, and outputs it as a JSON dictionary, a frequency list, and a word list. Since the book is in PDF format, the dictionary is manually encoded from the book into the Excel file. The encoded data is still incomplete because manual encoding is very time-consuming, so contributions are welcome.
4,471 words collected (as of 08/09/2024)
Resource | Format | Link |
---|---|---|
Dictionary | json | output/akl_dictionary.json |
Frequency list | csv | output/akl_freqlist.csv |
Word list | txt | output/akl_wordlist.txt |
The JSON dictionary is structured as a list of words, each with a corresponding list of attributes. The attributes include part of speech, definition, etymology, classifications, synonyms, antonyms, example sentences, inflections, and sources. The entries are sorted alphabetically.
[
{
"word": "The word itself",
"attributes": [
{
"pos": "Simplified parts of speech",
"definition": "The definition",
"origin": "The etymology",
"classification": "Any classification",
"similar": [
"List of synonyms"
],
"opposite": [
"List of antonyms"
],
"examples": [
"List of example sentences that use the word"
],
"inflections": [
"List of inflected forms"
],
"sources": [
"List of sources"
]
}
]
}
]
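As a rough illustration of how this structure might be consumed, the sketch below indexes entries by word and collects their definitions. The sample entry is a made-up placeholder that only mirrors the structure above (it is not real dictionary data), and `build_index` and `definitions` are hypothetical helper names, not part of this project.

```python
import json

# Placeholder entry that mirrors the documented structure; not real data.
SAMPLE_JSON = """
[
  {
    "word": "aba",
    "attributes": [
      {
        "pos": "placeholder pos",
        "definition": "placeholder definition",
        "origin": "",
        "classification": "",
        "similar": [],
        "opposite": [],
        "examples": [],
        "inflections": [],
        "sources": []
      }
    ]
  }
]
"""


def build_index(entries):
    """Map each word to its list of attribute dictionaries."""
    return {entry["word"]: entry["attributes"] for entry in entries}


def definitions(index, word):
    """Return every definition recorded for a word, or [] if it is absent."""
    return [attr.get("definition", "") for attr in index.get(word, [])]


entries = json.loads(SAMPLE_JSON)
index = build_index(entries)
print(definitions(index, "aba"))
```

To work with the real file, the sample string could be replaced by loading `output/akl_dictionary.json` with `json.load`, assuming the file is UTF-8 encoded.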
The frequency list is structured as a list of words, each with its corresponding frequency value, sorted from highest to lowest frequency. Since no Aklanon frequency list is available yet, all frequency values are set to 1.
a,1
ab-ab,1
aba,1
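A minimal sketch of reading this format with Python's `csv` module follows. It assumes the file has no header row and exactly two columns (word, frequency), as in the lines above; `read_freqlist` is a hypothetical helper name.

```python
import csv
import io

# The three sample lines shown above, used in place of the real file.
SAMPLE_CSV = "a,1\nab-ab,1\naba,1\n"


def read_freqlist(stream):
    """Parse word,frequency rows into (word, int) tuples, skipping blank rows."""
    return [(row[0], int(row[1])) for row in csv.reader(stream) if row]


freqs = read_freqlist(io.StringIO(SAMPLE_CSV))
print(freqs)
```

For the real file, the `StringIO` object could be swapped for `open("output/akl_freqlist.csv", newline="", encoding="utf-8")`, assuming UTF-8 encoding.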
The word list is simply the list of words sorted alphabetically.
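One possible use of the word list is a quick membership check, sketched below. It assumes one word per line; the sample words are taken from the frequency list excerpt above, and `read_wordlist` is a hypothetical helper name.

```python
import io

# Stand-in for output/akl_wordlist.txt, assuming one word per line.
SAMPLE_TXT = "a\nab-ab\naba\n"


def read_wordlist(stream):
    """Return the set of non-empty, whitespace-stripped lines."""
    return {line.strip() for line in stream if line.strip()}


words = read_wordlist(io.StringIO(SAMPLE_TXT))
print("ab-ab" in words)
```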
This project welcomes contributions and suggestions.
To add entries to the Excel file from the Aklanon dictionary PDF file, please refer to this guide.
This project is licensed under the Apache License.
For more information, or with any additional questions or comments, contact maagmaandrian@gmail.com.