This project generates a part-of-speech (POS) tag dictionary for the Ukrainian language.

Це — проект генерування словника з тегами частин мови для української мови.

Description:

dict_uk/expand/expand_all.py -aff data/affix -dict data/dict

For every file in data/dict, the project generates all possible word forms with POS tags
by applying the affix rules from the files in data/affix.
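
To make the idea of affix expansion concrete, here is a minimal toy sketch in Python. It is not the code in expand_all.py, and the rule format, the `AffixRule` and `expand` names, and the tags are all hypothetical, chosen only for illustration; the real rules live in data/affix and may look quite different.

```python
# Toy illustration only: this rule format and these tags are hypothetical
# and are NOT the format used by the files in data/affix.
from typing import NamedTuple


class AffixRule(NamedTuple):
    strip: str  # ending removed from the lemma ("" removes nothing)
    add: str    # ending appended after stripping
    tag: str    # POS tag assigned to the resulting word form


def expand(lemma: str, rules: list[AffixRule]) -> list[tuple[str, str, str]]:
    """Return (word form, lemma, tag) triples for every rule that applies."""
    forms = []
    for rule in rules:
        if not lemma.endswith(rule.strip):
            continue  # this rule does not apply to the lemma
        stem = lemma[: len(lemma) - len(rule.strip)]
        forms.append((stem + rule.add, lemma, rule.tag))
    return forms


# Hypothetical rules for a noun ending in "-а".
rules = [
    AffixRule(strip="", add="", tag="noun:nom"),
    AffixRule(strip="а", add="и", tag="noun:gen"),
    AffixRule(strip="а", add="у", tag="noun:acc"),
]

for form, lemma, tag in expand("мова", rules):
    print(form, lemma, tag)
```

Each rule strips an ending from the lemma, appends a new one, and attaches a tag, so a single dictionary entry yields several tagged word forms.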

How to run:

# dict_uk/expand/expand_all.py -aff data/affix -dict data/dict -corp -indent -mfl -wordlist
Output:

    * dict_corp_vis.txt - Dictionary in visual (indented) format for review, analysis, or conversion
    * dict_corp_lt.txt - Dictionary for LT (LanguageTool), used for annotating the corpus
    * words.txt, lemmas.txt, tags.txt - Lists of all unique words, lemmas, and tags

# dict_uk/expand/expand_all.py -aff data/affix -dict data/dict
Output:

    * dict_rules_lt.txt - Dictionary file for LT (LanguageTool), used for checking grammar rules