GitHub - frankier/STIFF-explore: Exploratory programming related to https://github.com/frankier/STIFF

Some WordNet/OpenSubtitles2018 related exploratory scripts:

detect_chinese.py - Detect usage of traditional versus simplified Chinese in OpenSubtitles2018
difficulties.py - Print "difficult" cases of traditional/simplified Chinese characters including non-idempotency and one-many relationships
confsmat.ipynb - Plot a confusion matrix of said scripts versus their classification in OpenSubtitles2018
diff.py - Test to see if tokenised and untokenised sentences can be realigned in OpenSubtitles2018
important.py - Get the most important words and multiwords in FinnWordNet as well as just the longest words
trace_wn.py - Spit out all WordNet lemmas found in STDIN
variants.py - Print information about variants of Chinese characters
verify_tab.py - Verify the correctness of a multilingual WordNet data file and possibly suggest corrections

To make the confusion matrix plot:

$ pipenv run python detect_chinese.py confsmat-analyse cmn-fin/OpenSubtitles2018/fi-zh_cn/OpenSubtitles2018.fi-zh_cn.zh_cn cmn-fin/OpenSubtitles2018/fi-zh_cn/OpenSubtitles2018.fi-zh_cn.ids cmn-fin/OpenSubtitles2018/fi-zh_tw/OpenSubtitles2018.fi-zh_tw.zh_tw cmn-fin/OpenSubtitles2018/fi-zh_tw/OpenSubtitles2018.fi-zh_tw.ids confsanalys.pkl
$ pipenv run python detect_chinese.py confsmat-cls confsanalys.pkl confsmat4.pkl

Then run confsmat.ipynb with Jupyter notebook.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
LICENSE-2.0.txt		LICENSE-2.0.txt
README.md		README.md
chinese_length.py		chinese_length.py
confsmat.ipynb		confsmat.ipynb
corpus_stats.bash		corpus_stats.bash
ctx_sizes.py		ctx_sizes.py
detect_chinese.py		detect_chinese.py
diff.py		diff.py
difficulties.py		difficulties.py
extra_requirements.txt		extra_requirements.txt
important.py		important.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
synsets.py		synsets.py
trace_wn.py		trace_wn.py
utils.py		utils.py
variants.py		variants.py
verify_tab.py		verify_tab.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

frankier/STIFF-explore

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages