
PyThaiNLP 2.0 #180

Merged
wannaphong merged 358 commits into master from dev
Mar 31, 2019


wannaphong commented Mar 31, 2019

PyThaiNLP 2.0


PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see From PyThaiNLP 1.7 to PyThaiNLP 2.0

📖 ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0 should see Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0

📫 Follow us on Facebook: Pythainlp

What's new in version 2.0?

  • New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
  • Drop Python 2 support and remove all Python 2 compatibility code.
  • Remove old, obsolete, deprecated, and experimental code.
  • Thai2fit (upgrade ULMFiT-related code to fastai 1.0)
  • ThaiNER 1.0
  • Remove sentiment analysis
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS tagging
  • More and improved examples
  • See the PyThaiNLP 2.0 change log
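
The first item in the list above mentions a spell checker that can be initialized with a custom dictionary. As a rough illustration of that idea, here is a minimal Norvig-style checker in plain Python. It is only a sketch: the class name TinyNorvigChecker, its methods, and the sample words and counts are hypothetical, it only looks at candidates one edit away, and it is not PyThaiNLP's NorvigSpellChecker or its API.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


class TinyNorvigChecker:
    """Hypothetical stand-in for illustration; NOT PyThaiNLP's implementation."""

    def __init__(self, custom_dict: Iterable[Tuple[str, int]], alphabet: str = ""):
        # custom_dict is assumed to be an iterable of (word, frequency) pairs.
        self.freq: Counter = Counter()
        for word, count in custom_dict:
            self.freq[word] += count
        self.total = sum(self.freq.values())
        # Default alphabet: every character that occurs in the dictionary.
        self.alphabet = alphabet or "".join(
            sorted({ch for w in self.freq for ch in w})
        )

    def prob(self, word: str) -> float:
        """Relative frequency of a word in the dictionary (0.0 if unknown)."""
        return self.freq[word] / self.total if self.total else 0.0

    def known(self, words: Iterable[str]) -> Set[str]:
        """Subset of words that appear in the dictionary."""
        return {w for w in words if w in self.freq}

    def edits1(self, word: str) -> Set[str]:
        """All strings one delete, transpose, replace, or insert away."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def correct(self, word: str) -> str:
        """Most frequent known candidate, or the word itself if none is known."""
        candidates = self.known([word]) or self.known(self.edits1(word)) or {word}
        # Ties are broken by the word itself so the result is deterministic.
        return max(candidates, key=lambda w: (self.prob(w), w))


# Sample dictionary with made-up frequencies.
checker = TinyNorvigChecker([("แมว", 10), ("แม่", 5), ("หมา", 8)])
print(checker.correct("แมง"))  # → แมว
```

Because the dictionary is passed in at construction time, the same class can serve different domains by swapping the word list, which is the design point the bullet describes.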


bact and others added 30 commits November 2, 2018 14:30
…wercase), as suggested by @wannaphongcom

- move them from the pythainlp.corpus module to the pythainlp module, since they are not really a corpus and are common variables shared by all modules
…/__init__.py

- reduce the number of convenience imports in pythainlp/__init__.py to lower the chance of namespace clashes and mutual top-level import crashes
- Move isthai() function from pythainlp.tokenize to pythainlp.util
- Move wordtonum function from pythainlp.util to pythainlp.number
- Refactor code related to pythainlp.util
- More test cases, sort test cases by import order
Use consistent naming and consolidate similar code
Merge from PyThaiNLP/pythainlp
- TTC should read ttc_freq.txt (was tnc_freq.txt)
- test case for bahttext for full number without satang
- test case for pythainlp.corpus.remove
wannaphong added this to the 2.0 milestone Mar 31, 2019

coveralls commented Mar 31, 2019


Coverage increased (+28.7%) to 81.731% when pulling 4094632 on dev into ab79eab on master.

wannaphong merged commit a6a7717 into master Mar 31, 2019
5 participants