PyThaiNLP 2.0 #180

Merged 358 commits on Mar 31, 2019
358 commits
9d9c551
update test cases
bact Nov 2, 2018
34937b7
Simplify path query
bact Nov 2, 2018
c0ae86e
- THAI_TONEMARKS instead of ["'", ...]
bact Nov 2, 2018
e714bb5
commented out test_sentiment() test for now, require torch installation
bact Nov 2, 2018
4c5a3f3
fix THAI_TONEMARKS values
bact Nov 2, 2018
0cefe0f
Update README.md
wannaphong Nov 2, 2018
d364c40
move import position
bact Nov 2, 2018
0eb1299
add test case
bact Nov 3, 2018
8d90a82
- rename THAI_ALPHABETS, etc. (uppercase) to thai_alphabets, etc. (lo…
bact Nov 3, 2018
f75f369
recategorizes thai characters
bact Nov 3, 2018
44a3ec9
- shuffle the order of imports and constant declarations in pythainlp…
bact Nov 3, 2018
6b78a35
- Move change.py functions to pythainlp.util
bact Nov 3, 2018
9f58a35
Update doc
bact Nov 3, 2018
11e60c9
Fix tnc.word_freqs() - what if server fails
bact Nov 3, 2018
a2c5d35
Update countries_th.txt
bact Nov 3, 2018
f060442
Merge pull request #146 from bact/dev
bact Nov 4, 2018
b15b3a3
Merge pull request #12 from PyThaiNLP/dev
bact Nov 4, 2018
02d7074
Update README.md
bact Nov 4, 2018
f895692
Update README.md
bact Nov 4, 2018
aca5ce8
Update README.md
bact Nov 4, 2018
734b5b5
Update README.md
bact Nov 4, 2018
d2c93fc
Update README.md
bact Nov 4, 2018
7ce7400
Update README.md
bact Nov 4, 2018
1362390
Update README.md
bact Nov 4, 2018
b6e23f6
Update README.md
bact Nov 4, 2018
01a6ec0
Merge pull request #14 from PyThaiNLP/dev
bact Nov 4, 2018
5274d89
Minor bug fixes + add test cases
bact Nov 4, 2018
48d307f
Update readme
bact Nov 4, 2018
5145d37
- more test cases for bahttext, num_to_thaiword, thaiword_to_num
bact Nov 4, 2018
cc601f1
make it one return (try to reduce cognitive complexity)
bact Nov 4, 2018
2d04829
Merge pull request #147 from bact/dev
bact Nov 4, 2018
b7ff45e
- simplify bahttext code, remove unused code
bact Nov 4, 2018
3a0cc37
Merge pull request #148 from bact/dev
bact Nov 4, 2018
ace9aee
Number converters
bact Nov 5, 2018
978295f
Merge pull request #149 from bact/dev
bact Nov 5, 2018
27b7732
remove WordNet wrapper (pythainlp/corpurs/wordnet.py) - the entire se…
bact Nov 5, 2018
32a718e
Merge pull request #15 from PyThaiNLP/dev
bact Nov 5, 2018
4c68a09
Add link to master's README
bact Nov 6, 2018
7b3d077
- rank() handles empty list or None case
bact Nov 6, 2018
7b18822
more test case
bact Nov 6, 2018
c6edbf4
Soundex functions handle empty or None case
bact Nov 6, 2018
a090448
check length before accessing text index
bact Nov 6, 2018
f4ffbbd
Handle empty text
bact Nov 6, 2018
9ab4d94
remove if __name__ == "__main__"
bact Nov 6, 2018
3ed552e
check str length
bact Nov 6, 2018
f08e674
Update README.md
bact Nov 6, 2018
042591d
Update README.md
bact Nov 6, 2018
580ea00
Merge pull request #151 from bact/dev
bact Nov 6, 2018
6717b06
- Propose to merge g2p and romanization modules to one common transli…
bact Nov 6, 2018
a309945
Merge pull request #16 from PyThaiNLP/dev
bact Nov 6, 2018
bcf7980
Update setup.py
wannaphong Nov 6, 2018
9707c67
fix imports
bact Nov 6, 2018
adf2d65
Merge branch 'dev' of https://github.com/bact/pythainlp into dev
bact Nov 6, 2018
529d657
install epitrain for Windows test in AppVeyor
bact Nov 6, 2018
b67465a
- remove try imports
bact Nov 9, 2018
baa85ac
torch on Python 3.4 does not has PyPI package
bact Nov 9, 2018
806ad54
remove [ml] from install
bact Nov 9, 2018
b31bc70
add "ner" to extras_require
bact Nov 9, 2018
5faabef
Update test cases
bact Nov 9, 2018
02a05f6
Update version number
bact Nov 9, 2018
75e8971
Fix version syntax
bact Nov 9, 2018
f862690
update docs for transliterate
bact Nov 9, 2018
a47d297
- add "full" option in extras_require
bact Nov 9, 2018
694dbf7
Merge pull request #153 from bact/dev
bact Nov 9, 2018
0780c38
More test cases
bact Nov 9, 2018
16ddc42
Merge pull request #17 from PyThaiNLP/dev
bact Nov 9, 2018
6121148
Add English test cases
bact Nov 9, 2018
5577499
more test cases for spellchecker
bact Nov 9, 2018
a7689ab
more wordnet test cases
bact Nov 9, 2018
7a1f4e4
more romanize() (royin) test cases
bact Nov 9, 2018
65b16ca
handles None and empty cases
bact Nov 9, 2018
03dbcc0
- handles None and empty cases
bact Nov 9, 2018
7190aba
fix test cases
bact Nov 9, 2018
69cb3a7
handles None and empty cases in pos taggers
bact Nov 9, 2018
5292715
remove artagger tests for now
bact Nov 9, 2018
e738142
more test cases for tokenization
bact Nov 9, 2018
985fcf9
- adjust extras_require
bact Nov 9, 2018
9496571
fix tcc_gen() test
bact Nov 9, 2018
a145a2b
thai2vec: load model only once
bact Nov 9, 2018
cd37a04
thai2vec test cases + more wordnet test cases
bact Nov 10, 2018
afb1066
workaround to make boto work on Travis CI
bact Nov 10, 2018
823f702
fix royin romanize(), bring back portion of old code from 5e44053 (fo…
bact Nov 10, 2018
ec25189
Add doc on extras_require
bact Nov 10, 2018
af83c4d
update README
bact Nov 10, 2018
fed1ceb
Merge pull request #156 from bact/dev
bact Nov 10, 2018
ccf5ff0
Merge pull request #18 from PyThaiNLP/dev
bact Nov 10, 2018
7c5bba1
- add unit test for artagger (pos tagger)
bact Nov 11, 2018
0c17174
simplify code
bact Nov 11, 2018
917e357
- add/fix artagger test cases
bact Nov 12, 2018
8ab13d2
Merge pull request #157 from bact/dev
bact Nov 13, 2018
63206ab
Update README.md
bact Nov 14, 2018
a496593
more description about Thai consonants, letters, characters
bact Nov 14, 2018
501e75f
Update README.md
bact Nov 14, 2018
0240fae
update thai_consonants
bact Nov 14, 2018
fc33f9f
Merge pull request #158 from bact/dev
bact Nov 14, 2018
b3ea0ea
Add license scan report and status
fossabot Nov 14, 2018
2509360
Merge pull request #159 from fossabot/dev
wannaphong Nov 14, 2018
1da39b2
Update README.md
wannaphong Nov 15, 2018
752582e
Merge pull request #19 from PyThaiNLP/dev
bact Nov 17, 2018
a89ca87
Rearrange packages + thai_strftime
bact Nov 18, 2018
92e3cb2
fix import
bact Nov 18, 2018
d0ca1b9
fix import, specific import the pos tagger
bact Nov 18, 2018
03de4e5
ThaiNameRecognizer -> ThaiNameTagger
bact Nov 19, 2018
27d5104
trying to fix find_keyword()
bact Nov 19, 2018
b8ec278
Update __init__.py
wannaphong Nov 19, 2018
0a8a60d
thai2fit v0.3
Nov 20, 2018
a476a2d
Merge branch 'dev' of https://github.com/PyThaiNLP/pythainlp into dev
Nov 20, 2018
e4b3be6
Update __init__.py
wannaphong Nov 20, 2018
5f78d54
Merge branch 'dev' into dev
bact Nov 20, 2018
f9cda96
update new test to reflexs update in pythainlp.sentiment module removal
bact Nov 20, 2018
c8e771c
strftime platform-agnostic test case for now
bact Nov 20, 2018
e05f97f
remove pythainlp.sentiment references
bact Nov 20, 2018
934bfdc
remove thai2vec from test case (no longer exist)
bact Nov 20, 2018
364996f
update rank() function
bact Nov 20, 2018
9a2c05c
update test cases for NER
bact Nov 20, 2018
0a0a2f7
update rank()
bact Nov 20, 2018
3bf41dc
Word vector test cases
bact Nov 21, 2018
9aa5a88
add lm testers
Nov 21, 2018
7ef637f
Merge branch 'dev' of https://github.com/PyThaiNLP/pythainlp into dev
Nov 21, 2018
4958910
add documentation ulmfit
Nov 21, 2018
c89bb0d
fix tokenizer bug
Nov 21, 2018
0e11673
Merge branch 'dev' into dev
bact Nov 21, 2018
f9888bd
Merge pull request #160 from bact/dev
bact Nov 21, 2018
6903627
add lstm models
Nov 22, 2018
6c543bb
import from fastai.text
Nov 22, 2018
193ac03
forgot Tokenizer from fastai.text
Nov 22, 2018
ff259d5
import helpers for merge_wgts
Nov 22, 2018
2bde59a
typo embed->emb
Nov 22, 2018
175f000
update imports for thai2fit
bact Nov 22, 2018
4bc52b7
Merge branch 'dev' into dev
bact Nov 23, 2018
dac6f3a
Merge pull request #20 from PyThaiNLP/dev
bact Nov 23, 2018
7d0dacc
# TODO: Check extras and decide to download additional data, like mod…
bact Nov 23, 2018
c0dd5ed
add sentiment analysis example with ulmfit
Nov 24, 2018
f178c74
Update Tokenizer argument `pre_rules` to `rules`
cstorm125 Nov 24, 2018
53e8778
Update Tokenizer argument `rules` to `pre_rules`
cstorm125 Nov 24, 2018
d50c1af
support fastai 1.0.19; prevent any more effect of API changes
Nov 24, 2018
1a0c01c
Merge branch 'dev' of https://github.com/PyThaiNLP/pythainlp into dev
Nov 24, 2018
41d2cb6
Merge pull request #21 from PyThaiNLP/dev
bact Nov 24, 2018
1448a8b
update corpus code
bact Nov 24, 2018
e480567
Merge branch 'dev' of https://github.com/bact/pythainlp into dev
bact Nov 24, 2018
66ca827
support fastai 1.0.20 only
Nov 24, 2018
9ed48d5
rules -> pre_rules
bact Nov 24, 2018
efe8f8f
move notebooks to notebooks
bact Nov 24, 2018
cf27dbb
revert python version
bact Nov 24, 2018
0abd8b1
add ulmfit feature extraction example
Nov 24, 2018
364a14a
Specify support for fastai 1.0.22
cstorm125 Nov 24, 2018
207f3a7
Merge branch 'dev' of https://github.com/PyThaiNLP/pythainlp into PyT…
bact Nov 25, 2018
dc1a58e
move notebooks to notebooks/ folder
bact Nov 25, 2018
2abdbf5
Merge branch 'PyThaiNLP-dev' into dev
bact Nov 25, 2018
1e6fd36
placeholder
bact Nov 25, 2018
90c9ebc
import CORPUS_PATH -> import corpus_path
bact Nov 25, 2018
d5a8543
update setup.py, fastai=1.0.22
bact Nov 25, 2018
9ef8e35
Merge pull request #162 from bact/dev
bact Nov 25, 2018
5432c64
Update README.md
bact Nov 25, 2018
5405bcd
Add python_requires
wannaphong Nov 25, 2018
4ae2bdc
Requires Python 3.6+
wannaphong Nov 25, 2018
c8ec316
close import ner
wannaphong Dec 1, 2018
f4b8055
Update word_vector.rst
wannaphong Dec 17, 2018
8bc129e
add used pythainlp from cmd
wannaphong Dec 24, 2018
373824b
add postag
wannaphong Dec 24, 2018
35b5bf4
add soundex
wannaphong Dec 24, 2018
6df4e59
add corpus
wannaphong Dec 24, 2018
db0fa8b
add #!python3
wannaphong Dec 24, 2018
1823b38
update pythainlp
wannaphong Dec 24, 2018
7c90da5
Update __init__.py
wannaphong Dec 26, 2018
c1878b2
Update __init__.py
wannaphong Dec 26, 2018
f39c888
Merge pull request #165 from PyThaiNLP/Command
wannaphong Dec 26, 2018
272fea1
fixed import ner
wannaphong Dec 27, 2018
7aeabc6
fix ner bug
wannaphong Dec 27, 2018
e64f524
fix ner bug
wannaphong Dec 27, 2018
6704616
Update __init__.py
wannaphong Dec 27, 2018
6efd920
Merge pull request #169 from PyThaiNLP/wannaphongcom-patch-2
wannaphong Dec 27, 2018
e5711fb
remove pythainlp.tools.install_package
wannaphong Dec 28, 2018
7cb3326
add more test and del unused code
wannaphong Dec 28, 2018
0efd98a
add more test and add pos_tag_sents docs
wannaphong Dec 28, 2018
4b90519
add more test
wannaphong Dec 28, 2018
eeca120
add pythainlp.corpus docs
wannaphong Dec 28, 2018
5bee63c
add more api docs
wannaphong Dec 28, 2018
2f24f30
add deepcut test
wannaphong Dec 28, 2018
97a4796
add draft test_ulmfit
wannaphong Dec 28, 2018
db313e3
fix appveyor.yml and 1.8.0 -> 2.0
wannaphong Dec 28, 2018
bf87e0f
add thaicheck #171
wannaphong Jan 5, 2019
0380ca2
update thaicheck
wannaphong Jan 5, 2019
160b706
run deepcut test
wannaphong Jan 5, 2019
832d592
Update README.md
wannaphong Jan 12, 2019
dbb773f
add Thaicheck test
wannaphong Jan 12, 2019
06b22b9
Update README.md
bact Jan 23, 2019
1ef6774
Update README.md
bact Jan 23, 2019
388bbef
Merge remote-tracking branch 'origin/dev' into dev
wannaphong Jan 26, 2019
c62c54a
update syllables_th.txt
wannaphong Jan 26, 2019
c12d486
Create install_pythainlp.bat
wannaphong Jan 28, 2019
2579983
Update pythainlp-1-6-thai.md
wannaphong Feb 1, 2019
00c7946
Update appveyor.yml
wannaphong Feb 9, 2019
714225f
Update appveyor.yml
wannaphong Feb 9, 2019
f1f7872
Update appveyor.yml
wannaphong Feb 9, 2019
97ff7a0
Update appveyor.yml
wannaphong Feb 9, 2019
1b8998e
Update appveyor.yml
wannaphong Feb 9, 2019
6d9df58
Update appveyor.yml
wannaphong Feb 9, 2019
5353538
del ว and วว
wannaphong Feb 23, 2019
6619978
del one char from words_th
wannaphong Feb 23, 2019
55aa12d
delete unused code and update doc for etcc
wannaphong Feb 23, 2019
15a3a40
update code style
wannaphong Feb 23, 2019
14c8380
update tnc & ttc download to new api and delete old file
wannaphong Feb 23, 2019
c575b76
fix import tnc
wannaphong Feb 23, 2019
f79ecd6
thai2fit 0.31; fastai>=1.0.38
cstorm125 Feb 24, 2019
c069c89
Merge branch 'dev' of https://github.com/PyThaiNLP/pythainlp into dev
cstorm125 Feb 24, 2019
d7ed268
include pre_ and post_rules_th for ulmfit
cstorm125 Feb 25, 2019
c58b4d9
import pretrained wiki paths
cstorm125 Feb 25, 2019
027f6a4
change _THWIKI_LSTM to fastai 1.0.38
cstorm125 Feb 25, 2019
c92dc2e
Update __init__.py
wannaphong Feb 25, 2019
eb31730
Update __init__.py
wannaphong Feb 25, 2019
8eb0ccc
Set POS tagger #174
wannaphong Feb 26, 2019
2f7f96d
replace classification notebook as of thai2fit v0.31
cstorm125 Feb 26, 2019
7925f2c
smoother run in colab
cstorm125 Feb 26, 2019
4e7fc22
add text generation example
cstorm125 Feb 27, 2019
dbcb780
update wongnai score
cstorm125 Feb 27, 2019
bea57ab
edit processing rules
cstorm125 Mar 8, 2019
92fbc43
thai2fit 0.32
cstorm125 Mar 9, 2019
7c44898
fix #176
wannaphong Mar 9, 2019
e45e9c6
Add emoji in thai2fit
wannaphong Mar 9, 2019
8bd6c51
fix romanize bug
wannaphong Mar 9, 2019
90dcf21
fix royin bug
wannaphong Mar 9, 2019
4a76b94
Add Command Line Docs
wannaphong Mar 17, 2019
37e2ef9
Add tnc_freq.txt
wannaphong Mar 17, 2019
1b92f22
fix path tnc
wannaphong Mar 17, 2019
42b7ed2
fix import tnc
wannaphong Mar 18, 2019
d90060a
Add more corpus docs
wannaphong Mar 18, 2019
c6a3ada
Add more spell docs
wannaphong Mar 18, 2019
6f51bec
update docs and add ner example
wannaphong Mar 19, 2019
8dabb7a
Add Tokenizer test
wannaphong Mar 19, 2019
f4a5fed
add metasound test
wannaphong Mar 19, 2019
49673ae
update pythainlp.util docs
wannaphong Mar 22, 2019
ba6a517
update pythainlp.ulmfit docs
wannaphong Mar 22, 2019
e917f3e
update docs
wannaphong Mar 23, 2019
c85b7bc
Update PyThaiNLP 1.7 to PyThaiNLP 2.0 Docs
wannaphong Mar 23, 2019
c4f93a2
fix import pythainlp
wannaphong Mar 23, 2019
2d10472
update name sklearn-crfsuite
wannaphong Mar 24, 2019
f904279
update postag
wannaphong Mar 24, 2019
83bd035
Update _CORPUS_DB_URL
wannaphong Mar 24, 2019
166b671
Add Orchid to Universal Dependencies
wannaphong Mar 25, 2019
bf274c0
Update named_entity.py
wannaphong Mar 27, 2019
566f493
fix NER
wannaphong Mar 27, 2019
23e3231
ThaiNER 1.0
wannaphong Mar 29, 2019
3cfc609
Update README.md
wannaphong Mar 29, 2019
fa60612
update notebooks for thai2fit 0.32
cstorm125 Mar 29, 2019
8a9acdc
Update setup.py
wannaphong Mar 29, 2019
5ac8017
Update Command Line
wannaphong Mar 30, 2019
b73f936
Update README and setup.py
wannaphong Mar 31, 2019
5258e49
PyThaiNLP 2.0
wannaphong Mar 31, 2019
4094632
Update README
wannaphong Mar 31, 2019
11 changes: 9 additions & 2 deletions .gitignore
@@ -26,8 +26,8 @@ var/
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

@@ -58,6 +58,11 @@ target/

# Jupyter Notebook
.ipynb_checkpoints
Untitled*.ipynb

# IDE files
.idea
.vscode

# macOS generated files
.DS_Store
@@ -66,6 +71,8 @@ target/
.Spotlight-V100
.Trashes

# Document generator temporary files
docs/_build/

\.idea/codeStyles/

11 changes: 8 additions & 3 deletions .travis.yml
@@ -3,12 +3,17 @@

language: python
python:
- "3.4"
- "3.5"
- "3.6"

# workaround to make boto work on travis
# from https://github.com/travis-ci/travis-ci/issues/7940
before_install:
- sudo rm -f /etc/boto.cfg

# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install:
- pip install -r requirements-travis.txt
- pip install -r requirements.txt
- pip install .[artagger,icu,ipa,ner,thai2fit,deepcut]
- pip install coveralls

os:
13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -23,21 +23,22 @@ We use the famous [gitflow](http://nvie.com/posts/a-successful-git-branching-mod
- Write tests for your new features (please see "Tests" topic below);
- Always remember that [commented code is dead
code](http://www.codinghorror.com/blog/2008/07/coding-without-comments.html);
- Name identifiers (variables, classes, functions, module names) with readable
names (`x` is always wrong);
- Name identifiers (variables, classes, functions, module names) with meaningful
and pronounceable names (`x` is always wrong);
- When manipulating strings, use [Python's new-style
formatting](http://docs.python.org/library/string.html#format-string-syntax)
(`'{} = {}'.format(a, b)` instead of `'%s = %s' % (a, b)`);
- All `#TODO` comments should be turned into issues (use our
[GitHub issue system](tps://github.com/wannaphongcom/pythainlp/));
[GitHub issue system](https://github.com/PyThaiNLP/pythainlp/));
- Run all tests before pushing (just execute `tox`) so you will know if your
changes broke something;
- All source code and all text files should be ended with one empty line. This is [to please git](https://stackoverflow.com/questions/5813311/no-newline-at-end-of-file#5813359) and also [to keep up with POSIX standard](https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline).
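
The string-formatting guideline in the list above can be illustrated with a short Python sketch. The names `name` and `value` are hypothetical and chosen only for this example. Both styles produce the same text; the guideline simply prefers `str.format`.

```python
# Hypothetical values, used only to illustrate the guideline.
name = "tone_marks"
value = 4

# Old-style formatting, discouraged by the guideline.
old_style = '%s = %s' % (name, value)

# New-style formatting, preferred by the guideline.
new_style = '{} = {}'.format(name, value)

print(new_style)
```

Because `str.format` takes positional arguments, it also avoids the old-style pitfall where a tuple on the right of `%` is unpacked as several arguments rather than formatted as one value.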


# Discussion

- Facebook group: https://www.facebook.com/groups/thainlp
- GitHub issues: https://github.com/wannaphongcom/pythainlp/issues
- GitHub issues: https://github.com/PyThaiNLP/pythainlp/issues

Happy hacking! (;

@@ -54,14 +55,14 @@ Happy hacking! (;
## newmm (onecut), mm, TCC, and Thai Soundex Code
- Korakot Chaovavanich

## Thai2Vec & ulmfit
## thai2fit & ULMFiT
- Charin Polpanumas

## Docs
- Peeradej Tanruangporn

## Contributors
- See more contributions here https://github.com/wannaphongcom/pythainlp/graphs/contributors
- See more contributions here https://github.com/PyThaiNLP/pythainlp/graphs/contributors


# References
54 changes: 41 additions & 13 deletions README-pypi.md
@@ -1,6 +1,6 @@
![PyThaiNLP Logo](https://avatars0.githubusercontent.com/u/32934255?s=200&v=4)

# PyThaiNLP 1.7
# PyThaiNLP 2.0

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
[![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
@@ -10,32 +10,60 @@

PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

PyThaiNLP features include Thai word and subword segmentations, soundex, romanization, part-of-speech taggers, and spelling corrections.
PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

## What's new in version 1.7 ?
📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [From PyThaiNLP 1.7 to PyThaiNLP 2.0](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)

- Deprecate Python 2 support
- Refactor pythainlp.tokenize.pyicu for readability
- Add Thai NER model to pythainlp.ner
- thai2vec v0.2 - larger vocab, benchmarking results on Wongnai dataset
- Sentiment classifier based on ULMFit and various product review datasets
- Add ULMFit utility to PyThaiNLP
- Add Thai romanization model thai2rom
- Retrain POS-tagging model
📖 For ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)

📫 Follow us on Facebook: [Pythainlp](https://www.facebook.com/pythainlp/)

## What's new in version 2.0 ?

- New NorvigSpellChecker spell checker class, which can be initialized with custom dictionary.
- Terminate Python 2 support. Remove all Python 2 compatibility code.
- Remove old, obsolete, deprecated, and experimental code.
- Thai2fit (upgrade ULMFiT-related code to fastai 1.0)
- ThaiNER 1.0
- Remove sentiment analysis
- Improved word_tokenize (newmm, mm) and dict_word_tokenize
- Documentation added
- Improved POS-tagging
- More and improved examples
- see [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118)
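
To make the first item above more concrete, here is a minimal, self-contained sketch of the idea behind a Norvig-style spell checker built from a custom dictionary. It is not PyThaiNLP's `NorvigSpellChecker`. The class name `TinyNorvigChecker`, its methods, and the sample word frequencies are all hypothetical and exist only for illustration.

```python
from collections import Counter


class TinyNorvigChecker:
    """Hypothetical, simplified Norvig-style checker (not the PyThaiNLP class)."""

    def __init__(self, word_freqs):
        # word_freqs: iterable of (word, frequency) pairs, i.e. the custom dictionary.
        self.freqs = Counter(dict(word_freqs))
        # Candidate letters are taken from the dictionary itself.
        self.letters = sorted({ch for word in self.freqs for ch in word})

    def _edits1(self, word):
        # All strings that are one delete, transpose, replace, or insert away.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.letters]
        inserts = [a + c + b for a, b in splits for c in self.letters]
        return set(deletes + transposes + replaces + inserts)

    def spell(self, word):
        # Known words are returned unchanged.
        if word in self.freqs:
            return [word]
        # Otherwise rank known one-edit candidates by descending frequency.
        known = [w for w in self._edits1(word) if w in self.freqs]
        return sorted(known, key=lambda w: (-self.freqs[w], w)) or [word]

    def correct(self, word):
        # Best single suggestion; falls back to the input when nothing matches.
        return self.spell(word)[0]


# Made-up frequencies, for illustration only.
checker = TinyNorvigChecker([("แมว", 10), ("แมง", 3), ("หมา", 8)])
print(checker.correct("แมบ"))
```

The sketch only looks one edit away and ranks candidates by raw frequency. A real checker would typically add frequency and length thresholds on the dictionary.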

## Install

For stable version:

```sh
pip install pythainlp
```

Some advanced features, such as word vectors, need extra packages. Install them by passing extras to pip:

```
pip install pythainlp[extra1,extra2,...]
```

where extras can be

- `artagger` (to support artagger part-of-speech tagger)*
- `deepcut` (to support deepcut machine-learnt tokenizer)
- `icu` (for ICU support in transliteration and tokenization)
- `ipa` (for International Phonetic Alphabet support in transliteration)
- `ml` (to support fastai 1.0.22 ULMFiT models)
- `ner` (for named-entity recognizer)
- `thai2fit` (for Thai word vector)
- `thai2rom` (for machine-learnt romanization)
- `full` (install everything)

**Note for Windows**: `marisa-trie` wheels can be obtained from https://www.lfd.uci.edu/~gohlke/pythonlibs/#marisa-trie
Install it with pip, for example: `pip install marisa_trie-0.7.5-cp36-cp36m-win32.whl`
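
The extras syntax described in the Install section can be sketched in Python as follows. The helper `pip_install_command` is hypothetical and not part of PyThaiNLP; it only assembles the command string. It quotes the requirement because some shells, zsh for example, treat unquoted square brackets as a glob pattern.

```python
import shlex


def pip_install_command(extras=()):
    """Build a pip command for PyThaiNLP with the given extras (hypothetical helper)."""
    requirement = "pythainlp"
    if extras:
        # Extras are a comma-separated list inside square brackets.
        requirement += "[" + ",".join(extras) + "]"
    # shlex.quote adds quotes only when the string contains shell-special characters.
    return "pip install " + shlex.quote(requirement)


print(pip_install_command(["deepcut", "ner"]))
```

With no extras the helper returns the plain `pip install pythainlp` command shown for the stable version.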

## Links

- Docs: https://thainlp.org/pythainlp/docs/1.7/
- User guide : [English](https://colab.research.google.com/drive/1MQ10D1mJC5r1vQAHcj4ShoRS14vz8ZF-) , [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
- Docs: https://thainlp.org/pythainlp/docs/2.0/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook : [Pythainlp](https://www.facebook.com/pythainlp/)