Ideas for pythainlp.lm function #1048
Comments
If we're going to have a small language model as well, should we call the module just "lm"?
Agree 👍
How about leveraging NVIDIA-Curator to do pre-processing and post-processing? We already have some examples from the NVIDIA team:
Add pythainlp.lm.calculate_ngram_counts #1054
For the "small language model", what about using that model as the core for most of the basic tasks in PyThaiNLP that don't require a larger model? That way we would have fewer dependencies as well. Related to
Just
I think the pythainlp.lm module should collect functions for preprocessing and post-processing Thai text from LLMs, and include a small language model that can run on home users' computers for simple NLP jobs.
Preprocessing
pythainlp.lm.calculate_ngram_counts: Calculates the counts of n-grams in a list of words for the specified range of n. (Add pythainlp.lm.calculate_ngram_counts #1054)
Post-processing
pythainlp.lm.remove_repeated_ngrams: Removes repeated n-grams (to fix LM output). (Add pythainlp.llm #1043)
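To make the two proposed helpers concrete, here is a minimal standalone sketch of what they could do. The names count_ngrams and drop_repeated_ngrams, their parameters, and the exact repetition rule are assumptions for illustration only; they are not the code from #1054 or #1043, and the actual PyThaiNLP functions may behave differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_ngrams(
    words: List[str], n_min: int = 2, n_max: int = 4
) -> Dict[Tuple[str, ...], int]:
    """Count every n-gram in `words` for n_min <= n <= n_max (hypothetical helper)."""
    if n_min < 1 or n_max < n_min:
        raise ValueError("expected 1 <= n_min <= n_max")
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of size n over the token list.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return dict(counts)


def drop_repeated_ngrams(words: List[str], n: int = 2) -> List[str]:
    """Collapse immediately repeated n-grams (hypothetical helper).

    Assumption: "repeated" means the same n tokens occurring twice in a row,
    which is the typical looping pattern in degenerate LM output.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[str] = []
    for word in words:
        out.append(word)
        # If the last n tokens duplicate the n tokens before them, drop the copy.
        if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
            del out[-n:]
    return out


if __name__ == "__main__":
    tokens = ["เอา", "เอา", "แบบ", "แบบ", "แบบ", "ไหน"]
    print(count_ngrams(tokens, n_min=1, n_max=2))
    print(drop_repeated_ngrams(tokens, n=1))
```

Both helpers take an already tokenized list, so any word tokenizer can be used upstream. Collapsing only adjacent repeats keeps legitimate reuse of a phrase later in the text, at the cost of missing repeats that are separated by other tokens.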