
Ideas for pythainlp.lm function #1048

Open
wannaphong opened this issue Dec 27, 2024 · 6 comments
Labels
enhancement enhance functionalities

Comments

@wannaphong
Member

wannaphong commented Dec 27, 2024

I think the pythainlp.lm module should collect functions for preprocessing and post-processing Thai text from LLMs, and include a small language model that can run on a home computer for simple NLP jobs.

Preprocessing

Post-processing
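
The thread does not yet pin down what these helpers would look like, so here is a minimal, hypothetical sketch of the kind of post-processing utilities the module could collect (the function names `remove_markdown_fences` and `normalize_spaces` are illustrative, not an existing PyThaiNLP API):

```python
import re


def remove_markdown_fences(text: str) -> str:
    """Strip a ``` code-fence wrapper that LLMs often add around answers.

    Hypothetical helper for pythainlp.lm post-processing; not a real API.
    """
    # Drop a leading fence line such as ```json and a trailing ``` line.
    text = re.sub(r"^```[^\n]*\n", "", text)
    text = re.sub(r"\n```\s*$", "", text)
    return text.strip()


def normalize_spaces(text: str) -> str:
    """Collapse runs of whitespace in LLM output into single spaces."""
    return re.sub(r"\s+", " ", text).strip()
```

A preprocessing counterpart could live alongside these, e.g. prompt-template cleaning, but that depends on decisions not yet made in this issue.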

@wannaphong wannaphong moved this to In progress in PyThaiNLP Dec 27, 2024
@bact
Member

bact commented Dec 27, 2024

If we're going to have a small language model as well, should we call the module just "lm"?
Just to make it more generic.

@wannaphong
Member Author

If we're going to have a small language model as well, should we call the module just "lm"? Just to make it more generic.

Agree 👍

@wannaphong wannaphong changed the title Ideas for pythainlp.llm function Ideas for pythainlp.lm function Dec 28, 2024
@bact bact added the enhancement enhance functionalities label Dec 30, 2024
@matichon-vultureprime

How about leveraging NVIDIA-Curator to do pre-processing and post-processing?

We already have some examples from the NVIDIA team:

@wannaphong
Member Author

Add pythainlp.lm.calculate_ngram_counts #1054
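
For readers unfamiliar with the feature referenced in #1054: the idea of counting n-grams over a token sequence can be sketched as below. This is an illustrative stand-in; the actual signature and behaviour of `pythainlp.lm.calculate_ngram_counts` may differ.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count n-grams over a token sequence.

    Illustrative sketch only; see pythainlp.lm.calculate_ngram_counts
    (#1054) for the real implementation.
    """
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
```

For example, `ngram_counts(["กิน", "ข้าว", "กิน", "ข้าว"], 2)` counts the bigram ("กิน", "ข้าว") twice and ("ข้าว", "กิน") once.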

@bact
Member

bact commented Jan 5, 2025

For the "small language model", what about having that model as a core (or cores) for most of the basic tasks in PyThaiNLP that don't require a larger model? That way we would have fewer dependencies as well.

Related to

@wannaphong
Copy link
Member Author

For the "small language model", what about having that model as a core (or cores) for most of the basic tasks in PyThaiNLP that don't require a larger model? That way we would have fewer dependencies as well.

Related to

* [Porting model to ONNX model #639](https://github.com/PyThaiNLP/pythainlp/issues/639)

* [Porting Thai2fit from fastai v1 to fastai v2 #716](https://github.com/PyThaiNLP/pythainlp/issues/716)

* [Remove all python-crfsuite models from PyThaiNLP #655](https://github.com/PyThaiNLP/pythainlp/issues/655)

* [Consider reduce dependencies #935](https://github.com/PyThaiNLP/pythainlp/issues/935)

Just llama-cpp-python or an ONNX model. I think that is OK.
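
As a toy illustration of the "small core model" idea discussed above (not PyThaiNLP code, and much simpler than an ONNX or llama.cpp model), even a character-bigram model can back simple tasks such as scoring which of two strings looks more like the training text:

```python
import math
from collections import Counter


class CharBigramLM:
    """Tiny character-bigram language model with add-one smoothing.

    A toy stand-in for the "small core model" idea; a real PyThaiNLP
    core model would be an ONNX or llama.cpp artifact, not this class.
    """

    def __init__(self, corpus: str):
        self.bigrams = Counter(zip(corpus, corpus[1:]))
        self.unigrams = Counter(corpus)
        self.vocab = len(self.unigrams) or 1

    def logprob(self, text: str) -> float:
        """Sum of smoothed log P(b | a) over character bigrams in text."""
        score = 0.0
        for a, b in zip(text, text[1:]):
            score += math.log(
                (self.bigrams[(a, b)] + 1)
                / (self.unigrams[a] + self.vocab)
            )
        return score
```

The point is only that a small, dependency-free model can already serve basic scoring tasks; whether such tasks belong to the proposed core model is still open in this thread.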
