TinyLlama is a relatively small large language model with impressive capabilities for its size. This project aims to provide a simpler implementation of TinyLlama. The only required dependency is PyTorch.
- Install PyTorch.
- Download and extract this repository.
- Run `main.py` to chat with the llama.
- Press CTRL + C to interrupt the response.
- Press CTRL + C again to exit the program.
- CUDA will be used if available, but requires approximately 3 GB of VRAM. If you do not have that much VRAM, you can set the computation device manually in `main.py`.
- Only inference is supported. Training is not supported.
- Chat history is currently not supported.
- This project includes a pure Python implementation of a subset of the SentencePiece tokenizer. It is slower than the C++ implementation, but it is sufficient for this project.
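The device selection mentioned above typically comes down to one line. A minimal sketch of how it might look (the exact variable name used in `main.py` is an assumption):

```python
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
# If your GPU has less than the ~3 GB of VRAM the model needs,
# hard-code torch.device("cpu") instead.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(device.type)
```

With `device` set this way, the model and input tensors can be moved onto it with `.to(device)`.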
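To give a flavor of what piece-based tokenization involves, here is an illustrative greedy longest-match tokenizer in pure Python. This is only a sketch with a made-up vocabulary, not the project's actual implementation; real SentencePiece models use BPE or a unigram language model rather than pure greedy matching:

```python
def tokenize(text, vocab):
    """Split text into pieces by greedily taking the longest
    vocabulary piece that prefixes the remaining input."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, shrinking until a match.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as a single-character piece.
            pieces.append(text[i])
            i += 1
    return pieces

# Hypothetical vocabulary; "▁" marks a word boundary, as in SentencePiece.
vocab = {"▁", "▁hel", "lo", "▁hello", "▁wor", "ld"}
print(tokenize("▁hello▁world", vocab))  # ['▁hello', '▁wor', 'ld']
```

Because it runs piece matching in an interpreted loop rather than compiled C++, an approach like this is noticeably slower, which matches the trade-off described above.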