NVIDIA GPT-2B model: 30x smaller than LLaMA using almost the same data, free for commercial use! #1280
batmanonline started this conversation in Ideas
Replies: 1 comment
-
The benchmark results seem to be pretty good for a 2B model, a bit better than Pythia-2.8B-deduped depending on the benchmark.
-
The model was trained on 1.1T tokens obtained from publicly available data sources.
The dataset comprises 53 languages and code.
Remember that LLaMA 65B was trained on 1.4T tokens, so this model is roughly 30x smaller (2B vs. 65B parameters) while being trained on approximately the same amount of data!
https://huggingface.co/nvidia/GPT-2B-001
License: CC-BY-4.0, so it can be used commercially.
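For anyone who wants to pull the checkpoint down and try it, here is a minimal sketch using the `huggingface_hub` Python package. The repo id comes from the link above; according to the model card, inference then goes through the NVIDIA NeMo toolkit rather than plain `transformers`, so this only covers the download step.

```python
# Minimal sketch: fetch the nvidia/GPT-2B-001 checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
# Running inference afterwards requires the NVIDIA NeMo toolkit per the model card;
# that step is not shown here.
from huggingface_hub import snapshot_download

# Downloads every file in the repo (including the checkpoint) into the local
# Hugging Face cache and returns the path to the downloaded snapshot directory.
local_dir = snapshot_download(repo_id="nvidia/GPT-2B-001")
print(f"Checkpoint downloaded to: {local_dir}")
```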