NVIDIA GPT-2B model: 30x smaller than LLaMA using almost the same data, free for commercial use! #1280
batmanonline started this conversation in Ideas
Replies: 1 comment
-
The benchmark results seem to be pretty good for a 2B model, a bit better than Pythia-2.8B-deduped depending on the benchmark.
-
The model was trained on 1.1T tokens obtained from publicly available data sources.
The dataset comprises 53 languages and code.
Remember that LLaMA 65B was trained on 1.4T tokens, so this model is roughly 30x smaller (2B vs. 65B parameters) while being trained on approximately the same amount of data!
https://huggingface.co/nvidia/GPT-2B-001
License: CC-BY-4.0, so it can be used commercially.
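For anyone who wants to pull the checkpoint down and try it, here is a minimal sketch using the `huggingface_hub` Python package. The repo id comes from the link above; according to the model card, inference then goes through the NVIDIA NeMo toolkit rather than plain `transformers`, so this only covers the download step.

```python
# Minimal sketch: fetch the nvidia/GPT-2B-001 checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
# Running inference afterwards requires the NVIDIA NeMo toolkit per the model card;
# that step is not shown here.
from huggingface_hub import snapshot_download

# Downloads every file in the repo (including the checkpoint) into the local
# Hugging Face cache and returns the path to the downloaded snapshot directory.
local_dir = snapshot_download(repo_id="nvidia/GPT-2B-001")
print(f"Checkpoint downloaded to: {local_dir}")
```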