-
Looking at the results from https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/ I am not sure that these models are any better than GPT-J.
-
Can someone link the Python inference code if it is available? I cannot seem to find it.
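There does not appear to be a dedicated inference script linked in this thread, but the checkpoints published at https://huggingface.co/cerebras load with the standard Hugging Face transformers API. Below is a minimal sketch, assuming transformers and torch are installed; the cerebras/Cerebras-GPT-1.3B model id is used as a placeholder and the generation settings are arbitrary examples, not something specified in this discussion.

```python
# Minimal sketch: load a Cerebras-GPT checkpoint from the Hugging Face Hub
# and generate text with the standard transformers API.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint; the Hub hosts sizes from 111M up to 13B.
model_id = "cerebras/Cerebras-GPT-1.3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Generative AI is"
inputs = tokenizer(prompt, return_tensors="pt")

# Example sampling settings; adjust as needed.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The larger checkpoints (6.7B, 13B) will need considerably more memory; loading in half precision or on a GPU is a reasonable adjustment if that is an issue.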
-
The new 3B model looks quite capable: https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/
-
The announcement is here:
https://www.cerebras.net/press-release/cerebras-systems-releases-seven-new-gpt-models-trained-on-cs-2-wafer-scale-systems
The models are available here:
https://huggingface.co/cerebras
Excerpts:
"SUNNYVALE, CALIFORNIA – March 28, 2023 – Cerebras Systems, the pioneer in artificial intelligence (AI) compute for generative AI, today announced it has trained and is releasing a series of seven GPT-based large language models (LLMs) for open use by the research community. This is the first time a company has used non-GPU based AI systems to train LLMs up to 13 billion parameters and is sharing the models, weights, and training recipe via the industry standard Apache 2.0 license. All seven models were trained on the 16 CS-2 systems in the Cerebras Andromeda AI supercomputer."
"Cerebras’ release today directly addresses these issues. In a first among AI hardware companies, Cerebras researchers trained, on the Andromeda AI supercomputer, a series of seven GPT models with 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters."