I'm assuming you mean how much memory it uses while you're actually running the model?
That depends on factors like the quantization type and the context size you set. As a rule of thumb, it will need memory roughly equal to the size of the .gguf file, plus a small fixed overhead that's always needed, plus a variable amount that grows with the context size you set (see the sketch below).
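As a very rough sketch of that additive estimate (the 500 MB fixed overhead here is just a placeholder assumption, not a measured value; the real overhead depends on the backend and settings):

```python
import os

def estimate_ram_bytes(gguf_path: str,
                       kv_cache_bytes: int,
                       fixed_overhead_bytes: int = 500 * 1024**2) -> int:
    """Rough RAM estimate: model weights (the .gguf file is loaded more or
    less as-is) + a fixed overhead for buffers and scratch memory (the
    500 MB default is an assumption) + the context-dependent KV cache."""
    weights = os.path.getsize(gguf_path)
    return weights + fixed_overhead_bytes + kv_cache_bytes
```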
This also depends on the model architecture. LLaMA v1 models, for example, use considerably more memory for the context than LLaMA v2 models. Other models like StarCoder, Baichuan, etc. may also vary.
You can expect a Q4_K-quantized 7B LLaMA model to require around 4 GB of RAM just to load, and then perhaps another 1-2 GB depending on the context size. This is a very inexact ballpark figure, just to give you an idea of the general range.
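To see where that context cost comes from: the KV cache stores a key and a value vector per layer per token. A back-of-the-envelope calculation for a 7B LLaMA (32 layers, 4096 embedding size, f16 cache at 2 bytes per element; exact numbers vary by model and cache type, so treat this as an illustration):

```python
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 32,
                   n_embd: int = 4096,
                   bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) x layers x embedding dim x element
    size, per token of context. Defaults assume a 7B LLaMA with an f16 cache."""
    return 2 * n_layers * n_embd * bytes_per_elem * n_ctx

print(kv_cache_bytes(2048) / 1024**3)  # ~1.0 GiB at 2048 context
print(kv_cache_bytes(4096) / 1024**3)  # ~2.0 GiB at 4096 context
```

That works out to about 0.5 MiB per token of context, which is where the 1-2 GB figure above comes from.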
I am thinking about building an iOS app that will use a LLaMA model. What will the size of the model be when it runs on the phone?