I'm not fully familiar with this codebase, so pardon me if I'm wrong. My first attempt at modifying the code was to expand the hardcoded context window from 512 to 4096, but the additional memory usage was not pleasant.
LLaMA 7B quantized to 4 bits reports `ggml ctx size = 8113.34 MB`.
I went to the code and changed the data type for `memory_k` and `memory_v` from `GGML_TYPE_F32` to `GGML_TYPE_F16`.
These are the changed lines:
And these:
New memory usage is reportedly `ggml ctx size = 6065.34 MB`, and Task Manager agrees. That's 2 GB down.
So far everything is working: no crashes and no degradation in quality. Is there any reason not to do that?
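
For reference, here is a rough sanity check of where the 2 GB goes. It is only a sketch: it assumes the usual LLaMA 7B hyperparameters (32 layers, embedding size 4096), which are not stated above, and it assumes `memory_k` and `memory_v` each hold `n_layer * n_ctx * n_embd` elements. The `kv_cache_mb` helper is made up for illustration and is not part of the codebase.

```python
# Assumed LLaMA 7B hyperparameters (not stated in this issue).
N_LAYER = 32
N_EMBD = 4096
N_CTX = 4096  # the expanded context window described above

# Bytes per element for the two tensor types being compared.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2}


def kv_cache_mb(n_layer, n_ctx, n_embd, dtype):
    """Combined size of memory_k and memory_v in MB (1 MB = 1024 * 1024 bytes).

    Assumes each of the two tensors holds n_layer * n_ctx * n_embd elements.
    """
    elements_per_tensor = n_layer * n_ctx * n_embd
    total_bytes = 2 * elements_per_tensor * BYTES_PER_ELEMENT[dtype]
    return total_bytes / (1024 * 1024)


f32_mb = kv_cache_mb(N_LAYER, N_CTX, N_EMBD, "F32")
f16_mb = kv_cache_mb(N_LAYER, N_CTX, N_EMBD, "F16")
saved_mb = f32_mb - f16_mb

print(f"F32: {f32_mb:.2f} MB, F16: {f16_mb:.2f} MB, saved: {saved_mb:.2f} MB")
```

Under those assumptions the two tensors take 4096 MB as F32 and 2048 MB as F16, so the saving is 2048 MB, which lines up with the drop from 8113.34 MB to 6065.34 MB.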