more-memory-efficient

More memory-efficient techniques on top of the Llama model!

  1. Cross Layer KV, https://arxiv.org/pdf/2405.12981
  2. Offload and partition Attention, derived from https://arxiv.org/abs/2402.05099
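The core idea of cross-layer KV sharing is that groups of adjacent layers reuse a single key/value cache entry instead of each layer storing its own. This is not the repository's actual implementation; it is a minimal sketch in plain Python, assuming pairs of adjacent layers share one cache slot (the `CrossLayerKVCache` class and `share_every` parameter are illustrative names, not from the repo):

```python
class CrossLayerKVCache:
    """KV cache where groups of `share_every` consecutive layers reuse a
    single key/value slot, cutting KV memory by roughly that factor."""

    def __init__(self, num_layers, share_every=2):
        self.share_every = share_every
        n_slots = -(-num_layers // share_every)  # ceil division
        self.keys = [[] for _ in range(n_slots)]
        self.values = [[] for _ in range(n_slots)]

    def slot(self, layer_idx):
        return layer_idx // self.share_every

    def update(self, layer_idx, k, v):
        # Only the first layer of each group writes its K/V projection;
        # the remaining layers in the group attend over the shared entry.
        s = self.slot(layer_idx)
        if layer_idx % self.share_every == 0:
            self.keys[s].append(k)
            self.values[s].append(v)
        return self.keys[s], self.values[s]


cache = CrossLayerKVCache(num_layers=4, share_every=2)
cache.update(0, "k0", "v0")          # layer 0 writes into slot 0
k1, v1 = cache.update(1, "k1", "v1")  # layer 1 reads slot 0, writes nothing
```

With `share_every=2`, four layers need only two cache slots, so a 4-layer model stores half the KV tensors of a vanilla cache.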

Benchmark

Cross Layer KV

All models are randomly initialized; no training was done.

Memory usage

*(memory usage chart)*

Time taken

*(time taken chart)*

Offload and partition Attention

Tested using TinyLlama 1.1B.

*(benchmark chart)*
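Partitioning attention means streaming the K/V cache in chunks (e.g. from CPU to GPU) and combining partial results with an online softmax, so only one chunk needs to sit in fast memory at a time. A minimal single-query sketch in plain Python, not the repository's implementation (the `chunked_attention` name and `chunk` parameter are illustrative):

```python
import math

def chunked_attention(q, keys, values, chunk=2):
    """softmax(q . K / sqrt(d)) @ V, streaming K/V in chunks with a
    running max and normalizer, so full K/V never needs to be resident."""
    d = len(q)
    m = float("-inf")          # running max logit
    denom = 0.0                # running softmax normalizer
    out = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk):
        # In a real offload setup, this slice is where a K/V partition
        # would be copied from host memory to the accelerator.
        kc = keys[start:start + chunk]
        vc = values[start:start + chunk]
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in kc]
        new_m = max(m, max(logits))
        scale = math.exp(m - new_m)  # rescale previous partial sums
        denom *= scale
        out = [o * scale for o in out]
        for logit, v in zip(logits, vc):
            w = math.exp(logit - new_m)
            denom += w
            out = [o + w * vi for o, vi in zip(out, v)]
        m = new_m
    return [o / denom for o in out]
```

Because the running max and normalizer are carried across chunks, the result matches attention computed over the full K/V at once, regardless of chunk size.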
