Memory-efficient extensions on top of the Llama model!
- Cross-layer KV cache sharing, following https://arxiv.org/pdf/2405.12981 (see the first sketch below)
- Offloaded and partitioned attention, derived from https://arxiv.org/abs/2402.05099 (see the second sketch at the end)
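The rough idea behind cross-layer KV sharing: only some layers project and cache K/V, and the layers in between reuse the previous layer's tensors, so KV cache memory scales with the number of producer layers rather than the total layer count. Below is a minimal, hedged sketch of that mechanism, not this repo's actual implementation; the class name, the sharing factor of 2, and the omission of residual/RoPE/GQA details are assumptions for illustration.

```python
# Minimal sketch of cross-layer KV sharing; names and sharing factor are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttention(nn.Module):
    """Attention block that only owns K/V projections on 'producer' layers.

    Consumer layers reuse the K/V produced by the previous layer, so K/V
    weights and cache entries exist once per group of layers, not per layer.
    """
    def __init__(self, dim: int, n_heads: int, is_kv_producer: bool):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)
        self.is_kv_producer = is_kv_producer
        if is_kv_producer:
            self.k_proj = nn.Linear(dim, dim, bias=False)
            self.v_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x, shared_kv=None):
        B, T, C = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        if self.is_kv_producer:
            k = self.k_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            v = self.v_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            shared_kv = (k, v)   # cached once, reused by the next (consumer) layer
        else:
            k, v = shared_kv     # no new K/V computed or stored on this layer
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, C)
        return self.o_proj(out), shared_kv

# Example: 4 layers, K/V produced only on layers 0 and 2 (sharing factor 2).
layers = nn.ModuleList(
    CrossLayerAttention(dim=256, n_heads=4, is_kv_producer=(i % 2 == 0))
    for i in range(4)
)
x, kv = torch.randn(1, 8, 256), None
for layer in layers:
    x, kv = layer(x, kv)
print(x.shape)  # torch.Size([1, 8, 256])
```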
All models are randomly initialized; no training has been done.
Tested using TinyLlama 1.1B.
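For the offload-and-partition idea, the sketch below shows one way it can work: keep the KV cache as chunks off the GPU, attend to each chunk separately, and merge the partial results with the standard log-sum-exp rescaling so the full cache never needs to be resident at once. This is a hedged illustration in the spirit of the linked paper, not this repo's kernels; the chunk layout, function name, and CPU-to-device copies are assumptions.

```python
# Hedged sketch of offloaded, partitioned attention; helper names are illustrative only.
import torch

def partitioned_attention(q, k_chunks, v_chunks, device="cpu"):
    """Decode-style attention over a KV cache stored as off-device chunks.

    Each chunk is moved to `device`, attended to separately, and the partial
    outputs are merged with log-sum-exp rescaling, which is mathematically
    equivalent to a single softmax over the full concatenated cache.
    """
    q = q.to(device)
    scale = q.shape[-1] ** -0.5
    out, lse = None, None  # running weighted value sum and running log-sum-exp
    for k_cpu, v_cpu in zip(k_chunks, v_chunks):
        k = k_cpu.to(device, non_blocking=True)
        v = v_cpu.to(device, non_blocking=True)
        logits = (q @ k.transpose(-2, -1)) * scale           # (B, H, Tq, Tc)
        chunk_lse = torch.logsumexp(logits, dim=-1, keepdim=True)
        chunk_out = torch.softmax(logits, dim=-1) @ v        # (B, H, Tq, D)
        if out is None:
            out, lse = chunk_out, chunk_lse
        else:
            new_lse = torch.logaddexp(lse, chunk_lse)
            out = out * (lse - new_lse).exp() + chunk_out * (chunk_lse - new_lse).exp()
            lse = new_lse
    return out

# Example: one query token attending to a KV cache split into 4 chunks.
B, H, D, Tq, Tc = 1, 4, 64, 1, 128
q = torch.randn(B, H, Tq, D)
k_chunks = [torch.randn(B, H, Tc, D) for _ in range(4)]
v_chunks = [torch.randn(B, H, Tc, D) for _ in range(4)]
print(partitioned_attention(q, k_chunks, v_chunks).shape)  # torch.Size([1, 4, 1, 64])
```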