[WiP] Fixing KV cache injection for Llama and Mistral #2244

dbogunowicz · 2024-04-16T13:24:23Z

No description provided.

dbogunowicz · 2024-04-22T10:48:42Z

@abhinavnmagic could you review and test this PR?

abhinavnmagic · 2024-04-22T23:39:44Z

Does this PR fix ONNX export for quantized or just pruned or both? I will test accordingly.

dbogunowicz · 2024-04-23T11:33:35Z

@abhinavnmagic It applies to all the Llama models, both quantized and non-quantized.

dbogunowicz and others added 2 commits April 16, 2024 13:23

i think i fixed llama

67895be

Merge branch 'main' into feature/damian/fixing_injection

c40ca1e

Merge branch 'main' into feature/damian/fixing_injection

5c79a5e

Merge branch 'main' into feature/damian/fixing_injection

37b7a96
