[WiP] Fixing kv cache injection for LlaMa and Mistral #2244

Open
wants to merge 4 commits into
base: main

Conversation

dbogunowicz
Contributor

No description provided.

@dbogunowicz
Contributor Author

@abhinavnmagic could you review and test this?

@abhinavnmagic
Contributor

Does this PR fix ONNX export for quantized models, pruned models, or both? I will test accordingly.

@dbogunowicz
Contributor Author

@abhinavnmagic It applies to all the Llama models, both quantized and non-quantized.


2 participants