feat: Add option to pass QAICInferenceSession to TextGeneration #356

Draft: wants to merge 1 commit into base: main

quic-shagun
Contributor

Every call to cloud_ai_100_exec_kv creates a new QAICInferenceSession and loads the model again. For an application that calls it repeatedly, this repeated load makes overall execution very slow.

This PR gives the user the option to create a session once and pass it to the TextGeneration class, which then reuses it for every call.
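
To illustrate the idea, here is a minimal sketch of the reuse pattern. It does not use the real QEfficient API: `StubSession` and `StubTextGeneration` are hypothetical stand-ins for QAICInferenceSession and TextGeneration, and the `session` keyword, the `generate` method, and the `"model.qpc"` path are assumptions made for this example, not the signatures changed in this PR.

```python
from typing import List, Optional


class StubSession:
    """Hypothetical stand-in for QAICInferenceSession; counts model loads."""

    loads = 0  # class-level counter of simulated model loads

    def __init__(self, qpc_path: str) -> None:
        self.qpc_path = qpc_path
        StubSession.loads += 1  # stands in for the expensive model load


class StubTextGeneration:
    """Hypothetical stand-in for TextGeneration with an optional session."""

    def __init__(self, qpc_path: str, session: Optional[StubSession] = None) -> None:
        # Reuse the caller's session if one is given; otherwise create one.
        self.session = session if session is not None else StubSession(qpc_path)

    def generate(self, prompt: str) -> str:
        # Placeholder for real inference: echo the prompt back.
        return f"[{self.session.qpc_path}] echo: {prompt}"


def run_without_reuse(qpc_path: str, prompts: List[str]) -> List[str]:
    # One new session (and one model load) per prompt.
    return [StubTextGeneration(qpc_path).generate(p) for p in prompts]


def run_with_reuse(qpc_path: str, prompts: List[str]) -> List[str]:
    # A single session, created once and shared across all prompts.
    session = StubSession(qpc_path)
    return [StubTextGeneration(qpc_path, session=session).generate(p) for p in prompts]


if __name__ == "__main__":
    prompts = ["a", "b", "c"]

    StubSession.loads = 0
    run_without_reuse("model.qpc", prompts)
    print("loads without reuse:", StubSession.loads)

    StubSession.loads = 0
    run_with_reuse("model.qpc", prompts)
    print("loads with reuse:", StubSession.loads)
```

Keeping the `session` argument optional means existing callers that do not pass one keep the current behavior, which is the backward-compatible shape the description suggests.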

Signed-off-by: quic-shagun <quic_shagsood@quicinc.com>
quic-amitraj marked this pull request as draft on April 11, 2025, 12:19