[Feature] Speculative Inference #640

Merged
merged 4 commits into from
Sep 11, 2023

@wheresmyhair (Collaborator) commented Sep 11, 2023

Speculative inference is now ready for users to try via:
python ./examples/speculative_inference.py --model gpt2-xl --draft_model gpt2 --temperature 0.5 --gpu 0 --gamma 5 --max_new_tokens 512

Model names can be Hugging Face model names or locally cached HF decoder models.
When temperature <= 1e-6, argmax sampling is used.
gpu refers to the GPU id. Speculative inference currently supports only single-GPU inference.
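
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of one speculative decoding round. It is not the code in this PR: it follows the commonly described accept/reject rule for speculative sampling, and it assumes that `--gamma` is the number of tokens the draft model proposes per round. The toy logit functions and every function name below are hypothetical; the only detail taken from the notes above is the `temperature <= 1e-6` switch to argmax.

```python
import math
import random

ARGMAX_THRESHOLD = 1e-6  # from the PR notes: at or below this, use argmax


def to_probs(logits, temperature):
    """Turn logits into a distribution; one-hot on the argmax when temperature is ~0."""
    if temperature <= ARGMAX_THRESHOLD:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one token id from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(prefix, draft_logits_fn, target_logits_fn, gamma, temperature, rng):
    """Run one round and return the newly generated tokens (between 1 and gamma + 1)."""
    # 1. The draft model proposes gamma tokens autoregressively.
    ctx = list(prefix)
    proposals = []
    for _ in range(gamma):
        q = to_probs(draft_logits_fn(ctx), temperature)
        tok = sample(q, rng)
        proposals.append((tok, q))
        ctx.append(tok)

    # 2. The target model verifies each proposal in order. A real implementation
    #    would score all positions in one batched forward pass; this loop is for clarity.
    ctx = list(prefix)
    new_tokens = []
    for tok, q in proposals:
        p = to_probs(target_logits_fn(ctx), temperature)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            new_tokens.append(tok)
            ctx.append(tok)
            continue
        # Rejected: resample from the normalized residual max(0, p - q), then stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)
        fallback = [r / total for r in residual] if total > 0 else p
        new_tokens.append(sample(fallback, rng))
        return new_tokens

    # 3. Every proposal was accepted: take one extra token from the target model.
    p = to_probs(target_logits_fn(ctx), temperature)
    new_tokens.append(sample(p, rng))
    return new_tokens


# Hypothetical toy "models" over a 4-token vocabulary, for illustration only.
def toy_target_logits(ctx):
    return [5.0 if i == (ctx[-1] + 1) % 4 else 0.0 for i in range(4)]


def toy_draft_logits(ctx):
    return [5.0, 0.0, 0.0, 0.0]  # always prefers token 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step([0], toy_draft_logits, toy_target_logits, 5, 0.5, rng))
```

With a temperature at or below the threshold, both distributions are one-hot, so a proposal is accepted exactly when it matches the target model's argmax and the result equals plain greedy decoding with the target model. The speedup in practice comes from the target model checking several draft tokens per forward pass.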

@wheresmyhair changed the title from "[Feature] Speculative" to "[Feature] Speculative Inference" on Sep 11, 2023
@research4pan (Contributor) left a comment

LGTM, speculative decoding is now available to try out!

@research4pan merged commit 4d124d6 into OptimalScale:main on Sep 11, 2023