Integrate vLLM Evaluator #23
Labels: enhancement 🚀 (New feature or request)
Comments
Initial exploration: seems like vLLM can run inside a Ray cluster just fine. Basic working code example and usage: see the sketch below these notes.
Notes on initial exploration: I think this approach is good enough to use. vLLMEvaluator can be a simple wrapper around vllm, but it will need some adapters for sampling and for returning logprobs.
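A minimal sketch of the kind of usage described above, assuming the public vLLM Python API (`LLM`, `SamplingParams`) and an already-running Ray cluster; the model name, prompt, and parameter values are illustrative placeholders, not the original example from this issue.

```python
import ray
from vllm import LLM, SamplingParams

# Attach to the existing Ray cluster; vLLM uses Ray workers for its
# model shards when tensor parallelism is enabled.
ray.init(address="auto")

# Load a HuggingFace model through vLLM (placeholder model name).
llm = LLM(model="facebook/opt-125m")

# Sampling adapter: translate evaluator settings into SamplingParams,
# requesting top-5 logprobs per generated token so they can be returned.
params = SamplingParams(temperature=0.0, max_tokens=32, logprobs=5)

outputs = llm.generate(["The capital of France is"], params)
for request_output in outputs:
    completion = request_output.outputs[0]
    print(completion.text)      # generated continuation
    print(completion.logprobs)  # per-token logprob dicts for the logprob adapter
```

The `logprobs` field on `SamplingParams` is what the returning-logprobs adapter would rely on; vLLM also exposes a `prompt_logprobs` option if the evaluator needs scores for the prompt tokens themselves.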
vLLM is a high-throughput LLM inference engine that runs HuggingFace models and performs various kinds of model sharding across GPUs using a Ray backend. Even in its basic form, vLLM is a large speedup over AccelerateEvaluator, which is quite slow.
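A hedged sketch of the multi-GPU case: with `tensor_parallel_size` greater than 1, vLLM shards the model across GPUs and coordinates the shard workers through Ray. The model name, GPU count, and prompts below are assumptions for illustration only.

```python
from vllm import LLM, SamplingParams

# Shard one HuggingFace model across 4 GPUs; vLLM runs the
# tensor-parallel workers on its Ray backend.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=4)

# High throughput comes from batched generation: pass all prompts at
# once instead of looping one-by-one as AccelerateEvaluator would.
prompts = [f"Question {i}: what is 2 + {i}?" for i in range(256)]
outputs = llm.generate(prompts, SamplingParams(max_tokens=16))
print(len(outputs))  # one RequestOutput per prompt, in input order
```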
Basic requirements: