
[New Model]: Registration info for a model registered via Out-of-Tree Model Integration is lost when multi-GPU Ray mode is enabled #10238

Closed
llery opened this issue Nov 12, 2024 · 3 comments · Fixed by #10372
Labels
new model Requests to new models

Comments

@llery

llery commented Nov 12, 2024

1. The custom model Qwen2GotForCausalLM is registered via Out-of-Tree Model Integration, i.e. without modifying the vLLM source code (see the sketch below).
Reference: https://docs.vllm.ai/en/latest/models/adding_model.html
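For context, out-of-tree registration per the linked doc looks roughly like this (a minimal sketch; the module path and import are hypothetical, only the architecture name Qwen2GotForCausalLM comes from this issue):

```python
# Minimal OOT registration sketch, run in the driver process before
# the engine is created. The package path is hypothetical.
from vllm import ModelRegistry
from my_models.qwen2_got import Qwen2GotForCausalLM  # hypothetical import

# Map the HF config's architecture string to the implementation class.
ModelRegistry.register_model("Qwen2GotForCausalLM", Qwen2GotForCausalLM)
```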

2. When starting the vLLM api_server on a single machine with 2 GPUs (vLLM version 0.6.3.post1):
(1) With --tensor-parallel-size 2 and the default --distributed-executor-backend mp, the server starts fine and the new model works correctly.
(2) But with --tensor-parallel-size 2 and --distributed-executor-backend ray, the server fails to start. The key error output is below (a reproduction sketch follows the log):
...
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] return ModelRegistry.resolve_model_cls(architectures)
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] File "/home/vipuser/miniconda3/envs/gotv/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 369, in resolve_model_cls
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] return self._raise_for_unsupported(architectures)
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] File "/home/vipuser/miniconda3/envs/gotv/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 317, in _raise_for_unsupported
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] raise ValueError(
(RayWorkerWrapper pid=52112) ERROR 11-12 08:45:40 worker_base.py:464] ValueError: Model architectures ['Qwen2GotForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'ArcticForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', ....]
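The failure can presumably be reproduced with the offline LLM API as well (a sketch based on the report above; the checkpoint path and import are hypothetical, and distributed_executor_backend mirrors the CLI flag):

```python
# Reproduction sketch: same registration as above, then a 2-GPU engine.
from vllm import LLM, ModelRegistry
from my_models.qwen2_got import Qwen2GotForCausalLM  # hypothetical import

# Registered in the driver process only.
ModelRegistry.register_model("Qwen2GotForCausalLM", Qwen2GotForCausalLM)

llm = LLM(
    model="/path/to/Qwen2-GOT",          # hypothetical checkpoint path
    tensor_parallel_size=2,
    distributed_executor_backend="ray",  # "mp" works per the report;
                                         # "ray" hits the ValueError above
)
```

With the mp backend the workers are presumably forked from the driver and inherit the in-process registry; Ray workers are fresh processes that never ran register_model, hence the unsupported-architecture error.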

It looks like Ray switches to new processes after startup, so the list of registered models in those processes no longer contains the custom model added the Out-of-Tree way.
Is this a bug, or did I overlook something when registering the model? Is there any workaround (without modifying the vLLM source)?

@llery llery added the new model Requests to new models label Nov 12, 2024
@DarkLight1337
Member

cc @youkaichao I'm not familiar with Ray, do you know how to re-apply plugins for the Ray processes?

@youkaichao
Member

To make OOT (out-of-tree) models work with Ray distributed inference, you need to follow https://github.com/vllm-project/vllm/tree/main/tests/plugins/vllm_add_dummy_model and write a plugin in your package.

Just found out it is not documented anywhere :(
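The linked example boils down to a pip-installable package that registers the model through the vllm.general_plugins entry point, so every process vLLM starts (including Ray workers) re-runs the registration at import time. A rough sketch modeled on vllm_add_dummy_model (the package name vllm_add_qwen2_got and its contents are hypothetical):

```python
# setup.py for a hypothetical plugin package "vllm_add_qwen2_got"
from setuptools import setup

setup(
    name="vllm_add_qwen2_got",
    version="0.1",
    packages=["vllm_add_qwen2_got"],
    entry_points={
        # vLLM loads this entry-point group in every process it spawns,
        # so Ray workers also run register() before resolving the model.
        "vllm.general_plugins": [
            "register_qwen2_got = vllm_add_qwen2_got:register"
        ]
    },
)
```

```python
# vllm_add_qwen2_got/__init__.py (hypothetical package)
from vllm import ModelRegistry


def register():
    # Import lazily and guard against double registration,
    # following the pattern in the linked test plugin.
    if "Qwen2GotForCausalLM" not in ModelRegistry.get_supported_archs():
        from vllm_add_qwen2_got.qwen2_got import Qwen2GotForCausalLM
        ModelRegistry.register_model("Qwen2GotForCausalLM",
                                     Qwen2GotForCausalLM)
```

After `pip install -e .` the plugin is discovered automatically, which is why a bare ModelRegistry.register_model call in the driver script is not enough for the Ray backend.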

@llery
Author

llery commented Nov 12, 2024

> To make OOT (out-of-tree) models work with Ray distributed inference, you need to follow https://github.com/vllm-project/vllm/tree/main/tests/plugins/vllm_add_dummy_model and write a plugin in your package.
>
> Just found out it is not documented anywhere :(

Thank you very, very much. Registering via a plugin solved the problem :)
