Description
I am using ipex-llm==2.1.0b20240805 with vLLM 0.4.2 to run Qwen2-7B-Instruct on CPU, then calling the OpenAI-compatible API with curl.
The server start command:
python -m ipex_llm.vllm.cpu.entrypoints.openai.api_server \
    --model /datamnt/Qwen2-7B-Instruct --port 8080 \
    --served-model-name 'Qwen/Qwen2-7B-Instruct' \
    --load-format 'auto' --device cpu --dtype bfloat16 \
    --load-in-low-bit sym_int4 \
    --max-num-batched-tokens 32768
The curl command:
time curl http://172.16.30.28:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen2-7B-Instruct",
"messages": [
{"role": "system", "content": "你是一个写作助手"},
{"role": "user", "content": "请帮忙写一篇描述江南春天的小作文"}
],
"top_k": 1,
"max_tokens": 256,
"stream": false}'
After the inference finished, the server raised the following error:
INFO 01-17 09:51:07 metrics.py:334] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%
INFO 01-17 09:51:09 async_llm_engine.py:120] Finished request cmpl-a6703cc7cb0140adaebbfdd9dbf1f1e5.
INFO: 172.16.30.28:47694 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/ipex_llm/vllm/cpu/entrypoints/openai/api_server.py", line 117, in create_chat_completion
invalidInputError(isinstance(generator, ChatCompletionResponse))
TypeError: invalidInputError() missing 1 required positional argument: 'errMsg'
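The traceback points at the call on line 117 of api_server.py: invalidInputError is invoked with only the isinstance(...) condition, but its signature requires an error message as a second positional argument. As a result the TypeError fires on every non-streaming chat completion, even when generation succeeds (the log above shows the request finishing normally just before the 500). Below is a hedged sketch of what the corrected call might look like; the import path and message text are assumptions, and only the missing errMsg argument is confirmed by the traceback itself.

# Hypothetical fix for ipex_llm/vllm/cpu/entrypoints/openai/api_server.py, line 117.
# invalidInputError(condition, errMsg, ...) raises when condition is falsy;
# the original call omitted errMsg, so Python raised TypeError instead of validating.
from ipex_llm.utils.common import invalidInputError  # assumed import path

invalidInputError(
    isinstance(generator, ChatCompletionResponse),
    "Expected a ChatCompletionResponse for non-streaming requests",
)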