Description
I am using ipex-llm==2.1.0b20240805 with vLLM 0.4.2 to run Qwen2-7B-Instruct on CPU, then calling the OpenAI-compatible API with curl.
The server start command:
python -m ipex_llm.vllm.cpu.entrypoints.openai.api_server \
    --model /datamnt/Qwen2-7B-Instruct --port 8080 \
    --served-model-name 'Qwen/Qwen2-7B-Instruct' \
    --load-format 'auto' --device cpu --dtype bfloat16 \
    --load-in-low-bit sym_int4 \
    --max-num-batched-tokens 32768
The curl command:
time curl http://172.16.30.28:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen2-7B-Instruct",
"messages": [
{"role": "system", "content": "你是一个写作助手"},
{"role": "user", "content": "请帮忙写一篇描述江南春天的小作文"}
],
"top_k": 1,
"max_tokens": 256,
"stream": false}'
After the inference finished, the server raised the following error:
INFO 01-17 09:51:07 metrics.py:334] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%
INFO 01-17 09:51:09 async_llm_engine.py:120] Finished request cmpl-a6703cc7cb0140adaebbfdd9dbf1f1e5.
INFO: 172.16.30.28:47694 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
File "/data/qingfu.zeng/vllm-0.4.2-venv/lib/python3.10/site-packages/ipex_llm/vllm/cpu/entrypoints/openai/api_server.py", line 117, in create_chat_completion
invalidInputError(isinstance(generator, ChatCompletionResponse))
TypeError: invalidInputError() missing 1 required positional argument: 'errMsg'
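The traceback points at the call on line 117 of api_server.py: invalidInputError is invoked with only the isinstance(...) condition, but its signature requires an error message as a second positional argument. As a result the TypeError fires on every non-streaming chat completion, even when generation succeeds (the log above shows the request finishing normally just before the 500). Below is a hedged sketch of what the corrected call might look like; the import path and message text are assumptions, and only the missing errMsg argument is confirmed by the traceback itself.

# Hypothetical fix for ipex_llm/vllm/cpu/entrypoints/openai/api_server.py, line 117.
# invalidInputError(condition, errMsg, ...) raises when condition is falsy;
# the original call omitted errMsg, so Python raised TypeError instead of validating.
from ipex_llm.utils.common import invalidInputError  # assumed import path

invalidInputError(
    isinstance(generator, ChatCompletionResponse),
    "Expected a ChatCompletionResponse for non-streaming requests",
)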