[Bug]: assert num_new_tokens > 0 crashes entire worker instead of just failing single API call #7632

Closed
pseudotensor opened this issue Aug 18, 2024 · 1 comment · Fixed by #7746
Labels
bug Something isn't working

Comments


pseudotensor commented Aug 18, 2024

Your current environment

vllm docker 0.5.4

docker pull vllm/vllm-openai:latest
docker stop danube3_mig ; docker rm danube3_mig
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=MIG-a6dbed35-9d05-58da-a0b5-23ae5bf8427e"' \
    --shm-size=10.24gb \
    -p 5004:5004 \
    -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name danube3_mig \
    vllm/vllm-openai:latest \
        --port=5004 \
        --host=0.0.0.0 \
        --model=h2oai/h2o-danube3-4b-chat \
        --seed 1234 \
        --trust-remote-code \
        --tensor-parallel-size=1 \
        --max-model-len=8192 \
        --gpu-memory-utilization=0.99 \
        --max-num-batched-tokens=131072 --max-log-len=100 \
        --use-v2-block-manager \
        --num-speculative-tokens=5 \
        --ngram-prompt-lookup-max=4 \
        --enable-prefix-caching \
        --speculative-model="[ngram]" \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.danube3_migb.txt

Unsure whether this is related to speculative decoding; it seems that simply sending prompt='' causes it.
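
For reference, a minimal reproduction sketch (my assumption of how to hit this, not taken verbatim from the report): it targets the OpenAI-compatible server started by the docker command above, assuming it is reachable at http://localhost:5004 and serving h2oai/h2o-danube3-4b-chat. A single completion request with an empty prompt appears to be enough to trip the assertion on the server side.

# Minimal reproduction sketch (assumes the server above is listening on localhost:5004).
import requests

resp = requests.post(
    "http://localhost:5004/v1/completions",
    json={
        "model": "h2oai/h2o-danube3-4b-chat",
        "prompt": "",        # empty prompt is what appears to trigger the crash
        "max_tokens": 16,
        "temperature": 0.3,
    },
    timeout=60,
)
print(resp.status_code, resp.text)

# After this request the engine background task dies with AssertionError, and the
# worker stops serving further requests (AsyncEngineDeadError) until it is restarted.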

🐛 Describe the bug

INFO:     172.16.0.199:21756 - "GET /health HTTP/1.1" 200 OK
INFO 08-18 00:51:02 logger.py:36] Received request cmpl-14b87b97d9a8481d8963a0a1652b217b-0: prompt: '', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=1.>
INFO:     172.16.0.199:21766 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 08-18 00:51:02 async_llm_engine.py:174] Added request cmpl-14b87b97d9a8481d8963a0a1652b217b-0.
ERROR 08-18 00:51:02 async_llm_engine.py:57] Engine background task failed
ERROR 08-18 00:51:02 async_llm_engine.py:57] Traceback (most recent call last):
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
ERROR 08-18 00:51:02 async_llm_engine.py:57]     return_value = task.result()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
ERROR 08-18 00:51:02 async_llm_engine.py:57]     result = task.result()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
ERROR 08-18 00:51:02 async_llm_engine.py:57]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 239, in step_async
ERROR 08-18 00:51:02 async_llm_engine.py:57]     virtual_engine].schedule()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 950, in schedule
ERROR 08-18 00:51:02 async_llm_engine.py:57]     scheduler_outputs = self._schedule()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 925, in _schedule
ERROR 08-18 00:51:02 async_llm_engine.py:57]     return self._schedule_default()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 785, in _schedule_default
ERROR 08-18 00:51:02 async_llm_engine.py:57]     prefills = self._schedule_prefills(budget,
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 683, in _schedule_prefills
ERROR 08-18 00:51:02 async_llm_engine.py:57]     num_new_tokens = self._get_num_new_tokens(seq_group,
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 1206, in _get_num_new_tokens
ERROR 08-18 00:51:02 async_llm_engine.py:57]     assert num_new_tokens > 0
ERROR 08-18 00:51:02 async_llm_engine.py:57] AssertionError
Exception in callback _log_task_completion(error_callback=<bound method...72047c646d70>>)(<Task finishe...ertionError()>) at /usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37
handle: <Handle _log_task_completion(error_callback=<bound method...72047c646d70>>)(<Task finishe...ertionError()>) at /usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 239, in step_async
    virtual_engine].schedule()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 950, in schedule
    scheduler_outputs = self._schedule()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 925, in _schedule
    return self._schedule_default()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 785, in _schedule_default
    prefills = self._schedule_prefills(budget,
INFO 08-18 00:51:02 async_llm_engine.py:181] Aborted request cmpl-14b87b97d9a8481d8963a0a1652b217b-0.
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 683, in _schedule_prefills
    num_new_tokens = self._get_num_new_tokens(seq_group,
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 1206, in _get_num_new_tokens
    assert num_new_tokens > 0
AssertionError


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 59, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 54, in wrapped_receive
    msg = await self.receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 720537665120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 192, in __call__
    await response(scope, wrapped_receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
    yield
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 190, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
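
For context on why the assertion fires: during prefill scheduling, the scheduler counts how many new tokens each sequence in the group still needs to process, and an empty prompt can leave that count at zero. The sketch below is a simplified illustration of that arithmetic, not vLLM's actual scheduler code; the class and function here are hypothetical stand-ins for what the traceback references.

# Illustrative sketch only, not vLLM's implementation: why an empty prompt can
# violate "assert num_new_tokens > 0" during prefill scheduling.

class ToySequence:
    def __init__(self, prompt_token_ids, output_token_ids=(), num_computed_tokens=0):
        self.prompt_token_ids = list(prompt_token_ids)
        self.output_token_ids = list(output_token_ids)
        self.num_computed_tokens = num_computed_tokens

    def get_num_new_tokens(self):
        # Tokens that still need to be processed for this sequence.
        total = len(self.prompt_token_ids) + len(self.output_token_ids)
        return total - self.num_computed_tokens


def toy_get_num_new_tokens(seqs):
    num_new_tokens = sum(seq.get_num_new_tokens() for seq in seqs)
    assert num_new_tokens > 0  # same invariant as scheduler.py:1206 in the traceback
    return num_new_tokens


print(toy_get_num_new_tokens([ToySequence([1, 2, 3])]))  # normal prompt -> 3

# If prompt='' tokenizes to zero tokens, the sum is 0 and the assertion raises.
# In the real engine this escapes the scheduling step and kills the whole worker
# instead of failing only the offending request.
toy_get_num_new_tokens([ToySequence([])])  # AssertionError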

@pseudotensor added the bug label Aug 18, 2024
pseudotensor (Author) commented

To be clear, the bug is at least that the entire vLLM engine is taken down by prompt=''; a single bad request should fail only its own API call rather than kill the whole worker.
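
Until the server-side fix lands (this issue is marked fixed by #7746), one hedged workaround is to validate prompts on the client before they reach the engine. The helper below is a hypothetical sketch, not part of vLLM or its client libraries; host, port, and model are assumed from the docker command above.

# Hypothetical client-side guard (not part of vLLM): reject empty prompts locally
# so one bad request cannot take down the shared engine.
import requests

def safe_completion(base_url, model, prompt, **params):
    if not prompt:
        # Fail only this call instead of letting the server assertion kill the worker.
        raise ValueError("refusing to send an empty prompt to the vLLM server")
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": prompt, **params},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()

# Example:
# safe_completion("http://localhost:5004", "h2oai/h2o-danube3-4b-chat", "Hello", max_tokens=16)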

@njhill mentioned this issue Aug 21, 2024
Anyonering added a commit to iidsample/chatdatagen that referenced this issue Oct 2, 2024
Tested on local machines. An assertion failure on the server side causes the vLLM worker to crash. This may be caused by sending an empty prompt to the server, as described in vllm-project/vllm#7632 and vllm-project/vllm#7746. Needs further inspection later.