[Bug]: assert num_new_tokens > 0 crashes entire worker instead of just failing single API call #7632

Closed
pseudotensor opened this issue Aug 18, 2024 · 1 comment · Fixed by #7746
Labels
bug Something isn't working

Comments


pseudotensor commented Aug 18, 2024

Your current environment

vllm docker 0.5.4

docker pull vllm/vllm-openai:latest
docker stop danube3_mig ; docker rm danube3_mig
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=MIG-a6dbed35-9d05-58da-a0b5-23ae5bf8427e"' \
    --shm-size=10.24gb \
    -p 5004:5004 \
    -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name danube3_mig \
    vllm/vllm-openai:latest \
        --port=5004 \
        --host=0.0.0.0 \
        --model=h2oai/h2o-danube3-4b-chat \
        --seed 1234 \
        --trust-remote-code \
        --tensor-parallel-size=1 \
        --max-model-len=8192 \
        --gpu-memory-utilization=0.99 \
        --max-num-batched-tokens=131072 --max-log-len=100 \
        --use-v2-block-manager \
        --num-speculative-tokens=5 \
        --ngram-prompt-lookup-max=4 \
        --enable-prefix-caching \
        --speculative-model="[ngram]" \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.danube3_migb.txt

Unsure whether this is related to speculative decoding; it seems that simply sending prompt='' causes it.
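
For reference, a minimal reproduction sketch (my assumption of how to hit this, not taken verbatim from the report): it targets the OpenAI-compatible server started by the docker command above, assuming it is reachable at http://localhost:5004 and serving h2oai/h2o-danube3-4b-chat. A single completion request with an empty prompt appears to be enough to trip the assertion on the server side.

# Minimal reproduction sketch (assumes the server above is listening on localhost:5004).
import requests

resp = requests.post(
    "http://localhost:5004/v1/completions",
    json={
        "model": "h2oai/h2o-danube3-4b-chat",
        "prompt": "",        # empty prompt is what appears to trigger the crash
        "max_tokens": 16,
        "temperature": 0.3,
    },
    timeout=60,
)
print(resp.status_code, resp.text)

# After this request the engine background task dies with AssertionError, and the
# worker stops serving further requests (AsyncEngineDeadError) until it is restarted.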

🐛 Describe the bug

INFO:     172.16.0.199:21756 - "GET /health HTTP/1.1" 200 OK
INFO 08-18 00:51:02 logger.py:36] Received request cmpl-14b87b97d9a8481d8963a0a1652b217b-0: prompt: '', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=1.>
INFO:     172.16.0.199:21766 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 08-18 00:51:02 async_llm_engine.py:174] Added request cmpl-14b87b97d9a8481d8963a0a1652b217b-0.
ERROR 08-18 00:51:02 async_llm_engine.py:57] Engine background task failed
ERROR 08-18 00:51:02 async_llm_engine.py:57] Traceback (most recent call last):
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
ERROR 08-18 00:51:02 async_llm_engine.py:57]     return_value = task.result()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
ERROR 08-18 00:51:02 async_llm_engine.py:57]     result = task.result()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
ERROR 08-18 00:51:02 async_llm_engine.py:57]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 239, in step_async
ERROR 08-18 00:51:02 async_llm_engine.py:57]     virtual_engine].schedule()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 950, in schedule
ERROR 08-18 00:51:02 async_llm_engine.py:57]     scheduler_outputs = self._schedule()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 925, in _schedule
ERROR 08-18 00:51:02 async_llm_engine.py:57]     return self._schedule_default()
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 785, in _schedule_default
ERROR 08-18 00:51:02 async_llm_engine.py:57]     prefills = self._schedule_prefills(budget,
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 683, in _schedule_prefills
ERROR 08-18 00:51:02 async_llm_engine.py:57]     num_new_tokens = self._get_num_new_tokens(seq_group,
ERROR 08-18 00:51:02 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 1206, in _get_num_new_tokens
ERROR 08-18 00:51:02 async_llm_engine.py:57]     assert num_new_tokens > 0
ERROR 08-18 00:51:02 async_llm_engine.py:57] AssertionError
Exception in callback _log_task_completion(error_callback=<bound method...72047c646d70>>)(<Task finishe...ertionError()>) at /usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37
handle: <Handle _log_task_completion(error_callback=<bound method...72047c646d70>>)(<Task finishe...ertionError()>) at /usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 239, in step_async
    virtual_engine].schedule()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 950, in schedule
    scheduler_outputs = self._schedule()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 925, in _schedule
    return self._schedule_default()
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 785, in _schedule_default
    prefills = self._schedule_prefills(budget,
INFO 08-18 00:51:02 async_llm_engine.py:181] Aborted request cmpl-14b87b97d9a8481d8963a0a1652b217b-0.
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 683, in _schedule_prefills
    num_new_tokens = self._get_num_new_tokens(seq_group,
  File "/usr/local/lib/python3.10/dist-packages/vllm/core/scheduler.py", line 1206, in _get_num_new_tokens
    assert num_new_tokens > 0
AssertionError


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 59, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 54, in wrapped_receive
    msg = await self.receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 720537665120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 192, in __call__
    await response(scope, wrapped_receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
    yield
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/base.py", line 190, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
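
For context on why the assertion fires: during prefill scheduling, the scheduler counts how many new tokens each sequence in the group still needs to process, and an empty prompt can leave that count at zero. The sketch below is a simplified illustration of that arithmetic, not vLLM's actual scheduler code; the class and function here are hypothetical stand-ins for what the traceback references.

# Illustrative sketch only, not vLLM's implementation: why an empty prompt can
# violate "assert num_new_tokens > 0" during prefill scheduling.

class ToySequence:
    def __init__(self, prompt_token_ids, output_token_ids=(), num_computed_tokens=0):
        self.prompt_token_ids = list(prompt_token_ids)
        self.output_token_ids = list(output_token_ids)
        self.num_computed_tokens = num_computed_tokens

    def get_num_new_tokens(self):
        # Tokens that still need to be processed for this sequence.
        total = len(self.prompt_token_ids) + len(self.output_token_ids)
        return total - self.num_computed_tokens


def toy_get_num_new_tokens(seqs):
    num_new_tokens = sum(seq.get_num_new_tokens() for seq in seqs)
    assert num_new_tokens > 0  # same invariant as scheduler.py:1206 in the traceback
    return num_new_tokens


print(toy_get_num_new_tokens([ToySequence([1, 2, 3])]))  # normal prompt -> 3

# If prompt='' tokenizes to zero tokens, the sum is 0 and the assertion raises.
# In the real engine this escapes the scheduling step and kills the whole worker
# instead of failing only the offending request.
toy_get_num_new_tokens([ToySequence([])])  # AssertionError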

@pseudotensor added the bug label Aug 18, 2024
pseudotensor (Author) commented

To be clear, the bug is at least that the entire vLLM engine is taken down by prompt=''; a single bad request should fail only its own API call rather than kill the whole worker.
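
Until the server-side fix lands (this issue is marked fixed by #7746), one hedged workaround is to validate prompts on the client before they reach the engine. The helper below is a hypothetical sketch, not part of vLLM or its client libraries; host, port, and model are assumed from the docker command above.

# Hypothetical client-side guard (not part of vLLM): reject empty prompts locally
# so one bad request cannot take down the shared engine.
import requests

def safe_completion(base_url, model, prompt, **params):
    if not prompt:
        # Fail only this call instead of letting the server assertion kill the worker.
        raise ValueError("refusing to send an empty prompt to the vLLM server")
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": prompt, **params},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()

# Example:
# safe_completion("http://localhost:5004", "h2oai/h2o-danube3-4b-chat", "Hello", max_tokens=16)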

@njhill mentioned this issue Aug 21, 2024
Anyonering added a commit to iidsample/chatdatagen that referenced this issue Oct 2, 2024
Tested on local machines. An assertion failure on the server side causes the vLLM worker to crash. This may be caused by sending an empty prompt to the server, as described in vllm-project/vllm#7632 and vllm-project/vllm#7746. Needs further inspection later.