System Info / 系統信息
Python 3.10.12
NVIDIA-SMI 535.129.03
Driver Version: 535.129.03
CUDA Version: 12.2
vllm: 0.4.0.post1
transformers: 4.48.2
system: 4.15.0-213-generic #224-Ubuntu
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
docker / docker
pip install / 通过 pip install 安装
installation from source / 从源码安装
Version info / 版本信息
xinference, version 1.2.1
The command used to start Xinference / 用以启动 xinference 的命令
xinference-supervisor -H localhost --log-level info
xinference-worker -e http://localhost:9997 --log-level info
Reproduction / 复现过程
The inference engine raises an error at runtime. Model being run: deepseek-r1-distill-qwen.
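For reference, a minimal sketch of how the failing request can be reproduced, assuming the model was launched with the transformers engine and that its model UID matches the model name (both assumptions, not stated above); the error is triggered by a streaming chat completion against the OpenAI-compatible endpoint:

```python
# Hypothetical reproduction sketch against the OpenAI-compatible API exposed by
# the supervisor at http://localhost:9997. The model UID "deepseek-r1-distill-qwen"
# is an assumption; adjust it to the UID used when launching the model.
import requests

resp = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "deepseek-r1-distill-qwen",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)
# The worker raises the AttributeError below while running the batched
# transformers forward pass; the stream then terminates with an error.
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```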
Worker error output:
2025-02-08 09:18:47,301 xinference.model.llm.transformers.utils 23334 ERROR Internal error for batch inference: 'tuple' object has no attribute 'get_seq_length'.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 491, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 318, in _batch_inference_one_step_internal
out = model(**inf_kws, use_cache=True, past_key_values=past_key_values)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 816, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 536, in forward
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
AttributeError: 'tuple' object has no attribute 'get_seq_length'
Destroy generator a4590297e5ba11ef97c1801844f41360 due to an error encountered.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 419, in __xoscar_next__
r = await asyncio.create_task(_async_wrapper(gen))
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.__anext__() # noqa: F821
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 548, in _to_async_gen
async for v in gen:
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 741, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: 'tuple' object has no attribute 'get_seq_length'
Destroy generator 9d05f426e5ba11efa26b801844f41360 due to an error encountered.
Supervisor error output:
2025-02-08 09:18:47,416 xinference.api.restful_api 28000 ERROR Chat completion stream got an error: [address=0.0.0.0:45837, pid=23334] 'tuple' object has no attribute 'get_seq_length'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xinference/api/restful_api.py", line 2050, in stream_results
async for item in iterator:
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 340, in __anext__
return await self._actor_ref.__xoscar_next__(self._uid)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/pool.py", line 667, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 431, in __xoscar_next__
raise e
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 419, in __xoscar_next__
r = await asyncio.create_task(_async_wrapper(gen))
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.__anext__() # noqa: F821
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 548, in _to_async_gen
async for v in gen:
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 741, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: [address=0.0.0.0:45837, pid=23334] 'tuple' object has no attribute 'get_seq_length'
Expected behavior / 期待表现
An explanation of the root cause of the exception and a fix.
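For context on the likely cause: the worker traceback shows xinference/model/llm/transformers/utils.py passing past_key_values as a legacy tuple, while transformers 4.48.2 (per the version info above) expects a Cache object exposing get_seq_length() in Qwen2Model.forward. A minimal sketch of a possible workaround, assuming the call site shown in the traceback; this is not the actual xinference patch:

```python
# Possible workaround sketch for transformers versions that no longer accept
# the legacy tuple KV-cache format: wrap a tuple-of-tuples cache in a
# DynamicCache before the forward call. Names other than those visible in the
# traceback (inf_kws, past_key_values, model) are assumptions.
from transformers.cache_utils import DynamicCache

def ensure_cache(past_key_values):
    """Convert a legacy tuple KV cache to a DynamicCache; pass Cache objects through."""
    if isinstance(past_key_values, tuple):
        return DynamicCache.from_legacy_cache(past_key_values)
    return past_key_values

# Inside _batch_inference_one_step_internal, before the model call:
#   past_key_values = ensure_cache(past_key_values)
#   out = model(**inf_kws, use_cache=True, past_key_values=past_key_values)
```

Pinning transformers to an older release that still converts legacy tuple caches internally has also been used as a temporary workaround in similar reports, though that has not been verified for this model.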