System Info / 系統信息
Python 3.10.12
NVIDIA-SMI 535.129.03
Driver Version: 535.129.03
CUDA Version: 12.2
vllm: 0.4.0.post1
transformers: 4.48.2
system: 4.15.0-213-generic #224-Ubuntu
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
docker / docker
pip install / 通过 pip install 安装
installation from source / 从源码安装
Version info / 版本信息
xinference, version 1.2.1
The command used to start Xinference / 用以启动 xinference 的命令
xinference-supervisor -H localhost --log-level info
xinference-worker -e http://localhost:9997 --log-level info
Reproduction / 复现过程
The inference engine raises an error at runtime. Model being run: deepseek-r1-distill-qwen.
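For reference, a minimal sketch of how the failing request can be reproduced, assuming the model was launched with the transformers engine and that its model UID matches the model name (both assumptions, not stated above); the error is triggered by a streaming chat completion against the OpenAI-compatible endpoint:

```python
# Hypothetical reproduction sketch against the OpenAI-compatible API exposed by
# the supervisor at http://localhost:9997. The model UID "deepseek-r1-distill-qwen"
# is an assumption; adjust it to the UID used when launching the model.
import requests

resp = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "deepseek-r1-distill-qwen",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)
# The worker raises the AttributeError below while running the batched
# transformers forward pass; the stream then terminates with an error.
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```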
Worker error output:
2025-02-08 09:18:47,301 xinference.model.llm.transformers.utils 23334 ERROR Internal error for batch inference: 'tuple' object has no attribute 'get_seq_length'.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 491, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 318, in _batch_inference_one_step_internal
out = model(**inf_kws, use_cache=True, past_key_values=past_key_values)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 816, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 536, in forward
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
AttributeError: 'tuple' object has no attribute 'get_seq_length'
Destroy generator a4590297e5ba11ef97c1801844f41360 due to an error encountered.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 419, in __xoscar_next__
r = await asyncio.create_task(_async_wrapper(gen))
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.__anext__() # noqa: F821
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 548, in _to_async_gen
async for v in gen:
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 741, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: 'tuple' object has no attribute 'get_seq_length'
Destroy generator 9d05f426e5ba11efa26b801844f41360 due to an error encountered.
Supervisor error output:
2025-02-08 09:18:47,416 xinference.api.restful_api 28000 ERROR Chat completion stream got an error: [address=0.0.0.0:45837, pid=23334] 'tuple' object has no attribute 'get_seq_length'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/xinference/api/restful_api.py", line 2050, in stream_results
async for item in iterator:
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 340, in __anext__
return await self._actor_ref.__xoscar_next__(self._uid)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/pool.py", line 667, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 431, in __xoscar_next__
raise e
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 419, in __xoscar_next__
r = await asyncio.create_task(_async_wrapper(gen))
File "/usr/local/lib/python3.10/site-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.__anext__() # noqa: F821
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 548, in _to_async_gen
async for v in gen:
File "/usr/local/lib/python3.10/site-packages/xinference/core/model.py", line 741, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: [address=0.0.0.0:45837, pid=23334] 'tuple' object has no attribute 'get_seq_length'
Expected behavior / 期待表现
An explanation of the root cause of the exception and a fix.
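For context on the likely cause: the worker traceback shows xinference/model/llm/transformers/utils.py passing past_key_values as a legacy tuple, while transformers 4.48.2 (per the version info above) expects a Cache object exposing get_seq_length() in Qwen2Model.forward. A minimal sketch of a possible workaround, assuming the call site shown in the traceback; this is not the actual xinference patch:

```python
# Possible workaround sketch for transformers versions that no longer accept
# the legacy tuple KV-cache format: wrap a tuple-of-tuples cache in a
# DynamicCache before the forward call. Names other than those visible in the
# traceback (inf_kws, past_key_values, model) are assumptions.
from transformers.cache_utils import DynamicCache

def ensure_cache(past_key_values):
    """Convert a legacy tuple KV cache to a DynamicCache; pass Cache objects through."""
    if isinstance(past_key_values, tuple):
        return DynamicCache.from_legacy_cache(past_key_values)
    return past_key_values

# Inside _batch_inference_one_step_internal, before the model call:
#   past_key_values = ensure_cache(past_key_values)
#   out = model(**inf_kws, use_cache=True, past_key_values=past_key_values)
```

Pinning transformers to an older release that still converts legacy tuple caches internally has also been used as a temporary workaround in similar reports, though that has not been verified for this model.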