lmi cpu container with vLLM #2009

Open · wants to merge 3 commits into base: master
Conversation

lanking520 (Contributor) commented Jun 1, 2024

Description

Support a CPU container build for vLLM-based LLM inference. Tested with Llama-3-8B; it worked, but was extremely slow.

# serving.properties
engine=Python
option.rolling_batch=vllm
option.model_id=NousResearch/Hermes-2-Pro-Llama-3-8B
option.tensor_parallel_degree=1
@lanking520 lanking520 requested review from zachgk, frankfliu and a team as code owners June 1, 2024 18:24
@lanking520 lanking520 changed the title [WIP] lmi cpu container with vLLM lmi cpu container with vLLM Jun 3, 2024
VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
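For context, the quoted line above builds a CPU-only vLLM wheel by setting `VLLM_TARGET_DEVICE=cpu` at build time. A minimal sketch of how such a build stage might look in the Dockerfile (the base image, package names, and requirements file here are illustrative assumptions, not the PR's actual contents):

```dockerfile
# Sketch of a hypothetical CPU wheel build stage
FROM ubuntu:22.04 AS vllm-cpu-build
RUN apt-get update && apt-get install -y python3 python3-pip git g++
RUN git clone https://github.com/vllm-project/vllm.git /vllm
WORKDIR /vllm
# Assumption: install vLLM's CPU-specific Python dependencies first
RUN pip3 install -r requirements-cpu.txt
# Target the CPU backend instead of CUDA when building the wheel
RUN VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
```

The resulting wheel in `dist/` can then be copied into the final serving image so the runtime stage does not need the build toolchain.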


FROM base AS lmi-cpu

I thought there could only be one FROM in each Dockerfile. I may be wrong, but I just want to check.
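For reference: multiple FROM statements are valid in a multi-stage Dockerfile (supported since Docker 17.05). Each FROM begins a new build stage, later stages can copy artifacts from earlier ones, and `docker build --target` selects which stage to produce. A minimal sketch (stage and image names are illustrative, not the PR's actual Dockerfile):

```dockerfile
# Each FROM starts a new stage; AS names the stage
FROM ubuntu:22.04 AS base
RUN apt-get update && apt-get install -y python3

# A later stage can extend an earlier named stage
FROM base AS lmi-cpu
# CPU-specific layers would go here

# Artifacts can also be copied across stages:
#   COPY --from=base /some/path /some/path
```

Building only the CPU variant would then be `docker build --target lmi-cpu -t lmi:cpu .`, which is the usual pattern for shipping several container flavors from one Dockerfile.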
