Re-arch on tutorials/quickstart/installation
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Yikun committed Mar 9, 2025
1 parent 91f7d81 commit d6d1fea
Showing 11 changed files with 412 additions and 357 deletions.
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -72,6 +72,8 @@
# This value should be updated when cutting a release.
'pip_vllm_ascend_version': "0.7.3rc1",
'pip_vllm_version': "0.7.3",
# CANN image tag
'cann_image_tag': "8.0.0-910b-ubuntu22.04-py3.10",
}

# Add any paths that contain templates here, relative to this directory.
10 changes: 5 additions & 5 deletions docs/source/developer_guide/contributing.md
@@ -53,21 +53,21 @@ locally. The simplest way to run these integration tests locally is through a container
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
export IMAGE=vllm-ascend-dev-image
export CONTAINER_NAME=vllm-ascend-dev
export DEVICE=/dev/davinci1

# The first build will take about 10 mins (10MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also specify a mirror repo by setting VLLM_REPO to speed up the build
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm

docker run --name $CONTAINER_NAME --network host --device $DEVICE \
docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
-ti $IMAGE bash

cd vllm-ascend
pip install -r requirements-dev.txt
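
With the dev requirements in place, you can launch the integration tests from inside the container. The snippet below is an illustrative sketch; the test path and the ModelScope setting are assumptions rather than part of this guide:

```bash
# Run the integration tests from the repo root (test path assumed to be ./tests)
export VLLM_USE_MODELSCOPE=true
pytest -sv tests/
```
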
10 changes: 5 additions & 5 deletions docs/source/developer_guide/contributing.zh.md
@@ -48,21 +48,21 @@ git commit -sm "your commit info"
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
export IMAGE=vllm-ascend-dev-image
export CONTAINER_NAME=vllm-ascend-dev
export DEVICE=/dev/davinci1

# The first build will take about 10 mins (10MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also specify a mirror repo by setting VLLM_REPO to speed up the build
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm

docker run --name $CONTAINER_NAME --network host --device $DEVICE \
docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
-ti $IMAGE bash

cd vllm-ascend
pip install -r requirements-dev.txt
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -35,7 +35,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-like
:maxdepth: 1
quick_start
installation
tutorials
tutorials/index.md
faqs
:::

41 changes: 25 additions & 16 deletions docs/source/installation.md
@@ -44,10 +44,12 @@ Refer to [Ascend Environment Setup Guide](https://ascend.github.io/docs/sources/

The easiest way to prepare your software environment is to use the CANN image directly:

```bash
```{code-block} bash
:substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm \
--name vllm-ascend-env \
--device $DEVICE \
@@ -59,14 +61,16 @@ docker run --rm \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-it quay.io/ascend/cann:8.0.0-910b-ubuntu22.04-py3.10 bash
-it $IMAGE bash
```
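
Once inside the container, a quick way to confirm the NPU devices are visible is the `npu-smi` tool mounted into the container above; this is an illustrative check rather than a required step:

```bash
# Should list the Ascend NPUs passed through via --device
npu-smi info
```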

:::{dropdown} Click here to see "Install CANN manually"
:animate: fade-in-slide-down
You can also install CANN manually:

:::{note}
```{note}
This guide takes aarch64 as an example. If you run on x86, you need to replace `aarch64` with `x86_64` for the package name shown below.
:::
```

```bash
# Create a virtual environment
@@ -94,6 +98,8 @@ chmod +x ./Ascend-cann-nnal_8.0.0_linux-aarch64.run
source /usr/local/Ascend/nnal/atb/set_env.sh
```

:::

::::

::::{tab-item} Before using docker
Expand Down Expand Up @@ -125,6 +131,7 @@ pip install vllm==|pip_vllm_version|
pip install vllm-ascend==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
```

:::{dropdown} Click here to see "Build from source code"
or build from **source code**:

```{code-block} bash
@@ -140,6 +147,7 @@
cd vllm-ascend
pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
```
:::

The current version depends on an unreleased `torch-npu`, so you need to install it manually:

Expand Down Expand Up @@ -167,14 +175,23 @@ pip install ./torch_npu-2.5.1.dev20250307-cp310-cp310-manylinux_2_17_aarch64.man

You can just pull the **prebuilt image** and run it with bash.

:::{dropdown} Click here to see "Build from Dockerfile"
or build IMAGE from **source code**:

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .
```
:::

```{code-block} bash
:substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker pull $IMAGE
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend-env \
--device $DEVICE \
@@ -189,14 +206,6 @@ docker run --rm \
-it $IMAGE bash
```
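
Inside the container, you can quickly confirm which vLLM-related packages the prebuilt image ships; a minimal, illustrative check:

```bash
# List the vLLM and torch packages baked into the image
pip list | grep -i -E "vllm|torch"
```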

or build IMAGE from **source code**:

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .
```

::::

:::::
31 changes: 18 additions & 13 deletions docs/source/quick_start.md
@@ -11,12 +11,13 @@
```{code-block} bash
:substitutions:
# You can change the version to a suitable one based on your requirements, e.g. main
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run \
docker run --rm \
--name vllm-ascend \
--device /dev/davinci0 \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
@@ -32,17 +33,19 @@

## Usage

There are two ways to start vLLM on Ascend NPU:

### Offline Batched Inference with vLLM

With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).
You can use the ModelScope mirror to speed up the download:

```bash
# Use Modelscope mirror to speed up download
export VLLM_USE_MODELSCOPE=true
```

There are two ways to start vLLM on Ascend NPU:

:::::{tab-set}
::::{tab-item} Offline Batched Inference

With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).

Try running the Python script below directly, or use a `python3` shell, to generate text:

```python
@@ -64,15 +67,15 @@
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### OpenAI Completions API with vLLM
::::

::::{tab-item} OpenAI Completions API

vLLM can also be deployed as a server that implements the OpenAI API protocol. Run
the following command to start the vLLM server with the
[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:

```bash
# Use Modelscope mirror to speed up download
export VLLM_USE_MODELSCOPE=true
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
```
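
Once the server is up (it listens on port 8000 by default), you can exercise the Completions endpoint. The request below is a minimal sketch assuming the default host and port; adjust as needed:

```bash
# Query the OpenAI-compatible Completions API served by vLLM
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 32,
        "temperature": 0
    }'
```
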
Expand Down Expand Up @@ -124,3 +127,5 @@ INFO: Application shutdown complete.
```

Finally, you can exit the container with `Ctrl-D`.
::::
:::::