Re-arch on tutorials/quickstart/installation
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Yikun committed Mar 9, 2025
1 parent 91f7d81 commit d6d1fea
Showing 11 changed files with 412 additions and 357 deletions.
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -72,6 +72,8 @@
# This value should be updated when cutting a release.
'pip_vllm_ascend_version': "0.7.3rc1",
'pip_vllm_version': "0.7.3",
# CANN image tag
'cann_image_tag': "8.0.0-910b-ubuntu22.04-py3.10",
}

# Add any paths that contain templates here, relative to this directory.
10 changes: 5 additions & 5 deletions docs/source/developer_guide/contributing.md
@@ -53,21 +53,21 @@ locally. The simplest way to run these integration tests locally is through a container
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
export IMAGE=vllm-ascend-dev-image
export CONTAINER_NAME=vllm-ascend-dev
export DEVICE=/dev/davinci1

# The first build will take about 10 mins (10MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also specify a mirror repo by setting VLLM_REPO to speed up the build
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm

docker run --name $CONTAINER_NAME --network host --device $DEVICE \
docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
-ti $IMAGE bash

cd vllm-ascend
pip install -r requirements-dev.txt
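
With the dev requirements in place, you can launch the integration tests from inside the container. The snippet below is an illustrative sketch; the test path and the ModelScope setting are assumptions rather than part of this guide:

```bash
# Run the integration tests from the repo root (test path assumed to be ./tests)
export VLLM_USE_MODELSCOPE=true
pytest -sv tests/
```
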
10 changes: 5 additions & 5 deletions docs/source/developer_guide/contributing.zh.md
@@ -48,21 +48,21 @@ git commit -sm "your commit info"
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
export IMAGE=vllm-ascend-dev-image
export CONTAINER_NAME=vllm-ascend-dev
export DEVICE=/dev/davinci1

# The first build will take about 10 mins (10MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also specify a mirror repo by setting VLLM_REPO to speed up the build
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm

docker run --name $CONTAINER_NAME --network host --device $DEVICE \
docker run --rm --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
-ti $IMAGE bash

cd vllm-ascend
pip install -r requirements-dev.txt
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -35,7 +35,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-like
:maxdepth: 1
quick_start
installation
tutorials
tutorials/index.md
faqs
:::

41 changes: 25 additions & 16 deletions docs/source/installation.md
@@ -44,10 +44,12 @@ Refer to [Ascend Environment Setup Guide](https://ascend.github.io/docs/sources/

The easiest way to prepare your software environment is to use the CANN image directly:

```bash
```{code-block} bash
:substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7

# Update the vllm-ascend image
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm \
--name vllm-ascend-env \
--device $DEVICE \
@@ -59,14 +61,16 @@ docker run --rm \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-it quay.io/ascend/cann:8.0.0-910b-ubuntu22.04-py3.10 bash
-it $IMAGE bash
```
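
Once inside the container, a quick way to confirm the NPU devices are visible is the `npu-smi` tool mounted into the container above; this is an illustrative check rather than a required step:

```bash
# Should list the Ascend NPUs passed through via --device
npu-smi info
```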

:::{dropdown} Click here to see "Install CANN manually"
:animate: fade-in-slide-down
You can also install CANN manually:

:::{note}
```{note}
This guide takes aarch64 as an example. If you run on x86, you need to replace `aarch64` with `x86_64` for the package name shown below.
:::
```

```bash
# Create a virtual environment
@@ -94,6 +98,8 @@ chmod +x ./Ascend-cann-nnal_8.0.0_linux-aarch64.run
source /usr/local/Ascend/nnal/atb/set_env.sh
```

:::

::::

::::{tab-item} Before using docker
Expand Down Expand Up @@ -125,6 +131,7 @@ pip install vllm==|pip_vllm_version|
pip install vllm-ascend==|pip_vllm_ascend_version| --extra-index https://download.pytorch.org/whl/cpu/
```

:::{dropdown} Click here to see "Build from source code"
or build from **source code**:

```{code-block} bash
@@ -140,6 +147,7 @@
cd vllm-ascend
pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
```
:::

The current version depends on an unreleased `torch-npu`, so you need to install it manually:

Expand Down Expand Up @@ -167,14 +175,23 @@ pip install ./torch_npu-2.5.1.dev20250307-cp310-cp310-manylinux_2_17_aarch64.man

You can just pull the **prebuilt image** and run it with bash.

:::{dropdown} Click here to see "Build from Dockerfile"
or build IMAGE from **source code**:

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .
```
:::

```{code-block} bash
:substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker pull $IMAGE
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend-env \
--device $DEVICE \
@@ -189,14 +206,6 @@ docker run --rm \
-it $IMAGE bash
```
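
Inside the container, you can quickly confirm which vLLM-related packages the prebuilt image ships; a minimal, illustrative check:

```bash
# List the vLLM and torch packages baked into the image
pip list | grep -i -E "vllm|torch"
```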

or build IMAGE from **source code**:

```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .
```

::::

:::::
31 changes: 18 additions & 13 deletions docs/source/quick_start.md
@@ -11,12 +11,13 @@
```{code-block} bash
:substitutions:
# You can change the version to a suitable one based on your requirements, e.g. main
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci0
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run \
docker run --rm \
--name vllm-ascend \
--device /dev/davinci0 \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
@@ -32,17 +33,19 @@

## Usage

There are two ways to start vLLM on Ascend NPU:

### Offline Batched Inference with vLLM

With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).
You can use the ModelScope mirror to speed up the download:

```bash
# Use Modelscope mirror to speed up download
export VLLM_USE_MODELSCOPE=true
```

There are two ways to start vLLM on Ascend NPU:

:::::{tab-set}
::::{tab-item} Offline Batched Inference

With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).

Try running the Python script below directly, or use a `python3` shell, to generate text:

```python
@@ -64,15 +67,15 @@
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### OpenAI Completions API with vLLM
::::

::::{tab-item} OpenAI Completions API

vLLM can also be deployed as a server that implements the OpenAI API protocol. Run
the following command to start the vLLM server with the
[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:

```bash
# Use Modelscope mirror to speed up download
export VLLM_USE_MODELSCOPE=true
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
```
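
Once the server is up (it listens on port 8000 by default), you can exercise the Completions endpoint. The request below is a minimal sketch assuming the default host and port; adjust as needed:

```bash
# Query the OpenAI-compatible Completions API served by vLLM
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 32,
        "temperature": 0
    }'
```
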
Expand Down Expand Up @@ -124,3 +127,5 @@ INFO: Application shutdown complete.
```

Finally, you can exit the container with `Ctrl-D`.
::::
:::::