docs/source/getting_started/installation.rst (+9 -5)
@@ -26,6 +26,10 @@ You can install vLLM using pip:
 $ # Install vLLM with CUDA 12.1.
 $ pip install vllm
 
+.. note::
+
+Although we recommend using ``conda`` to create and manage Python environments, it is highly recommended to use ``pip`` to install vLLM. This is because ``pip`` can install ``torch`` with separate library packages like ``NCCL``, while ``conda`` installs ``torch`` with statically linked ``NCCL``. This can cause issues when vLLM tries to use ``NCCL``. See `this issue <https://github.com/vllm-project/vllm/issues/8420>`_ for more details.
+
 .. note::
 
 As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions by default.
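
For context, the workflow recommended by the newly added note looks roughly like this (a minimal sketch, not part of the diff; the environment name and Python version are arbitrary placeholders):

.. code-block:: console

    $ # Create and manage the environment with conda, but install vLLM itself with pip,
    $ # so torch is pulled in as wheels with separately packaged NCCL.
    $ conda create -n vllm python=3.10 -y
    $ conda activate vllm
    $ pip install vllm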
@@ -80,11 +84,11 @@ You can also build and install vLLM from source:
 
 .. tip::
 
-Building from source requires quite a lot compilation. If you are building from source for multiple times, it is beneficial to cache the compilation results. For example, you can install `ccache <https://github.com/ccache/ccache>`_ via either `conda install ccache` or `apt install ccache` . As long as `which ccache` command can find the `ccache` binary, it will be used automatically by the build system. After the first build, the subsequent builds will be much faster.
+Building from source requires quite a lot of compilation. If you are building from source multiple times, it is beneficial to cache the compilation results. For example, you can install `ccache <https://github.com/ccache/ccache>`_ via either ``conda install ccache`` or ``apt install ccache``. As long as the ``which ccache`` command can find the ``ccache`` binary, it will be used automatically by the build system. After the first build, subsequent builds will be much faster.
 
 .. tip::
 To avoid your system being overloaded, you can limit the number of compilation jobs
-to be run simultaneously, via the environment variable `MAX_JOBS`. For example:
+to be run simultaneously, via the environment variable ``MAX_JOBS``. For example:
 
 .. code-block:: console
 
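
As a usage illustration of the two tips above (a sketch only; it assumes a source build from a checkout of the vLLM repository, and the job count is an arbitrary example):

.. code-block:: console

    $ # Cache compilation results so repeated source builds are much faster.
    $ conda install ccache            # or: apt install ccache
    $ which ccache                    # the build system uses ccache automatically if this finds it
    $ # Limit parallel compilation jobs via MAX_JOBS to avoid overloading the machine.
    $ MAX_JOBS=6 pip install -e .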
@@ -99,7 +103,7 @@ You can also build and install vLLM from source:
 $ # Use `--ipc=host` to make sure the shared memory is large enough.
 $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
 
-If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website <https://developer.nvidia.com/cuda-toolkit-archive>`_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:
+If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website <https://developer.nvidia.com/cuda-toolkit-archive>`_. After installation, set the environment variable ``CUDA_HOME`` to the installation path of CUDA Toolkit, and make sure that the ``nvcc`` compiler is in your ``PATH``, e.g.: