Commit 7710e7f

Create a new "Usage" section in the docs

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

1 parent ef31eab commit 7710e7f

22 files changed: +34 −27 lines

docs/source/index.rst

+15 −10

@@ -85,12 +85,23 @@ Documentation
    serving/deploying_with_nginx
    serving/distributed_serving
    serving/metrics
-   serving/env_vars
-   serving/usage_stats
    serving/integrations
    serving/tensorizer
-   serving/compatibility_matrix
-   serving/faq
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Usage
+
+   usage/lora
+   usage/structured_outputs
+   usage/spec_decode
+   usage/vlm
+   usage/compatibility_matrix
+   usage/performance
+   usage/faq
+   usage/engine_args
+   usage/env_vars
+   usage/usage_stats
 
 .. toctree::
    :maxdepth: 1
@@ -99,12 +110,6 @@ Documentation
    models/supported_models
    models/adding_model
    models/enabling_multimodal_inputs
-   models/engine_args
-   models/lora
-   models/vlm
-   models/structured_outputs
-   models/spec_decode
-   models/performance
 
 .. toctree::
    :maxdepth: 1

docs/source/serving/openai_compatible_server.md

+2 −2

@@ -32,7 +32,7 @@ We currently support the following OpenAI APIs:
 - [Completions API](https://platform.openai.com/docs/api-reference/completions)
   - *Note: `suffix` parameter is not supported.*
 - [Chat Completions API](https://platform.openai.com/docs/api-reference/chat)
-  - [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Using VLMs](../models/vlm.rst).
+  - [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Using VLMs](../usage/vlm.rst).
     - *Note: `image_url.detail` parameter is not supported.*
   - We also support `audio_url` content type for audio files.
     - Refer to [vllm.entrypoints.chat_utils](https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/chat_utils.py) for the exact schema.
@@ -41,7 +41,7 @@ We currently support the following OpenAI APIs:
 - [Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
   - Instead of `inputs`, you can pass in a list of `messages` (same schema as Chat Completions API),
     which will be treated as a single prompt to the model according to its chat template.
-  - This enables multi-modal inputs to be passed to embedding models, see [Using VLMs](../models/vlm.rst).
+  - This enables multi-modal inputs to be passed to embedding models, see [Using VLMs](../usage/vlm.rst).
   - *Note: You should run `vllm serve` with `--task embedding` to ensure that the model is being run in embedding mode.*
 
 ## Score API for Cross Encoder Models
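
Editor's note: the hunk above only retargets the VLM doc link, but for readers following it, here is a minimal sketch of the vision-style Chat Completions usage it refers to. It assumes a vLLM OpenAI-compatible server already running at http://localhost:8000; the model name, image URL, and API key are illustrative placeholders, not part of this commit.

from openai import OpenAI

# Assumption: a local vLLM server started with something like
# `vllm serve <some multimodal model>`; all names below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # placeholder multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)

As the doc notes, `image_url.detail` is not supported, and the same `messages` schema can be sent to the Embeddings API when the server is run with `--task embedding`.
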
File renamed without changes.
File renamed without changes.

docs/source/serving/faq.rst → docs/source/usage/faq.rst

+2

@@ -1,3 +1,5 @@
+.. _faq:
+
 Frequently Asked Questions
 ===========================
 

File renamed without changes.
File renamed without changes.

docs/source/models/spec_decode.rst → docs/source/usage/spec_decode.rst

+2 −2

@@ -182,7 +182,7 @@ speculative decoding, breaking down the guarantees into three key areas:
 3. **vLLM Logprob Stability**
    - vLLM does not currently guarantee stable token log probabilities (logprobs). This can result in different outputs for the
      same request across runs. For more details, see the FAQ section
-     titled *Can the output of a prompt vary across runs in vLLM?* in the `FAQs <../serving/faq>`_.
+     titled *Can the output of a prompt vary across runs in vLLM?* in the :ref:`FAQs <faq>`.
 
 
 **Conclusion**
@@ -197,7 +197,7 @@ can occur due to following factors:
 
 **Mitigation Strategies**
 
-For mitigation strategies, please refer to the FAQ entry *Can the output of a prompt vary across runs in vLLM?* in the `FAQs <../serving/faq>`_.
+For mitigation strategies, please refer to the FAQ entry *Can the output of a prompt vary across runs in vLLM?* in the :ref:`FAQs <faq>`.
 
 Resources for vLLM contributors
 -------------------------------
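
Editor's note: as a side note on the logprob-stability caveat quoted in this hunk, a rough sketch of how one might probe it with vLLM's offline `LLM` API; the model name and sampling settings are assumptions chosen for illustration, not taken from this commit.

from vllm import LLM, SamplingParams

# Sketch under assumptions: a small placeholder model and greedy sampling.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.0, max_tokens=32, logprobs=1)

first = llm.generate(["The capital of France is"], params)[0].outputs[0]
second = llm.generate(["The capital of France is"], params)[0].outputs[0]

# Per the FAQ entry referenced above, texts and logprobs are not guaranteed
# to be bitwise-identical across runs or batch compositions.
print(first.text == second.text)
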
File renamed without changes.
File renamed without changes.

vllm/attention/backends/rocm_flash_attn.py

+1 −1

@@ -429,7 +429,7 @@ def forward(
         Returns:
             shape = [num_tokens, num_heads * head_size]
         """
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if attn_type != AttentionType.DECODER:
             raise NotImplementedError("Encoder self-attention and "

vllm/config.py

+4 −4

@@ -509,7 +509,7 @@ def verify_async_output_proc(self, parallel_config, speculative_config,
             self.use_async_output_proc = False
             return
 
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if device_config.device_type not in ("cuda", "tpu", "xpu", "hpu"):
             logger.warning(
@@ -525,7 +525,7 @@ def verify_async_output_proc(self, parallel_config, speculative_config,
             self.use_async_output_proc = False
             return
 
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if device_config.device_type == "cuda" and self.enforce_eager:
             logger.warning(
@@ -540,7 +540,7 @@ def verify_async_output_proc(self, parallel_config, speculative_config,
         if self.task == "embedding":
             self.use_async_output_proc = False
 
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if speculative_config:
             logger.warning("Async output processing is not supported with"
@@ -1721,7 +1721,7 @@ def verify_with_model_config(self, model_config: ModelConfig):
                            model_config.quantization)
 
     def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if scheduler_config.chunked_prefill_enabled:
             raise ValueError("LoRA is not supported with chunked prefill yet.")

vllm/engine/arg_utils.py

+1 −1

@@ -1110,7 +1110,7 @@ def create_engine_config(self,
             disable_logprobs=self.disable_logprobs_during_spec_decoding,
         )
 
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if self.num_scheduler_steps > 1:
             if speculative_config is not None:

vllm/engine/output_processor/multi_step.py

+1 −1

@@ -65,7 +65,7 @@ def process_prompt_logprob(self, seq_group: SequenceGroup,
     @staticmethod
     @functools.lru_cache
     def _log_prompt_logprob_unsupported_warning_once():
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         logger.warning(
             "Prompt logprob is not supported by multi step workers. "

vllm/executor/cpu_executor.py

+1 −1

@@ -23,7 +23,7 @@ class CPUExecutor(ExecutorBase):
 
     def _init_executor(self) -> None:
         assert self.device_config.device_type == "cpu"
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         assert self.lora_config is None, "cpu backend doesn't support LoRA"
 

vllm/platforms/cpu.py

+1 −1

@@ -46,7 +46,7 @@ def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
         import vllm.envs as envs
         from vllm.utils import GiB_bytes
         model_config = vllm_config.model_config
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         if not model_config.enforce_eager:
             logger.warning(

vllm/spec_decode/spec_decode_worker.py

+1 −1

@@ -104,7 +104,7 @@ def create_spec_worker(*args, **kwargs) -> "SpecDecodeWorker":
     return spec_decode_worker
 
 
-# Reminder: Please update docs/source/serving/compatibility_matrix.rst
+# Reminder: Please update docs/source/usage/compatibility_matrix.rst
 # If the feature combo become valid
 class SpecDecodeWorker(LoraNotSupportedWorkerBase):
     """Worker which implements speculative decoding.

vllm/utils.py

+1 −1

@@ -47,7 +47,7 @@
 
 # Exception strings for non-implemented encoder/decoder scenarios
 
-# Reminder: Please update docs/source/serving/compatibility_matrix.rst
+# Reminder: Please update docs/source/usage/compatibility_matrix.rst
 # If the feature combo become valid
 
 STR_NOT_IMPL_ENC_DEC_SWA = \

vllm/worker/multi_step_model_runner.py

+1 −1

@@ -817,7 +817,7 @@ def _pythonize_sampler_output(
 
     for sgdx, (seq_group,
                sample_result) in enumerate(zip(seq_groups, samples_list)):
-        # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+        # Reminder: Please update docs/source/usage/compatibility_matrix.rst
         # If the feature combo become valid
         # (Check for Guided Decoding)
         if seq_group.sampling_params.logits_processors:

vllm/worker/utils.py

+1 −1

@@ -13,7 +13,7 @@ def assert_enc_dec_mr_supported_scenario(
     a supported scenario.
     '''
 
-    # Reminder: Please update docs/source/serving/compatibility_matrix.rst
+    # Reminder: Please update docs/source/usage/compatibility_matrix.rst
     # If the feature combo become valid
 
     if enc_dec_mr.cache_config.enable_prefix_caching:
