Production Metrics
==================

vLLM exposes a number of metrics that can be used to monitor the health of the
system. These metrics are exposed via the ``/metrics`` endpoint on the vLLM
OpenAI compatible API server.

You can start the server using Python, or using :doc:`Docker <deploying_with_docker>`:

.. code-block:: console

    $ vllm serve unsloth/Llama-3.2-1B-Instruct

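Alternatively, a rough sketch of the Docker route is shown below; the ``vllm/vllm-openai`` image name, GPU flag, and cache mount here are assumptions that may need adjusting for your environment, so see the Docker deployment guide for the full invocation:

.. code-block:: console

    $ docker run --gpus all \
          -v ~/.cache/huggingface:/root/.cache/huggingface \
          -p 8000:8000 \
          vllm/vllm-openai:latest \
          --model unsloth/Llama-3.2-1B-Instruct
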
Then query the endpoint to get the latest metrics from the server:

.. code-block:: console

    $ curl http://0.0.0.0:8000/metrics

    # HELP vllm:iteration_tokens_total Histogram of number of tokens per engine_step.
    # TYPE vllm:iteration_tokens_total histogram
    vllm:iteration_tokens_total_sum{model_name="unsloth/Llama-3.2-1B-Instruct"} 0.0
    vllm:iteration_tokens_total_bucket{le="1.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="8.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="16.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="32.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="64.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="128.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="256.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="512.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    ...

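The output uses the standard Prometheus text exposition format, so any Prometheus-compatible scraper or client library can consume it. As a minimal sketch, assuming the ``requests`` and ``prometheus_client`` packages are installed and the server is running on the default host and port used above, the metrics can also be read programmatically:

.. code-block:: python

    import requests
    from prometheus_client.parser import text_string_to_metric_families

    # Fetch the raw exposition text from the vLLM server.
    response = requests.get("http://0.0.0.0:8000/metrics")
    response.raise_for_status()

    # Parse the text into metric families and print every sample.
    for family in text_string_to_metric_families(response.text):
        for sample in family.samples:
            print(sample.name, sample.labels, sample.value)

In production you would more typically point a Prometheus server or another scraper at the same endpoint rather than polling it by hand.
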
The following metrics are exposed:

.. literalinclude:: ../../../vllm/engine/metrics.py