Speculative Decoding - Draft Target model approach - Having issue with Triton inference Server #2709
Comments
Hi @sivabreddy, thanks for reporting this issue. Would you please use the latest main branch commit or the 0.16 release to verify the whole process?
Hi @nv-guomingz, thank you for your response. The issue is still the same with Triton Inference Server. I do see a response from the engine when I run the command below:
mpirun -n 1 --allow-run-as-root python3 /data/TensorRT-LLM/examples/run.py \
    --tokenizer_dir /data/llama3-1-8b \
    --draft_engine_dir /data/llama3-1-8b/draft_engine \
    --engine_dir /data/llama3-3-70b/target_engine \
    --draft_target_model_config="[10,[0],[1],False]" \
    --kv_cache_free_gpu_memory_fraction=0.95 \
    --max_output_len=1024 \
    --kv_cache_enable_block_reuse \
    --input_text="what is Newtons third law"
Could you please help verify the model repo config files? Thanks.
Hi @pcastonguay, could you please take a look at this issue?
Hi @pcastonguay, I wanted to follow up on the issue shared here earlier. Have you had a chance to look into it? Are there any solutions or workarounds? Your insights would be greatly appreciated.
Any help with this would be appreciated.
I tried deploying Llama 3.1-8B as the draft model and Llama 3.3-70B as the target model, following all the steps mentioned here:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/advanced/speculative-decoding.md
The server does not come up and start serving.
Container image: nvcr.io/nvidia/tritonserver:24.11-trtllm-python-py3
Version of tensorrt-llm: 0.15.0
Version of tensorrt-llm backend: 0.15.0
Here I'm sharing the commands used for building the engines, followed by the log.
Quantize the draft model:
python3 /data/TensorRT-LLM/examples/quantization/quantize.py \
    --model_dir /data/llama3-1-8b \
    --dtype bfloat16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --output_dir /data/llama3-1-8b/ckpt_draft \
    --calib_size 512 \
    --tp_size 1
Quantize the target model:
python3 /data/TensorRT-LLM/examples/quantization/quantize.py \
    --model_dir /data/llama3-3-70b \
    --dtype bfloat16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --output_dir /data/llama3-3-70b/ckpt_target \
    --calib_size 512 \
    --tp_size 1
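As a quick sanity check before building the engines (my own addition; it assumes quantize.py writes a config.json plus rank*.safetensors shards into --output_dir, which is the layout I'd expect from the 0.15 quantizer):
# Confirm both quantized checkpoints exist and record FP8 as the quantization algorithm:
ls -lh /data/llama3-1-8b/ckpt_draft /data/llama3-3-70b/ckpt_target
grep -H '"quant_algo"' /data/llama3-1-8b/ckpt_draft/config.json /data/llama3-3-70b/ckpt_target/config.json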
Build the engines.
Draft engine:
trtllm-build \
    --checkpoint_dir=/data/llama3-1-8b/ckpt_draft \
    --output_dir=/data/llama3-1-8b/draft_engine \
    --max_batch_size=1 \
    --max_input_len=2048 \
    --max_seq_len=3072 \
    --gpt_attention_plugin=bfloat16 \
    --gemm_plugin=fp8 \
    --remove_input_padding=enable \
    --kv_cache_type=paged \
    --context_fmha=enable \
    --use_paged_context_fmha=enable \
    --gather_generation_logits \
    --use_fp8_context_fmha=enable
Target engine:
trtllm-build \
    --checkpoint_dir=/data/llama3-3-70b/ckpt_target \
    --output_dir=/data/llama3-3-70b/target_engine \
    --max_batch_size=1 \
    --max_input_len=2048 \
    --max_seq_len=3072 \
    --gpt_attention_plugin=bfloat16 \
    --gemm_plugin=fp8 \
    --remove_input_padding=enable \
    --kv_cache_type=paged \
    --context_fmha=enable \
    --use_paged_context_fmha=enable \
    --gather_generation_logits \
    --use_fp8_context_fmha=enable \
    --max_draft_len=10 \
    --speculative_decoding_mode=draft_tokens_external
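To double-check that the builds picked up the speculative-decoding options, here is a small sketch that greps the generated engine configs; the key names (max_draft_len, speculative_decoding_mode, use_paged_context_fmha) are my assumption about the 0.15 engine config.json layout:
# Inspect the built engines; for the target engine I would expect max_draft_len to be 10,
# matching the "TRTGptModel maxDraftLen: 10" line in the launch log further down.
for d in /data/llama3-1-8b/draft_engine /data/llama3-3-70b/target_engine; do
  echo "== $d =="
  grep -o '"max_draft_len":[^,}]*' "$d/config.json"
  grep -o '"speculative_decoding_mode":[^,}]*' "$d/config.json"
  grep -o '"use_paged_context_fmha":[^,}]*' "$d/config.json"
done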
Environment variables used to prepare the Triton model repository:
ACCUMULATE_TOKEN="false"
BACKEND="tensorrtllm"
BATCH_SCHEDULER_POLICY="guaranteed_no_evict"
BATCHING_STRATEGY="inflight_fused_batching"
BLS_INSTANCE_COUNT="1"
DECODING_MODE="top_k_top_p"
DECOUPLED_MODE="False"
DRAFT_GPU_DEVICE_IDS="0"
E2E_MODEL_NAME="ensemble"
ENABLE_KV_CACHE_REUSE="true"
ENGINE_PATH=/data/llama3-3-70b/target_engine
EXCLUDE_INPUT_IN_OUTPUT="false"
KV_CACHE_FREE_GPU_MEM_FRACTION="0.95"
MAX_BEAM_WIDTH="1"
MAX_QUEUE_DELAY_MICROSECONDS="0"
NORMALIZE_LOG_PROBS="true"
POSTPROCESSING_INSTANCE_COUNT="1"
PREPROCESSING_INSTANCE_COUNT="1"
TARGET_GPU_DEVICE_IDS="1"
TENSORRT_LLM_DRAFT_MODEL_NAME="tensorrt_llm_draft"
TENSORRT_LLM_MODEL_NAME="tensorrt_llm"
TOKENIZER_PATH=/data/llama3-1-8b
TOKENIZER_TYPE=llama
TRITON_GRPC_PORT="8001"
TRITON_HTTP_PORT="8000"
TRITON_MAX_BATCH_SIZE="4"
TRITON_METRICS_PORT="8002"
TRITON_REPO="triton_repo"
USE_DRAFT_LOGITS="false"
DRAFT_ENGINE_PATH=/data/llama3-1-8b/draft_engine
ENABLE_CHUNKED_CONTEXT="true"
MAX_TOKENS_IN_KV_CACHE=""
MAX_ATTENTION_WINDOW_SIZE=""
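Before filling the templates below, a small check (my own addition) that the paths assigned above actually exist inside the container:
# Fail fast if any of the engine / tokenizer paths are wrong:
for p in "$ENGINE_PATH" "$DRAFT_ENGINE_PATH" "$TOKENIZER_PATH"; do
  [ -d "$p" ] && echo "OK       $p" || echo "MISSING  $p"
done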
Make a copy of the Triton repo and replace the fields in the configuration files.
Prepare the model repository for the TensorRT-LLM models:
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend
git checkout v0.15.0
apt-get update && apt-get install -y build-essential cmake git-lfs
pip3 install git-lfs tritonclient grpcio
rm -rf ${TRITON_REPO}
cp -R all_models/inflight_batcher_llm ${TRITON_REPO}
python3 tools/fill_template.py -i ${TRITON_REPO}/ensemble/config.pbtxt triton_max_batch_size:${TRITON_MAX_BATCH_SIZE}
python3 tools/fill_template.py -i ${TRITON_REPO}/preprocessing/config.pbtxt tokenizer_dir:${TOKENIZER_PATH},triton_max_batch_size:${TRITON_MAX_BATCH_SIZE},preprocessing_instance_count:${PREPROCESSING_INSTANCE_COUNT}
python3 tools/fill_template.py -i ${TRITON_REPO}/postprocessing/config.pbtxt tokenizer_dir:${TOKENIZER_PATH},triton_max_batch_size:${TRITON_MAX_BATCH_SIZE},postprocessing_instance_count:${POSTPROCESSING_INSTANCE_COUNT}
python3 tools/fill_template.py -i ${TRITON_REPO}/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:${TRITON_MAX_BATCH_SIZE},decoupled_mode:${DECOUPLED_MODE},accumulate_tokens:${ACCUMULATE_TOKEN},bls_instance_count:${BLS_INSTANCE_COUNT},tensorrt_llm_model_name:${TENSORRT_LLM_MODEL_NAME},tensorrt_llm_draft_model_name:${TENSORRT_LLM_DRAFT_MODEL_NAME}
Make a copy of tensorrt_llm to serve as the configuration for the draft / target models.
cp -R ${TRITON_REPO}/tensorrt_llm ${TRITON_REPO}/tensorrt_llm_draft
sed -i 's/name: "tensorrt_llm"/name: "tensorrt_llm_draft"/g' ${TRITON_REPO}/tensorrt_llm_draft/config.pbtxt
python3 tools/fill_template.py -i ${TRITON_REPO}/tensorrt_llm/config.pbtxt triton_backend:${BACKEND},engine_dir:${ENGINE_PATH},decoupled_mode:${DECOUPLED_MODE},max_tokens_in_paged_kv_cache:${MAX_TOKENS_IN_KV_CACHE},max_attention_window_size:${MAX_ATTENTION_WINDOW_SIZE},batch_scheduler_policy:${BATCH_SCHEDULER_POLICY},batching_strategy:${BATCHING_STRATEGY},kv_cache_free_gpu_mem_fraction:${KV_CACHE_FREE_GPU_MEM_FRACTION},exclude_input_in_output:${EXCLUDE_INPUT_IN_OUTPUT},triton_max_batch_size:${TRITON_MAX_BATCH_SIZE},max_queue_delay_microseconds:${MAX_QUEUE_DELAY_MICROSECONDS},max_beam_width:${MAX_BEAM_WIDTH},enable_kv_cache_reuse:${ENABLE_KV_CACHE_REUSE},normalize_log_probs:${NORMALIZE_LOG_PROBS},enable_chunked_context:${ENABLE_CHUNKED_CONTEXT},gpu_device_ids:${TARGET_GPU_DEVICE_IDS},decoding_mode:${DECODING_MODE},encoder_input_features_data_type:TYPE_FP16
python3 tools/fill_template.py -i ${TRITON_REPO}/tensorrt_llm_draft/config.pbtxt triton_backend:${BACKEND},engine_dir:${DRAFT_ENGINE_PATH},decoupled_mode:${DECOUPLED_MODE},max_tokens_in_paged_kv_cache:${MAX_TOKENS_IN_KV_CACHE},max_attention_window_size:${MAX_ATTENTION_WINDOW_SIZE},batch_scheduler_policy:${BATCH_SCHEDULER_POLICY},batching_strategy:${BATCHING_STRATEGY},kv_cache_free_gpu_mem_fraction:${KV_CACHE_FREE_GPU_MEM_FRACTION},exclude_input_in_output:${EXCLUDE_INPUT_IN_OUTPUT},triton_max_batch_size:${TRITON_MAX_BATCH_SIZE},max_queue_delay_microseconds:${MAX_QUEUE_DELAY_MICROSECONDS},max_beam_width:${MAX_BEAM_WIDTH},enable_kv_cache_reuse:${ENABLE_KV_CACHE_REUSE},normalize_log_probs:${NORMALIZE_LOG_PROBS},enable_chunked_context:${ENABLE_CHUNKED_CONTEXT},gpu_device_ids:${DRAFT_GPU_DEVICE_IDS},decoding_mode:${DECODING_MODE},encoder_input_features_data_type:TYPE_FP16
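One thing worth checking at this point (suggested because the launch log below warns about ${skip_special_tokens}, ${add_special_tokens} and ${max_num_images} not being set) is whether any template placeholders were left unfilled:
# List every ${...} placeholder that fill_template.py did not substitute in the generated repo:
grep -rn '\${' ${TRITON_REPO}/*/config.pbtxt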
root@triton-spec-decode-64884b776d-k9dnc:/data# python3 /data/tensorrtllm_backend/scripts/launch_triton_server.py \
    --model_repo=/data/tensorrtllm_backend/triton_repo \
    --tensorrt_llm_model_name "tensorrt_llm_draft,tensorrt_llm" \
    --multi-model \
    --log \
    --log-file /data/tensorrtllm_backend/triton_server.log &
[1] 43399
root@triton-spec-decode-64884b776d-k9dnc:/data# [TensorRT-LLM][INFO] Using GPU device ids: 1
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] cross_kv_cache_fraction is not specified, error if it's encoder-decoder model, otherwise ok
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length)
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_chunked_context is set to true, will use context chunking (requires building the model with use_paged_context_fmha).
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][WARNING] multi_block_mode is not specified, will be set to true
[TensorRT-LLM][WARNING] enable_context_fmha_fp32_acc is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_mode is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_cache_size is not specified, will be set to 0
[TensorRT-LLM][INFO] speculative_decoding_fast_logits is not specified, will be set to false
[TensorRT-LLM][WARNING] gpu_weights_percent parameter is not specified, will use default value of 1.0
[TensorRT-LLM][INFO] recv_poll_period_ms is not set, will use busy loop
[TensorRT-LLM][WARNING] encoder_model_path is not specified, will be left empty
[TensorRT-LLM][INFO] Engine version 0.15.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Engine version 0.15.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Using user-specified devices: (1)
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Using user-specified devices: (1)
[TensorRT-LLM][INFO] Rank 0 is using GPU 1
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 1
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 1
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 3082
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 10
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (3082) * 80
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 3072
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 3081 = maxSequenceLen - 1 since chunked context is enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: 3082 = maxSequenceLen.
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Using GPU device ids: 0
[TensorRT-LLM][WARNING] iter_stats_max_iterations is not specified, will use default value of 1000
[TensorRT-LLM][WARNING] request_stats_max_iterations is not specified, will use default value of 0
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] cross_kv_cache_fraction is not specified, error if it's encoder-decoder model, otherwise ok
[TensorRT-LLM][WARNING] kv_cache_host_memory_bytes not set, defaulting to 0
[TensorRT-LLM][WARNING] kv_cache_onboard_blocks not set, defaulting to true
[TensorRT-LLM][WARNING] max_attention_window_size is not specified, will use default value (i.e. max_sequence_length)
[TensorRT-LLM][WARNING] sink_token_length is not specified, will use default value
[TensorRT-LLM][WARNING] enable_chunked_context is set to true, will use context chunking (requires building the model with use_paged_context_fmha).
[TensorRT-LLM][WARNING] lora_cache_max_adapter_size not set, defaulting to 64
[TensorRT-LLM][WARNING] lora_cache_optimal_adapter_size not set, defaulting to 8
[TensorRT-LLM][WARNING] lora_cache_gpu_memory_fraction not set, defaulting to 0.05
[TensorRT-LLM][WARNING] lora_cache_host_memory_bytes not set, defaulting to 1GB
[TensorRT-LLM][WARNING] multi_block_mode is not specified, will be set to true
[TensorRT-LLM][WARNING] enable_context_fmha_fp32_acc is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_mode is not specified, will be set to false
[TensorRT-LLM][WARNING] cuda_graph_cache_size is not specified, will be set to 0
[TensorRT-LLM][INFO] speculative_decoding_fast_logits is not specified, will be set to false
[TensorRT-LLM][WARNING] gpu_weights_percent parameter is not specified, will use default value of 1.0
[TensorRT-LLM][INFO] recv_poll_period_ms is not set, will use busy loop
[TensorRT-LLM][WARNING] encoder_model_path is not specified, will be left empty
[TensorRT-LLM][INFO] Engine version 0.15.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:14: UserWarning: Failed to load image Python extension: 'libpng16.so.16: cannot open shared object file: No such file or directory'. If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source? warn(
[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens' correctly (set value is ${skip_special_tokens}). Set it as True by default.
[TensorRT-LLM][INFO] Engine version 0.15.0 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Using user-specified devices: (0)
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Using user-specified devices: (0)
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 1
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 1
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 3072
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (3072) * 32
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 3072
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 3071 = maxSequenceLen - 1 since chunked context is enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: 3072 = maxSequenceLen.
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][WARNING] 'max_num_images' parameter is not set correctly (value is ${max_num_images}). Will be set to None
[TensorRT-LLM][WARNING] Don't setup 'add_special_tokens' correctly (set value is ${add_special_tokens}). Set it as True by default.
[TensorRT-LLM][INFO] Loaded engine size: 8730 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 294.01 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 8724 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 4.00 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1.32 MB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 79.10 GiB, available: 69.15 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 16819
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 48
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 65.70 GiB for max tokens in paged KV cache (1076416).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[TensorRT-LLM][INFO] Executor instance created by worker
[TensorRT-LLM][WARNING] cancellation_check_period_ms is not specified, will be set to 100 (ms)
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
[TensorRT-LLM][INFO] Loaded engine size: 69369 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 540.01 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 69354 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 6.69 MB GPU memory for runtime buffers.
[TensorRT-LLM][WARNING] Overwriting decoding mode to external draft token
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 12.40 MB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 79.10 GiB, available: 10.21 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 994
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 49
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 9.71 GiB for max tokens in paged KV cache (63616).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[TensorRT-LLM][INFO] Executor instance created by worker
[TensorRT-LLM][WARNING] cancellation_check_period_ms is not specified, will be set to 100 (ms)
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
After this there is nothing more on the console, but I can see that the GPUs are loaded with the weights.
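To tell whether the server is hung or simply quiet, this is what I would try next; it is only a sketch, assuming the default HTTP port 8000 from TRITON_HTTP_PORT above and that the Triton generate extension is available in this container (the prompt is the same one used with run.py):
# Follow the log file that launch_triton_server.py was pointed at:
tail -f /data/tensorrtllm_backend/triton_server.log
# Check server and model readiness on the configured HTTP port:
curl -s -o /dev/null -w "server ready:   %{http_code}\n" localhost:8000/v2/health/ready
curl -s -o /dev/null -w "ensemble ready: %{http_code}\n" localhost:8000/v2/models/ensemble/ready
# If both return 200, send a minimal request through the generate endpoint:
curl -s -X POST localhost:8000/v2/models/ensemble/generate \
  -d '{"text_input": "what is Newtons third law", "max_tokens": 64}'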
GPU details:
Every 1.0s: nvidia-smi triton-spec-decode-64884b776d-k9dnc: Sun Jan 19 14:41:47 2025
Sun Jan 19 14:41:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 35C P0 148W / 700W | 77993MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:29:00.0 Off | 0 |
| N/A 32C P0 143W / 700W | 80508MiB / 81559MiB | 0% Default |
| | | Disabled |