-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[Docker] Some docker images fail at setup on clouds #3706
Comments
cc @cblmemo - any ideas here? |
I tried manually running the docker container on the GCP VM and got the following output. It seems like this docker container has an entrypoint, thus making the argument $ docker run --name sky4 -it -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 --ulimit nofile=1048576:1048576 --shm-size="10759170048.0b"
--net=host --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined vllm/vllm-openai:latest bash
usage: api_server.py [-h] [--host HOST] [--port PORT] [--uvicorn-log-level {debug,info,warning,error,critical,trace}] [--allow-credentials]
[--allowed-origins ALLOWED_ORIGINS] [--allowed-methods ALLOWED_METHODS] [--allowed-headers ALLOWED_HEADERS] [--api-key API_KEY]
[--lora-modules LORA_MODULES [LORA_MODULES ...]] [--chat-template CHAT_TEMPLATE] [--response-role RESPONSE_ROLE] [--ssl-keyfile SSL_KEYFILE]
[--ssl-certfile SSL_CERTFILE] [--ssl-ca-certs SSL_CA_CERTS] [--ssl-cert-reqs SSL_CERT_REQS] [--root-path ROOT_PATH] [--middleware MIDDLEWARE]
[--model MODEL] [--tokenizer TOKENIZER] [--skip-tokenizer-init] [--revision REVISION] [--code-revision CODE_REVISION]
[--tokenizer-revision TOKENIZER_REVISION] [--tokenizer-mode {auto,slow}] [--trust-remote-code] [--download-dir DOWNLOAD_DIR]
[--load-format {auto,pt,safetensors,npcache,dummy,tensorizer,bitsandbytes}] [--dtype {auto,half,float16,bfloat16,float,float32}]
[--kv-cache-dtype {auto,fp8,fp8_e5m2,fp8_e4m3}] [--quantization-param-path QUANTIZATION_PARAM_PATH] [--max-model-len MAX_MODEL_LEN]
[--guided-decoding-backend {outlines,lm-format-enforcer}] [--distributed-executor-backend {ray,mp}] [--worker-use-ray]
[--pipeline-parallel-size PIPELINE_PARALLEL_SIZE] [--tensor-parallel-size TENSOR_PARALLEL_SIZE]
[--max-parallel-loading-workers MAX_PARALLEL_LOADING_WORKERS] [--ray-workers-use-nsight] [--block-size {8,16,32}] [--enable-prefix-caching]
[--disable-sliding-window] [--use-v2-block-manager] [--num-lookahead-slots NUM_LOOKAHEAD_SLOTS] [--seed SEED] [--swap-space SWAP_SPACE]
[--gpu-memory-utilization GPU_MEMORY_UTILIZATION] [--num-gpu-blocks-override NUM_GPU_BLOCKS_OVERRIDE]
[--max-num-batched-tokens MAX_NUM_BATCHED_TOKENS] [--max-num-seqs MAX_NUM_SEQS] [--max-logprobs MAX_LOGPROBS] [--disable-log-stats]
[--quantization {aqlm,awq,deepspeedfp,fp8,marlin,gptq_marlin_24,gptq_marlin,gptq,squeezellm,compressed-tensors,bitsandbytes,None}]
[--rope-scaling ROPE_SCALING] [--rope-theta ROPE_THETA] [--enforce-eager] [--max-context-len-to-capture MAX_CONTEXT_LEN_TO_CAPTURE]
[--max-seq-len-to-capture MAX_SEQ_LEN_TO_CAPTURE] [--disable-custom-all-reduce] [--tokenizer-pool-size TOKENIZER_POOL_SIZE]
[--tokenizer-pool-type TOKENIZER_POOL_TYPE] [--tokenizer-pool-extra-config TOKENIZER_POOL_EXTRA_CONFIG] [--enable-lora] [--max-loras MAX_LORAS]
[--max-lora-rank MAX_LORA_RANK] [--lora-extra-vocab-size LORA_EXTRA_VOCAB_SIZE] [--lora-dtype {auto,float16,bfloat16,float32}]
[--long-lora-scaling-factors LONG_LORA_SCALING_FACTORS] [--max-cpu-loras MAX_CPU_LORAS] [--fully-sharded-loras]
[--device {auto,cuda,neuron,cpu,openvino,tpu,xpu}] [--scheduler-delay-factor SCHEDULER_DELAY_FACTOR] [--enable-chunked-prefill]
[--speculative-model SPECULATIVE_MODEL] [--num-speculative-tokens NUM_SPECULATIVE_TOKENS]
[--speculative-draft-tensor-parallel-size SPECULATIVE_DRAFT_TENSOR_PARALLEL_SIZE] [--speculative-max-model-len SPECULATIVE_MAX_MODEL_LEN]
[--speculative-disable-by-batch-size SPECULATIVE_DISABLE_BY_BATCH_SIZE] [--ngram-prompt-lookup-max NGRAM_PROMPT_LOOKUP_MAX]
[--ngram-prompt-lookup-min NGRAM_PROMPT_LOOKUP_MIN] [--spec-decoding-acceptance-method {rejection_sampler,typical_acceptance_sampler}]
[--typical-acceptance-sampler-posterior-threshold TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_THRESHOLD]
[--typical-acceptance-sampler-posterior-alpha TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_ALPHA] [--model-loader-extra-config MODEL_LOADER_EXTRA_CONFIG]
[--preemption-mode PREEMPTION_MODE] [--served-model-name SERVED_MODEL_NAME [SERVED_MODEL_NAME ...]]
[--qlora-adapter-name-or-path QLORA_ADAPTER_NAME_OR_PATH] [--otlp-traces-endpoint OTLP_TRACES_ENDPOINT] [--engine-use-ray] [--disable-log-requests]
[--max-log-len MAX_LOG_LEN]
api_server.py: error: unrecognized arguments: bash |
We should check if passing |
This should be fixed by #3867 |
Repro:
Fails on setup with:
Note: Same image works on Kubernetes.
Logs aren't useful:
The text was updated successfully, but these errors were encountered: