[Docker] Docker image as runtime fails on GCP #3934
Is there a reproducible image ID for this bug?
Can we try out the default PyTorch Docker image?
$ sky launch --cloud gcp --gpus T4 --image-id docker:pytorch/pytorch
I 09-16 09:58:37 optimizer.py:719] == Optimizer ==
I 09-16 09:58:37 optimizer.py:730] Target: minimizing cost
I 09-16 09:58:37 optimizer.py:742] Estimated cost: $0.6 / hour
I 09-16 09:58:37 optimizer.py:742]
I 09-16 09:58:37 optimizer.py:867] Considered resources (1 node):
I 09-16 09:58:37 optimizer.py:937] ---------------------------------------------------------------------------------------------
I 09-16 09:58:37 optimizer.py:937] CLOUD   INSTANCE       vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN
I 09-16 09:58:37 optimizer.py:937] ---------------------------------------------------------------------------------------------
I 09-16 09:58:37 optimizer.py:937] GCP     n1-highmem-4   4       26        T4:1           us-central1-a   0.59       ✔
I 09-16 09:58:37 optimizer.py:937] ---------------------------------------------------------------------------------------------
I 09-16 09:58:37 optimizer.py:937]
Launching a new cluster 'sky-73cd-txia'. Proceed? [Y/n]:
I 09-16 09:58:37 cloud_vm_ray_backend.py:4397] Creating a new cluster: 'sky-73cd-txia' [1x GCP(n1-highmem-4, {'T4': 1}, image_id={'us-central1': 'docker:pytorch/pytorch'})].
I 09-16 09:58:37 cloud_vm_ray_backend.py:4397] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 09-16 09:58:40 cloud_vm_ray_backend.py:1314] To view detailed progress: tail -n100 -f /home/txia/sky_logs/sky-2024-09-16-09-58-35-756478/provision.log
I 09-16 09:58:43 provisioner.py:65] Launching on GCP us-central1 (us-central1-a)
I 09-16 10:01:45 provisioner.py:450] Successfully provisioned or found existing instance.
I 09-16 10:05:58 provisioner.py:552] Successfully provisioned cluster: sky-73cd-txia
I 09-16 10:05:58 cloud_vm_ray_backend.py:3406] Run commands not specified or empty.
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450]
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450] Cluster name: sky-73cd-txia
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450] To log into the head VM: ssh sky-73cd-txia
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450] To submit a job: sky exec sky-73cd-txia yaml_file
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450] To stop the cluster: sky stop sky-73cd-txia
I 09-16 10:05:58 cloud_vm_ray_backend.py:3450] To teardown the cluster: sky down sky-73cd-txia
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
sky-73cd-txia < 1 sec 1x GCP(n1-highmem-4, {'T4': 1}, image_id={'us-central1': 'docker:pytor... UP - sky launch --cloud gcp --...
sky-344a-txia 4 days ago 1x Azure(Standard_NV18ads_A10_v5, {'A10': 0.5}) STOPPED - sky exec sky-344a-txia sl...
sky-jobs-controller-4a0782e9 1 week ago 1x GCP(n2-standard-8, disk_size=50) STOPPED 10m sky jobs launch -n t-mana...
This should be fixed by #3867. Closing now.
A user reported that launching on GCP can fail even when the default PyTorch Docker image is used as the runtime. We should investigate.