Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Refresh local aws images.csv when image not found #4127

Merged
merged 1 commit into from
Oct 18, 2024
Merged

Refresh local aws images.csv when image not found #4127

merged 1 commit into from
Oct 18, 2024

Conversation

yika-luo
Copy link
Collaborator

@yika-luo yika-luo commented Oct 18, 2024

refresh local images.csv by pulling from github catalog again

Tested (run the relevant ones):

  • Code formatting: bash format.sh

@yika-luo yika-luo changed the title Refresh local aws images.csv by pulling from github catalog when imag… Refresh local aws images.csv when image not found Oct 18, 2024
@yika-luo yika-luo requested a review from Michaelvll October 18, 2024 23:12
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @yika-luo!

@yika-luo yika-luo enabled auto-merge October 18, 2024 23:36
@yika-luo yika-luo added this pull request to the merge queue Oct 18, 2024
Merged via the queue into master with commit 7971aa2 Oct 18, 2024
20 checks passed
@yika-luo yika-luo deleted the fix branch October 18, 2024 23:46
cblmemo added a commit that referenced this pull request Oct 21, 2024
…t Flow Definition (#4067)

* provide an example, edited from pipeline.yml

* more focus on dependencies for user dag lib

* more powerful user interface

* load and dump new yaml format

* fix

* fix: reversed logic in add_edge

* [docs] Unroll k8s internal load balancer docs (#4083)

unroll load balancer docs

* rename

* refactor due to reviewer's comments

* generate task.name if not given

* [docs] `sky status --kubernetes` docs (#4064)

* observability docs

* comments

* [UX] Show log after failure and fix the color issue with narrow window (#4084)

* fix narrow window and show log path during exception

* format

* format

* [k8s] `sky status --k8s` refactor (#4079)

* refactor

* lint

* refactor, dataclass

* refactor, dataclass

* refactor

* lint

* add comments for add_edge

* add `print_exception_no_traceback` when raise

* make `Dag.tasks` a property

* print dependencies for `__repr__`

* move `get_unique_task_name` to common_utils

* [Performance] Use new GCP custom images (#4027)

* [Performance] Use new custom image to create GCP GPU VMs

* update image tags for both CPU and GPU

* always generate .sky/python_path

---------

Co-authored-by: Yika Luo <yikaluo@Yikas-MacBook-Pro.local>

* [GCP] Add H100 mega (#4099)

* Add H100 mega support on GCP

* fix for some other regions

* format

* fix resource type

* fix catalog fetching

* [GCP] Add gVNIC support (#4095)

* add gvnic support through config.yaml

* lint

* docs

* [Lambda] Lambda Cloud SkyPilot provisioner (#3865)

* feat: lambda cloud new provisioner

* feat: address cblmemo reviews and other reviews + make multi-node work again

* fix: quotes

* fix: address some reviews

* chore: rm unused option

* chore: update typedef

* feat: use lists directly

* fix: formatting

* chore: address reviews

* fix: formatting

* chore: rm query ports since default impl per review

* feat: add back query ports

* fix: formatting

* chore: add newline at eof

* feat: try removing query ports again

* [Docs] GKE Nvidia Driver installation instructions update (#4106)

* docs

* docs

* docs

* [Performance] Use new AWS custom images (#4091)

* rename methods to use downstream/edge terminology

* [Performance] Add Packer image generation scripts for GCP and AWS (#4068)

* [Performance] Add Packer image generation scripts for GCP and AWS

* Add docker install and tests

* solve nvidia container issue

* Install cuDNN

* [Performance] Scripts to copy/delete AWS images for all regions and add cloud deps (#4073)

* [Performance] Add AWS script to copy images for all regions

* script to delete all AWS images across regions

* Add cloud dependencies to image

---------

Co-authored-by: Yika Luo <yikaluo@Yikas-MacBook-Pro.local>

* Disable AWS images.csv refreshing (#4116)

* [Docs] .skyignore doc (#4114)

* [Docs] .skyignore doc

* Correct typos

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

---------

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* [Core] Raise error for none existing cluster when endpoint is called (#4117)

raise error for none existing cluster

* Refresh local aws images.csv when image not found (#4127)

Refresh local aws images.csv by pulling from github catalog when image tag not found

* [Docs] News revamps. (#4126)

* News revamps.

updates

updates

updates

updates

updates

updates

updates

updates

* Apply suggestions from code review

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* [Serve] Support manually terminating a replica and with purge option (#4032)

* define replica id param in cli

* create endpoint on controller

* call controller endpoint to scale down replica

* add classmethod decorator

* add handler methods for readability in cli

* update docstr and error msg, and inline in cli

* update log and return err msg

* add docstr, catch and reraise err, add stopped and nonexistent message

* inline constant to avoid circular import

* fix error statement and return encoded str

* add purge feature

* add purge replica usage in docstr

* use .get to handle unexpected packages

* fix: diff terminate replica when failed/purging or not

* fix: stay up to date for `is_controller_accessible`

* revert

* up to date with current APIs

* error handling

* when purged remove record in the main loop

* refactor due to reviewer's suggestions

* combine functions

* fix: terminate the healthy replica even with purge option

* remove abbr

* Update sky/serve/core.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/serve/core.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/serve/controller.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/serve/controller.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/cli.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* got services hint

* check if not yes in the outside if branch

* fix some output messages

* Update sky/serve/core.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* set conflict status code for already scheduled termination

* combine purge and normal terminating down branch together

* bump version

* global exception handler to render a json response with error messages

* fix: use responses.JSONResponse for dict serialize

* error messages for old controller

* fix: check version mismatch in generated code

* revert mistakenly change update_service

* refine already in terminating message

* fix: branch code workaround in cls.build

* wording

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* refactor due to reviewer's comments

* fix use ux_utils

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* add changelog as comments

* fix messages

* edit the message for mismatch error

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* no traceback when raising in `terminate_replica`

* messages decode

* Apply suggestions from code review

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* format

* forma

* Empty commit

---------

Co-authored-by: David Tran <davidtran@Davids-MacBook-Pro-2.local>
Co-authored-by: David Tran <david.tran@datadoghq.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>

* [Provisioner] Support docker in Lambda Cloud and TPU (#4115)

* [Provisioner] Support docker in Lambda Cloud

* fix permission issue

* merge with check docker installed

* add tpu support & test

* patch lambda cloud

* add comment

* Apply suggestions from code review

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* change wording all to up/downstream style

* Add unique suffix to task names, fallback to timestamp if unnamed

* Unify handling of single and multiple tasks without dependencies

* Refactor tasks initialization: use list comprehension and fail fast

* Fix remove task dependency description: upstream, not downstream

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Remove duplicated `self.edges`, use nx api instead

* [Serve] Add `ux_utils.print_exception_no_traceback()` for cleaner error output (#4111)

* add `ux_utils.print_exception_no_traceback()` for cleaner error output

* Empty commit

* remove unnecessary with block

* Partially revert: Remove unnecessary `ux_utils.print_exception_no_traceback()` wrappers (#4130)

fix unnecessary with block for returning

* Revert "Add unique suffix to task names, fallback to timestamp if unnamed"

Otherwise, users can not refer to the task by name in the DAG.

This reverts commit 8486352.

* comment the checking used as upstream logic

* [examples] Deepspeed fixes + k8s support (#4124)

deepspeed kubernetes fixes

* Empty commit

* [OCI] Support more OS types in addition to ubuntu (#4080)

* Bug fix for sky config file path resolution.

* format

* [OCI] Bug fix for image_id in Task YAML

* [OCI]: Support more OS types (esp. oraclelinux) in addition to ubuntu.

* format

* Disable system firewall

* Bug fix for validation of the Marketplace images

* Update sky/clouds/oci.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* Update sky/clouds/oci.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* variable/function naming

* address review comments: not to change the service_catalog api. call oci_catalog directly for get os type for a image.

* Update sky/clouds/oci.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* Update sky/clouds/oci.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* Update sky/clouds/oci.py

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* address review comments

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

* Apply suggestions from code review

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* fix: typing.cast

* add TODOs for future function migration

* remove dependencies wording to reduce ambiguity

* temporarily add github actions

---------

Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Co-authored-by: yika-luo <yikaluo@assemblesys.com>
Co-authored-by: Yika Luo <yikaluo@Yikas-MacBook-Pro.local>
Co-authored-by: Kote Mushegiani <mushegiani@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: David Tran <davidtran@Davids-MacBook-Pro-2.local>
Co-authored-by: David Tran <david.tran@datadoghq.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
Co-authored-by: Hysun He <hysunhe@foxmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants