LLM update API route #387
Conversation
did we test whether this would result in zero-downtime deployments of models?
Yep, for a test deployment, I was able to change the model weights for an endpoint with no downtime 🙂 Although I'm not sure if there's a way to tell from our API at what point the changes take effect? 🤔
    raise ObjectNotFoundException
if not self.authz_module.check_access_read_owned_entity(
    user, model_endpoint.record
) and not self.authz_module.check_endpoint_public_inference_for_user(
I think we want to make sure only the owner can run update_endpoint, i.e. I think this needs to be check_access_write_owned_entity, with no check_endpoint_public_inference_for_user fallback?
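A minimal sketch of that suggestion, assuming authz_module exposes check_access_write_owned_entity with the same signature as the read variant (the exception choice mirrors the handlers elsewhere in this diff):

# Sketch of the reviewer's suggestion: require write access on the owned
# record and drop the public-inference fallback, so only the endpoint owner
# can run update_endpoint.
if not self.authz_module.check_access_write_owned_entity(user, model_endpoint.record):
    raise ObjectNotAuthorizedException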
    LLMInferenceFramework.LIGHTLLM,
    LLMInferenceFramework.TENSORRT_LLM,
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
Should this be request.endpoint_type?
]:
    if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
        raise ObjectHasInvalidValueException(
            f"Creating endpoint type {str(endpoint_record.endpoint_type)} is not allowed. Can only create streaming endpoints for text-generation-inference, vLLM and LightLLM."
Also, should we mention TensorRT-LLM in the error as well?
Yeah we probably want a shared constant string or helper function to use in both error sites.
Actually gonna remove this check since we're not allowing changing either endpoint type or inference framework 😅
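For reference, a shared helper along these lines (names and exact wording are illustrative) would cover both error sites, though with this check removed from the update path it is only needed on the create side:

# Illustrative helper so the error sites share one message; the helper name
# and wording are made up for this sketch.
def _streaming_only_error(endpoint_type: ModelEndpointType) -> str:
    return (
        f"Creating endpoint type {str(endpoint_type)} is not allowed. Can only create streaming "
        "endpoints for text-generation-inference, vLLM, LightLLM and TensorRT-LLM."
    )

# usage at an error site:
# raise ObjectHasInvalidValueException(_streaming_only_error(endpoint_record.endpoint_type))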
    fake_llm_artifact_gateway,
    fake_llm_model_endpoint_service,
    create_llm_model_endpoint_request_streaming: CreateLLMModelEndpointV1Request,
    update_llm_model_endpoint_request: UpdateLLMModelEndpointV1Request,
Could we test some more cases, such as changing resource requests, or cases where we expect the use case to return an error?
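One such case might look roughly like this; the fixture names, the use-case name, and the execute() signature are all assumed to mirror the existing success test, and update_llm_model_endpoint_request_invalid is a hypothetical fixture carrying e.g. an out-of-range resource request:

import pytest

@pytest.mark.asyncio
async def test_update_llm_model_endpoint_use_case_raises_on_invalid_update(
    test_api_key: str,
    fake_llm_model_endpoint_service,
    create_llm_model_endpoint_request_streaming: CreateLLMModelEndpointV1Request,
    update_llm_model_endpoint_request_invalid: UpdateLLMModelEndpointV1Request,
):
    # create the endpoint and construct `user` exactly as in the success test,
    # then expect the update use case to reject the invalid request.
    use_case = UpdateLLMModelEndpointV1UseCase(...)  # wired as in the success test
    with pytest.raises(ObjectHasInvalidValueException):
        await use_case.execute(
            user=user,
            model_endpoint_name=create_llm_model_endpoint_request_streaming.name,
            request=update_llm_model_endpoint_request_invalid,
        )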
@@ -234,6 +241,68 @@ async def get_model_endpoint(
    ) from exc


@llm_router_v1.post(
This should be a PUT.
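Concretely, the registration would change from @llm_router_v1.post to @llm_router_v1.put, roughly as sketched below; the path string is inferred from the log line later in this thread, and the dependency parameters are left as in the POST version:

# Sketch of the suggested method change: register the update handler as a PUT.
@llm_router_v1.put(
    "/model-endpoints/{model_endpoint_name}",
    response_model=UpdateLLMModelEndpointV1Response,
)
async def update_model_endpoint(
    model_endpoint_name: str,
    request: UpdateLLMModelEndpointV1Request,
    # auth / external-interface dependencies unchanged from the POST version
) -> UpdateLLMModelEndpointV1Response:
    ...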
    # General endpoint fields
    metadata: Optional[Dict[str, Any]]
    post_inference_hooks: Optional[List[str]]
    endpoint_type: Optional[ModelEndpointType]
In vanilla Launch, we've typically prevented updating to change endpoint types. The general point is that not all of the "create" fields are necessarily relevant or allowable for updates.
Hmm ok, below you do throw a runtime error to prevent this. Makes me wonder why in Launch we even allowed endpoint_type in the request payload for updates 🤔
Ah oops, UpdateModelEndpointV1Request does not include endpoint_type 😅 Will remove endpoint_type and inference_framework from UpdateLLMModelEndpointV1Request!
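For reference, the trimmed schema might look roughly like this; the field list is illustrative, limited to fields that appear elsewhere in this diff, with resource and autoscaling fields elided:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel

# Illustrative shape of UpdateLLMModelEndpointV1Request with endpoint_type and
# inference_framework removed; all fields stay Optional since updates are partial.
class UpdateLLMModelEndpointV1Request(BaseModel):
    # LLM-specific fields that remain updatable
    inference_framework_image_tag: Optional[str]
    num_shards: Optional[int]
    quantize: Optional[str]  # same type as in the create request schema
    checkpoint_path: Optional[str]

    # General endpoint fields
    metadata: Optional[Dict[str, Any]]
    post_inference_hooks: Optional[List[str]]
    # ... resource requests and autoscaling fields elided ...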
class UpdateLLMModelEndpointV1Response(BaseModel):
    endpoint_creation_task_id: str
Should we call this endpoint_update_task_id?
The non-LLM-specific UpdateModelEndpointV1Response has endpoint_creation_task_id... do we want to keep these consistent? 🤔
had some nits re the error messages we're returning, other than that looks good
""" | ||
Updates an LLM endpoint for the current user. | ||
""" | ||
logger.info(f"POST /llm/model-endpoints/{model_endpoint_name} with {request} for {auth}") |
nit: PUT
    except (ObjectNotFoundException, ObjectNotAuthorizedException) as exc:
        raise HTTPException(
            status_code=404,
            detail="The specified model bundle could not be found.",
nit: think error should be "specified llm endpoint could not be found", since having a model bundle under the hood is an implementation detail
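Applied to the fragment above, the nit would read something like:

    # Per the nit: surface the LLM endpoint rather than the underlying model
    # bundle in the 404 detail (exception handling otherwise unchanged).
    except (ObjectNotFoundException, ObjectNotAuthorizedException) as exc:
        raise HTTPException(
            status_code=404,
            detail="The specified LLM endpoint could not be found.",
        ) from exc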
            status_code=400,
            detail=str(exc),
        ) from exc
    except ObjectNotApprovedException as exc:
is it possible to get to this line? if so the error message seems maybe off (don't think we know about model bundles for llm endpoints, since it's internal + an implementation detail)
@@ -115,6 +123,7 @@ async def test_create_model_endpoint_use_case_success(
        "inference_framework_image_tag": create_llm_model_endpoint_request_sync.inference_framework_image_tag,
        "num_shards": create_llm_model_endpoint_request_sync.num_shards,
        "quantize": None,
        "checkpoint_path": None,
should we set a fake non-null checkpoint path for some of these?
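For example, with a made-up fixture value:

# Hypothetical fixture value: a clearly fake, non-null checkpoint path so the
# checkpoint plumbing is exercised in at least some of these tests.
FAKE_CHECKPOINT_PATH = "s3://fake-checkpoint-bucket/fake-model/"

# in the expected keyword args:
# "checkpoint_path": FAKE_CHECKPOINT_PATH,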
Pull Request Summary
Add API route for updating LLM endpoints
Test Plan and Usage Guide
Test deployment