LLM update API route #387

Merged

squeakymouse merged 6 commits into main from katiewu/llm-update-endpoint on Nov 27, 2023

Conversation

squeakymouse
Contributor

Pull Request Summary

Add API route for updating LLM endpoints

Test Plan and Usage Guide

Test deployment

@squeakymouse requested a review from a team on November 20, 2023 23:57
Contributor

@seanshi-scale left a comment


did we test whether this would result in zero-downtime deployments of models?

@squeakymouse
Contributor Author

Yep, for a test deployment, I was able to change the model weights for an endpoint with no downtime 🙂 Although I'm not sure if there's a way to tell from our API at what point the changes take effect? 🤔

raise ObjectNotFoundException
if not self.authz_module.check_access_read_owned_entity(
user, model_endpoint.record
) and not self.authz_module.check_endpoint_public_inference_for_user(
Contributor


I think we want to make sure only the owner can run update_endpoint, i.e. this should be check_access_write_owned_entity, and the check_endpoint_public_inference_for_user fallback shouldn't apply here?
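
Something like this, maybe (assuming check_access_write_owned_entity has the same signature as the read variant; just a sketch, not the exact code):

```python
# Require write access from the owner; no public-inference fallback for updates.
if not self.authz_module.check_access_write_owned_entity(user, model_endpoint.record):
    raise ObjectNotAuthorizedException
```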

LLMInferenceFramework.LIGHTLLM,
LLMInferenceFramework.TENSORRT_LLM,
]:
if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
Contributor


should this be request.endpoint_type?

]:
if endpoint_record.endpoint_type != ModelEndpointType.STREAMING:
raise ObjectHasInvalidValueException(
f"Creating endpoint type {str(endpoint_record.endpoint_type)} is not allowed. Can only create streaming endpoints for text-generation-inference, vLLM and LightLLM."
Contributor


should we also mention TensorRT-LLM in the error message?

Member


Yeah we probably want a shared constant string or helper function to use in both error sites.
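
e.g. something like this (the helper name and wording are just illustrative):

```python
# Shared helper so the create and update paths emit the same error message.
def _non_streaming_endpoint_error(endpoint_type: ModelEndpointType) -> str:
    return (
        f"Endpoint type {str(endpoint_type)} is not allowed. Can only create streaming "
        "endpoints for text-generation-inference, vLLM, LightLLM, and TensorRT-LLM."
    )

# ...then at each error site:
raise ObjectHasInvalidValueException(
    _non_streaming_endpoint_error(endpoint_record.endpoint_type)
)
```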

Contributor Author


Actually gonna remove this check since we're not allowing changing either endpoint type or inference framework 😅

fake_llm_artifact_gateway,
fake_llm_model_endpoint_service,
create_llm_model_endpoint_request_streaming: CreateLLMModelEndpointV1Request,
update_llm_model_endpoint_request: UpdateLLMModelEndpointV1Request,
Contributor


could we test some more cases, e.g. changing resource requests, or cases where we expect the use case to return an error?
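
For instance, an error-path case roughly like this (the use-case class name, constructor kwargs, and execute() signature below are guesses, not the actual code in this PR):

```python
@pytest.mark.asyncio
async def test_update_llm_model_endpoint_use_case_endpoint_not_found(
    fake_llm_model_endpoint_service,
    update_llm_model_endpoint_request: UpdateLLMModelEndpointV1Request,
):
    # Sketch only: updating an endpoint that was never created should raise.
    use_case = UpdateLLMModelEndpointV1UseCase(
        model_endpoint_service=fake_llm_model_endpoint_service,
    )
    with pytest.raises(ObjectNotFoundException):
        await use_case.execute(
            user=User(user_id="t", team_id="t", is_privileged_user=False),  # same User model as the other tests
            model_endpoint_name="endpoint-that-was-never-created",
            request=update_llm_model_endpoint_request,
        )
```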

@@ -234,6 +241,68 @@ async def get_model_endpoint(
) from exc


@llm_router_v1.post(
Member


This should be a PUT.
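
i.e. something like this (path and response model taken from this PR; the auth dependency and other parameters are omitted from this sketch):

```python
@llm_router_v1.put(
    "/model-endpoints/{model_endpoint_name}",
    response_model=UpdateLLMModelEndpointV1Response,
)
async def update_model_endpoint(
    model_endpoint_name: str,
    request: UpdateLLMModelEndpointV1Request,
) -> UpdateLLMModelEndpointV1Response:
    ...
```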

# General endpoint fields
metadata: Optional[Dict[str, Any]]
post_inference_hooks: Optional[List[str]]
endpoint_type: Optional[ModelEndpointType]
Member


In vanilla Launch, we've typically prevented updating to change endpoint types. The general point is that not all of the "create" fields are necessarily relevant or allowable for updates.

Member


Hmm, ok, below you do throw a runtime error to prevent this. Makes me wonder why in Launch we even allowed endpoint_type in the request payload for updates 🤔

Contributor Author


Ah oops, UpdateModelEndpointV1Request does not include endpoint_type 😅 Will remove endpoint_type and inference_framework from UpdateLLMModelEndpointV1Request!
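
So roughly this shape (sketch only; just the fields discussed in this thread are shown, the real request has more):

```python
class UpdateLLMModelEndpointV1Request(BaseModel):
    # LLM-specific fields that may change on update;
    # endpoint_type and inference_framework are intentionally gone.
    inference_framework_image_tag: Optional[str]
    checkpoint_path: Optional[str]

    # General endpoint fields
    metadata: Optional[Dict[str, Any]]
    post_inference_hooks: Optional[List[str]]
```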



class UpdateLLMModelEndpointV1Response(BaseModel):
endpoint_creation_task_id: str
Member


Should we call this endpoint_update_task_id?

Contributor Author


The non-LLM specific UpdateModelEndpointV1Response has endpoint_creation_task_id... do we want to keep these consistent? 🤔

Contributor

@seanshi-scale left a comment


had some nits re the error messages we're returning, other than that looks good

"""
Updates an LLM endpoint for the current user.
"""
logger.info(f"POST /llm/model-endpoints/{model_endpoint_name} with {request} for {auth}")
Contributor


nit: PUT

except (ObjectNotFoundException, ObjectNotAuthorizedException) as exc:
raise HTTPException(
status_code=404,
detail="The specified model bundle could not be found.",
Contributor


nit: I think the error should be "The specified LLM endpoint could not be found.", since having a model bundle under the hood is an implementation detail

status_code=400,
detail=str(exc),
) from exc
except ObjectNotApprovedException as exc:
Contributor


is it possible to get to this line? if so the error message seems maybe off (don't think we know about model bundles for llm endpoints, since it's internal + an implementation detail)

@@ -115,6 +123,7 @@ async def test_create_model_endpoint_use_case_success(
"inference_framework_image_tag": create_llm_model_endpoint_request_sync.inference_framework_image_tag,
"num_shards": create_llm_model_endpoint_request_sync.num_shards,
"quantize": None,
"checkpoint_path": None,
Contributor


should we set a fake non-null checkpoint path for some of these?

@squeakymouse merged commit 3c0f168 into main on Nov 27, 2023
@squeakymouse deleted the katiewu/llm-update-endpoint branch on November 27, 2023 18:06
@yunfeng-scale mentioned this pull request on Mar 6, 2024