
display available cached versions in TGI server error message of Neuron backend #3063


Open

jimburtoft wants to merge 1 commit into main

Conversation

jimburtoft

Pulling from huggingface/optimum-neuron#776

If a model is cached with a different configuration, I want to display the alternative options to the user.

If someone copies the deploy code from Hugging Face and changes something (e.g. the sequence length), it is not obvious from this code why it isn't working, especially if they don't understand compilation because they are referencing the original model.

Based on a true story!

I also added some carriage returns to make the message more readable.

Note that get_hub_cached_entries does raise an error if it is fed a model that doesn't have a model_type. For example (randomly selected): model_id = "hexgrad/Kokoro-82M"

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/utils/hub_cache_utils.py", line 431, in get_hub_cached_entries
    model_type = target_entry.config["model_type"]
KeyError: 'model_type'

However, we already call that function inside of is_cached at the top of this block, so I don't know whether we filter for certain model types before we get to this point. If not, the existing code would raise that error before it ever gets here.
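For reference, here is a minimal sketch of a guard that would avoid the KeyError. Only the entry.config["model_type"] lookup is confirmed by the traceback; the _Entry stand-in class and the filtering step are my assumptions, not existing optimum-neuron code.

class _Entry:
    """Stand-in for the cached-entry objects in hub_cache_utils (assumed shape)."""
    def __init__(self, config: dict):
        self.config = config

def safe_model_type(entry: _Entry) -> str | None:
    # Use .get() so a config without "model_type" yields None
    # instead of the KeyError shown in the traceback above.
    return entry.config.get("model_type")

# Usage sketch: drop entries that cannot be classified before touching
# entry.config["model_type"]. The second entry mimics a model like
# hexgrad/Kokoro-82M whose config has no model_type.
entries = [_Entry({"model_type": "llama"}), _Entry({})]
usable = [e for e in entries if safe_model_type(e) is not None]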

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@jimburtoft jimburtoft changed the title display available cached versions in TGI server error message display available cached versions in TGI server error message of Neuron backend Feb 26, 2025
@@ -107,10 +107,26 @@ def fetch_model(
    if not is_cached(model_id, neuron_config):
        hub_cache_url = "https://huggingface.co/aws-neuron/optimum-neuron-cache"
        neuron_export_url = "https://huggingface.co/docs/optimum-neuron/main/en/guides/export_model#exporting-neuron-models-using-neuronx-tgi"
        entries = get_hub_cached_entries(model_id, "inference")
Collaborator:

This method is already called by is_cached: I'd rather avoid having two consecutive calls to the hub.
The is_cached method is never called anywhere else, so maybe you can change its signature to something like has_compatible_entry(neuron_config, entries). That way you can first fetch the entries, check if one is compatible, and otherwise loop over the incompatible entries just as you do here.
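A minimal sketch of what that suggested refactor could look like. The has_compatible_entry name comes from the comment above, but the dict-like shapes of neuron_config and the entries, and the fields compared, are my assumptions:

def has_compatible_entry(neuron_config: dict, entries: list) -> bool:
    # Illustrative compatibility check; the real optimum-neuron logic
    # may compare different or additional fields.
    keys = ("batch_size", "sequence_length", "num_cores", "auto_cast_type")
    return any(
        all(entry.get(k) == neuron_config.get(k) for k in keys)
        for entry in entries
    )

# fetch_model would then hit the hub only once:
#   entries = get_hub_cached_entries(model_id, "inference")
#   if not has_compatible_entry(neuron_config, entries):
#       ...build the "available cached configurations" message from entries...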

                config_list.append(config)
            available_configs = "\nAvailable cached configurations for this model:\n- " + "\n- ".join(config_list)
        else:
            available_configs = "\nNo cached versions are currently available for that model with any configuration."
Collaborator:

It looks quite redundant with the first line of the error message. Do we really need to say something more specific here?
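For context, a hedged reconstruction of the loop the second hunk sits in. Only config_list.append(config) and the two available_configs assignments are visible in the diff, so the entry fields and their formatting are assumptions:

entries = [{"batch_size": 4, "sequence_length": 4096, "num_cores": 2}]  # example data

config_list = []
for entry in entries:
    # Illustrative fields only; the real code may format different keys.
    keys = ("batch_size", "sequence_length", "num_cores")
    config_list.append(", ".join(f"{k}={entry.get(k)}" for k in keys))
if config_list:
    available_configs = (
        "\nAvailable cached configurations for this model:\n- "
        + "\n- ".join(config_list)
    )
else:
    available_configs = "\nNo cached versions are currently available for that model with any configuration."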
