
text generation details not working when stream=False #1876

Closed
2 of 4 tasks
uyeongkim opened this issue May 10, 2024 · 3 comments

Comments

@uyeongkim

System Info

I ran the TGI Docker container with --model-id pointing to a Llama 3 model downloaded from Hugging Face, and sent a request with the Python code below:

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")

# run inside an async context (e.g. a notebook); top-level await is not valid in a plain script
output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)

but it does not return the details:
TextGenerationOutput(generated_text='100% open-source and available on GitHub. It is distributed', details=None)

and the server log shows details: false in the logged GenerateParameters, even though the client passed details=True:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")

output = await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output)

Server log:

2024-05-10T09:32:15.955615Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("4-nvidia-rtx-a6000"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(12), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="1.425314571s" validation_time="477.908µs" queue_time="66.966µs" inference_time="1.42476984s" time_per_token="118.73082ms" seed="None"}: text_generation_router::server: router/src/server.rs:309: Success

Expected behavior

text_generation should return the details instead of None.
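For illustration, a populated result would look roughly like the following (the shape follows huggingface_hub's TextGenerationOutputDetails; the field values here are placeholders, not actual output):

TextGenerationOutput(
    generated_text='100% open-source and available on GitHub. It is distributed',
    details=TextGenerationOutputDetails(finish_reason='length', generated_tokens=12, seed=None, prefill=[...], tokens=[...]),
)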

@fxmarty
Contributor

fxmarty commented May 14, 2024

@uyeongkim I opened a similar issue at: huggingface/huggingface_hub#2281

Related issue for stream=True: #1530

Since you use stream=False, simply using requests instead of huggingface_hub should work for you:

import requests

session = requests.Session()

# use /generate for stream=False, /generate_stream for streaming
# url = "http://0.0.0.0:80/generate_stream"
url = "http://0.0.0.0:80/generate"
# request generation details explicitly
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20, "details": True}}
headers = {"Content-Type": "application/json"}

response = session.post(
    url,
    json=data,
    headers=headers,
    stream=False,  # set to True together with the /generate_stream endpoint
)

# for a streaming response, iterate over the lines instead:
# for line in response.iter_lines():
#     print(f"line: `{line}`")

print(response.headers)
print(response.json())
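With "details": True in the request parameters, the JSON body returned by /generate includes the generation details (finish reason, number of generated tokens, and per-token information), which is what huggingface_hub would normally surface as TextGenerationOutput.details.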

@kdamaszk

It looks like this is a regression in the huggingface_hub package, because it does not reproduce on older versions such as 0.20.0.
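(As a stopgap, and this is only a suggestion rather than something verified in this thread, pinning the package to a pre-regression version, e.g. pip install "huggingface_hub==0.20.0", should avoid the problem.)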

@Wauplin
Contributor

Wauplin commented Jun 11, 2024

@uyeongkim @kdamaszk This was indeed a regression. A hot-fix release has been shipped: https://github.com/huggingface/huggingface_hub/releases/tag/v0.23.3. See the related PR for more details: huggingface/huggingface_hub#2316.

Note: this was not a bug in text-generation-inference itself.
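A quick way to verify the fix after upgrading, sketched here under the assumption of a local TGI server on port 8080 (wrapped in asyncio.run so it runs as a plain script):

import asyncio

from huggingface_hub import AsyncInferenceClient

# pip install "huggingface_hub>=0.23.3" first to pick up the hot-fix

async def main():
    client = AsyncInferenceClient("http://127.0.0.1:8080")
    output = await client.text_generation(
        "The huggingface_hub library is ", max_new_tokens=12, details=True
    )
    print(output.details)  # should now be populated instead of None

asyncio.run(main())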

Wauplin closed this as completed Jun 11, 2024