Improved stability of litellm models for reasoning models. #538

JoelNiklaus · 2025-02-05T14:47:00Z

No description provided.

satpalsr · 2025-02-07T11:55:01Z

Hey @JoelNiklaus, since you have made most commits around LiteLLM.
I have few questions/requests

Why not expose entire LiteLLM config as parameter or something?
I'm actually looking to run open-r1 evals using for instance models deployed at deepinfra with temp=0. Is it possible right now?

JoelNiklaus · 2025-02-07T14:49:24Z

Hi @satpalsr,

When I integrated litellm, I made it very similar to the existing openai_model. But I agree with you that we may want to configure more. I am sure the maintainers are open if you want to open a PR for that :)
I am not familiar with deepinfra, but since it seems supported by litellm it should work out of the box.

satpalsr · 2025-02-07T15:34:59Z

@JoelNiklaus Thanks. I'll just drop it as separate issue for anyone to pick.
For time being, I just modified the OpenAIClient code & got my evals done.

src/lighteval/models/litellm_model.py

NathanHB · 2025-02-12T13:27:46Z

src/lighteval/models/litellm_model.py

                    kwargs["caching"] = False
                    logger.info("Response is empty, retrying without caching")
                    response = litellm.completion(**kwargs)
+
+                if content and "<think>" in content:
+                    logger.debug(f"Removing <think> tags from response: {content}")


Why are we removing think tags from the answer here ? I think it should be done in the metric function no ?

If we are evaluating a reasoning model the grader will look at the thinking tokens unless we remove them. We would need to remove them in every metric function otherwise.

yeah but in that case you lose the thinking traces in the details.
What we would need:

keep the thinking traces in the details

allow the user to choose wether or not to evaluate with thinking tags

True. Maybe we can open an issue for that and add that improvement in a later PR?

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>

Improved stability of litellm models for reasoning models.

923f036

NathanHB reviewed Feb 12, 2025

View reviewed changes

src/lighteval/models/litellm_model.py Outdated Show resolved Hide resolved

NathanHB reviewed Feb 12, 2025

View reviewed changes

Update src/lighteval/models/litellm_model.py

1a468b7

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved stability of litellm models for reasoning models. #538

Improved stability of litellm models for reasoning models. #538

JoelNiklaus commented Feb 5, 2025

satpalsr commented Feb 7, 2025

JoelNiklaus commented Feb 7, 2025

satpalsr commented Feb 7, 2025

NathanHB Feb 12, 2025

JoelNiklaus Feb 12, 2025

NathanHB Mar 4, 2025

JoelNiklaus Mar 4, 2025

Improved stability of litellm models for reasoning models. #538

Are you sure you want to change the base?

Improved stability of litellm models for reasoning models. #538

Conversation

JoelNiklaus commented Feb 5, 2025

satpalsr commented Feb 7, 2025

JoelNiklaus commented Feb 7, 2025

satpalsr commented Feb 7, 2025

NathanHB Feb 12, 2025

Choose a reason for hiding this comment

JoelNiklaus Feb 12, 2025

Choose a reason for hiding this comment

NathanHB Mar 4, 2025

Choose a reason for hiding this comment

JoelNiklaus Mar 4, 2025

Choose a reason for hiding this comment