fixing reproducibility of lmeval tests #1220

Open
wants to merge 4 commits into
base: main

Conversation

brian-dellabetta
Collaborator

@brian-dellabetta brian-dellabetta commented Mar 4, 2025

SUMMARY:
LM Eval weekly tests are failing. This PR resolves two issues:

  1. Installs pillow, which I had locally through vllm but which is not installed as part of llm-compressor.
  2. Adds a random seed to the lmeval tests, which, after a good amount of testing, seems to resolve the issue. The nondeterminism arises entirely during calibration/quantization; lm-eval behavior is deterministic because it always sets a seed. It is a bit surprising that the seed can have such a drastic effect, but these are 2B vision-language models evaluated on a difficult multiple-choice dataset, with scores not too far from random guessing.

TEST PLAN:
no new src code


github-actions bot commented Mar 4, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

dsikka previously approved these changes Mar 4, 2025
@dsikka dsikka marked this pull request as ready for review March 4, 2025 01:25
@dsikka dsikka added the ready When a PR is ready for review label Mar 4, 2025
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/lmeval-test-bugfixes branch from 87dcc4c to 1f2ce00 Compare March 5, 2025 22:40
@brian-dellabetta brian-dellabetta requested a review from dsikka March 5, 2025 22:40
@brian-dellabetta brian-dellabetta changed the title attempts to fix lmeval failing tests fixing reprdocubility of lmeval tests Mar 5, 2025
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/lmeval-test-bugfixes branch from 3429cc7 to b91cd7b Compare March 7, 2025 20:00
horheynm previously approved these changes Mar 7, 2025
rahul-tuli previously approved these changes Mar 7, 2025
Collaborator

@rahul-tuli rahul-tuli left a comment

GG!

@dsikka dsikka enabled auto-merge (squash) March 7, 2025 20:10
Collaborator

@dsikka dsikka left a comment

rebase?

@brian-dellabetta brian-dellabetta changed the title fixing reprdocubility of lmeval tests fixing reproducibility of lmeval tests Mar 7, 2025
@brian-dellabetta brian-dellabetta requested a review from dsikka March 7, 2025 20:25
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@brian-dellabetta brian-dellabetta dismissed stale reviews from rahul-tuli and horheynm via 2d2e220 March 7, 2025 20:27
@brian-dellabetta brian-dellabetta force-pushed the bdellabe/lmeval-test-bugfixes branch from b91cd7b to 2d2e220 Compare March 7, 2025 20:27
"compressed-tensors"
if version_info.build_type == "release"
else "compressed-tensors-nightly",
"pillow",
Collaborator

Can we make this a dev dependency rather than a base dependency?
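
One way the suggestion above could look is sketched below. This is a minimal illustration, not the project's actual `setup.py`: the list names, the `build_setup_kwargs` helper, the `dev` extra name, and the package lists are all hypothetical. The only real API assumed is that `setuptools.setup()` accepts `install_requires` and `extras_require`.

```python
# Hypothetical sketch of splitting base and dev dependencies.
# The package lists are illustrative, not the project's real requirements.

# Installed for every user of the package.
BASE_REQUIRES = [
    "compressed-tensors",
]

# Installed only when the optional "dev" extra is requested.
DEV_REQUIRES = [
    "pillow",  # assumed here to be needed only by the test suite
    "pytest",
]


def build_setup_kwargs() -> dict:
    """Return keyword arguments that could be passed to setuptools.setup()."""
    return {
        "install_requires": list(BASE_REQUIRES),
        "extras_require": {"dev": list(DEV_REQUIRES)},
    }


if __name__ == "__main__":
    from pprint import pprint

    pprint(build_setup_kwargs())
```

With a layout like this, a plain install would skip pillow, while an editable install with the extra (for example `pip install -e ".[dev]"`) would pull it in for local testing.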

@@ -2,6 +2,7 @@ cadence: weekly
model: Qwen/Qwen2-VL-2B-Instruct
model_class: TraceableQwen2VLForConditionalGeneration
scheme: FP8_DYNAMIC
seed: 42 #compressed model is sensitive to random seed
Collaborator

nit

Suggested change
- seed: 42 #compressed model is sensitive to random seed
+ seed: 42 # compressed model is sensitive to random seed

@@ -73,6 +75,12 @@ def set_up(self):
        self.quant_type = eval_config.get("quant_type")
        self.save_dir = eval_config.get("save_dir")

        seed = eval_config.get("seed", None)
        if seed is not None:
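
The hunk above is cut off after the `if seed is not None:` check, so the body the PR actually uses is not shown here. As a rough sketch of what an optional seed from the test config might drive, the helper below seeds Python's `random` module and, when they happen to be installed, NumPy and PyTorch. The function names `seed_everything` and `apply_seed_from_config` are hypothetical and are not taken from the PR.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed whichever random number generators are available."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch


def apply_seed_from_config(eval_config: dict) -> bool:
    """Seed the RNGs only if the config has a "seed" key; report whether it did."""
    seed = eval_config.get("seed", None)
    if seed is not None:
        seed_everything(seed)
        return True
    return False


if __name__ == "__main__":
    apply_seed_from_config({"seed": 42})
    first = random.random()
    apply_seed_from_config({"seed": 42})
    second = random.random()
    print(first == second)
```

Keeping the seed optional means configs without a `seed` entry behave as before, and only the models found to be sensitive need to pin one.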
5 participants