adjust atol value for mini_qwen2_vl FP32 in test_mini_model_multimodal #563

Open
wants to merge 1 commit into main

Conversation

@faaany (Collaborator) commented on Feb 12, 2025

Summary

test_mini_model_multimodal covers two test models: mini_mllama and mini_qwen2_vl. On XPU, the mini_mllama cases pass but both mini_qwen2_vl cases fail. One failure diverges at the 1e-2 level and needs a real fix; the other diverges only at the 1e-6 level, which we suspect means the atol is set too strictly for non-CUDA hardware accelerators, as can be seen below:

FAILED test_mini_models_multimodal.py::test_mini_model_multimodal[mini_qwen2_vl-32-0.0001-dtype0-1e-08-1e-05-0.005-1e-05-0.005-1e-05] - AssertionError: Number of mismatched elements: 3
Mismatch at index (0, 21): tensor1[(0, 21)] = 0.03207247704267502, tensor2[(0, 21)] = 0.0320717953145504
Mismatch at index (0, 24): tensor1[(0, 24)] = 0.03104054555296898, tensor2[(0, 24)] = 0.03104141354560852
Mismatch at index (0, 27): tensor1[(0, 27)] = 0.04820457845926285, tensor2[(0, 27)] = 0.04820366948843002
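For context, here is a minimal sketch of the tolerance arithmetic, assuming the standard `|a - b| <= atol + rtol * |b|` comparison used by `torch.allclose` and taking the first mismatched pair from the log above. The relaxed atol of 1e-6 is a hypothetical value chosen purely for illustration, not necessarily the one this PR adopts:

```python
import torch

# First mismatched pair from the failure log, at index (0, 21).
t1 = torch.tensor(0.03207247704267502, dtype=torch.float32)
t2 = torch.tensor(0.0320717953145504, dtype=torch.float32)

# allclose passes iff |t1 - t2| <= atol + rtol * |t2|.
# Here |t1 - t2| ≈ 6.8e-7, while atol + rtol * |t2| ≈ 1e-8 + 1e-5 * 0.032 ≈ 3.3e-7,
# so the current tolerances reject a difference that is plausibly just
# accumulation-order noise on a non-CUDA accelerator.
print(torch.allclose(t1, t2, atol=1e-8, rtol=1e-5))  # False

# A slightly larger atol (1e-6 here is only an illustrative value) absorbs it.
print(torch.allclose(t1, t2, atol=1e-6, rtol=1e-5))  # True
```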

This PR relaxes the atol for the mini_qwen2_vl FP32 case accordingly. What do you think?

  • Hardware Type: XPU
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@faaany faaany requested a review from tyler-romero February 12, 2025 09:12
@faaany (Collaborator, Author) commented on Feb 12, 2025

cc @yao-matrix and @sywangyi

@tyler-romero (Collaborator) commented

This LGTM, but I'm going to pass it to someone at LI to provide a final review: @shivam15s
