Fix Qwen3 Embedding Float16 DType #663

tpendragon · 2025-06-27T20:59:26Z

What does this PR do?

Without this change, batches errored on CPU because the model tried to compare the F32 attention mask with the F16 output tensors.
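The dtype issue can be illustrated with a minimal NumPy sketch. This is not the code changed in this PR: the helper name `attention_bias` and the mask layout are hypothetical, and NumPy silently promotes mixed dtypes where a stricter tensor library would raise an error. The sketch assumes the bias is built from a 0/1 padding mask, cast to the output dtype, and filled with the most negative finite value of that dtype rather than the F32 minimum.

```python
import numpy as np


def attention_bias(mask, dtype):
    """Turn a 0/1 padding mask into an additive bias stored in `dtype`.

    Hypothetical helper for illustration only.
    """
    mask = np.asarray(mask, dtype=np.float32)
    # Most negative finite value representable in the target dtype.
    min_value = np.finfo(dtype).min
    # 0 for real tokens, min_value for padding positions.
    bias = np.where(mask > 0, 0.0, min_value)
    # Cast so the bias matches the dtype of the tensors it is combined with.
    return bias.astype(dtype)


mask = [[1, 1, 0], [1, 0, 0]]
bias = attention_bias(mask, np.float16)

# Stand-in for F16 attention scores produced by the model.
scores = np.zeros((2, 3), dtype=np.float16)
masked = scores + bias  # both operands are F16, so the result stays F16

# Casting the F32 minimum to F16 overflows to -inf instead of staying finite.
with np.errstate(over="ignore"):
    overflowed = np.array(np.finfo(np.float32).min).astype(np.float16)
```

Keeping the bias in the same dtype as the scores avoids the mixed F32/F16 operation, and taking the minimum from the target dtype keeps the masked positions finite.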


tpendragon added 2 commits June 27, 2025 12:25

Convert attention_bias to the right dtype.

912b370

Fix min value.

3e32246