Potentially optimize dot4{I,U}8Packed on Metal #7653

Open
wants to merge 2 commits into
base: trunk
from

robamler (Contributor) commented May 1, 2025

Connections

Description
On Metal >= 2.1, emit code for dot4I8Packed and dot4U8Packed that might be easier for the compiler to optimize.
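
For readers unfamiliar with these builtins, here is a minimal Rust sketch of what they compute: each `u32` is treated as four packed 8-bit lanes, and the result is the sum of the lane-wise products. The function names are hypothetical and this is not the MSL that the PR emits (that code is not shown in this thread). The second and third functions only illustrate the general idea of reinterpreting the packed word as four bytes instead of extracting lanes with shifts.

```rust
/// Reference semantics of `dot4I8Packed`, extracting each lane with shifts.
/// (Hypothetical helper for illustration; not part of naga.)
fn dot4_i8_packed_shifts(a: u32, b: u32) -> i32 {
    let mut sum = 0i32;
    for lane in 0..4u32 {
        // Truncate to the low byte, then reinterpret it as a signed 8-bit value.
        let x = (a >> (8 * lane)) as u8 as i8 as i32;
        let y = (b >> (8 * lane)) as u8 as i8 as i32;
        sum += x * y;
    }
    sum
}

/// Same result, but viewing each word as four bytes first.
/// This is only an analogue of a reinterpret-cast formulation.
fn dot4_i8_packed_bytes(a: u32, b: u32) -> i32 {
    let a = a.to_le_bytes();
    let b = b.to_le_bytes();
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| (x as i8 as i32) * (y as i8 as i32))
        .sum()
}

/// Unsigned variant, matching the semantics of `dot4U8Packed`.
fn dot4_u8_packed_bytes(a: u32, b: u32) -> u32 {
    let a = a.to_le_bytes();
    let b = b.to_le_bytes();
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| (x as u32) * (y as u32))
        .sum()
}
```

The signed and unsigned variants differ only in how each byte is widened: the byte `0xFF` is -1 in the signed version and 255 in the unsigned one.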

Testing

  • Includes snapshot tests for both Metal < 2.1 (no change) and >= 2.1 (new code gets emitted).

Squash or Rebase?

Needs squashing.

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

This might allow the Metal compiler to emit faster code (but that's not
confirmed). See
<gpuweb/gpuweb#2677 (comment)>
for the optimization. The limitation to Metal 2.1+ is discussed here:
<gfx-rs#7574 (comment)>.
@robamler robamler force-pushed the packed-vector-format-metal branch from a6825c8 to 8813b38 Compare May 1, 2025 10:53
CI on test failed because the latest changes to `put_block` made its
stack too big. Factoring out the new code into a separate method fixes
this issue.
@robamler robamler force-pushed the packed-vector-format-metal branch from be668ce to 2a41b95 Compare May 1, 2025 22:12
robamler (Contributor, Author) commented May 1, 2025

I fixed the issue of an excessive stack size of put_block by factoring out the new code into a separate method. This PR will now need squashing before being merged.
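
As a rough illustration of this kind of refactoring, the sketch below moves the body of one match arm out of a recursive function into its own method, so that the arm's temporaries no longer have to live in the recursive function's stack frame. All types and names except `put_block` are hypothetical, simplified stand-ins, and the `#[inline(never)]` attribute is an assumption made for the sketch, not something this PR is known to use.

```rust
// Hypothetical, simplified stand-ins; these are not naga's actual types.
enum Statement {
    Emit(u32),
    PackedDot { a: u32, b: u32 },
    Block(Vec<Statement>),
}

struct Writer {
    out: String,
}

impl Writer {
    /// Recursive entry point. Because it calls itself for nested blocks,
    /// every byte in its stack frame is paid once per nesting level.
    fn put_block(&mut self, statements: &[Statement]) {
        for statement in statements {
            match statement {
                Statement::Emit(v) => self.out.push_str(&format!("emit {v};\n")),
                // The arm only forwards to a helper; the helper's locals
                // belong to the helper's own, short-lived frame.
                Statement::PackedDot { a, b } => self.put_packed_dot(*a, *b),
                Statement::Block(inner) => self.put_block(inner),
            }
        }
    }

    /// Factored-out arm body. `#[inline(never)]` (an assumption here) keeps
    /// the compiler from merging it back into `put_block`.
    #[inline(never)]
    fn put_packed_dot(&mut self, a: u32, b: u32) {
        let lanes: Vec<String> = (0..4)
            .map(|i| format!("a{a}[{i}] * b{b}[{i}]"))
            .collect();
        self.out.push_str(&format!("({});\n", lanes.join(" + ")));
    }
}
```

The output strings are placeholders; only the structure of the two methods matters for the stack-size argument.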

The previously failing CI test can also be run on a non-Mac machine as follows:

    cd naga
    cargo test --features msl-out test_stack_size
