fix: improve gguf performance with torch.compile #8031
Looks like all we do w/ this constant is check if something is in
it - no functional reason it needs to be in a set.
pytorch 2.7 does not implement `set.__contains__`, so make this a list instead. See pytorch/pytorch#145761
@keturn hi I'm curious how [...]
@StrongerXi It's not used anywhere in core at the moment. This issue came up when I tried applying it in an extension for the Chroma model. That did see some appreciable performance gains, so I tried to build on that experience and do the same thing with Invoke's FLUX model, but that has been fraught. I've been unable to explain why FLUX gives me so much more trouble in compilation and so much less to gain from it, considering Chroma and FLUX are very nearly the same model. (In fact, since I tried that with FLUX, Chroma's compilation only succeeds on its second invocation, failing the first time around with something about UserDefinedObjectVariable not having "proxy". It was compiling okay without that issue a week ago and I'm at a loss as to what changed.)
That sounds very similar to what we fixed in PyTorch nightly a few months ago. Can you give nightly a try? This post has all the context.
Unfortunately nightly (2.8.0.dev20250605+cu128) does not seem to improve things. That category of error ("proxy") is still there, but unlike stable 2.7.1, running it a second time fails with a different error (in [...]
@keturn thanks, if you can provide a repro and/or a full error message running with [...]
I'm a long way from a minimal reproduction, but I dumped some logs in pytorch/pytorch#155266
Summary
When using torch.compile with a Flux-type GGUF model, tlparse reports this error:
This is in `get_dequantized_tensor`, which gets called frequently enough for this to have a significant influence.
Changing the collection from a set to a list is sufficient to make it compatible.
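As a rough illustration of the kind of change described above, here is a minimal sketch. The names `QuantType`, `NATIVE_QTYPES`, and `needs_dequantization` are hypothetical stand-ins, not the identifiers used in this PR, and the sketch deliberately avoids importing torch; it only shows that a constant used purely for membership checks behaves the same as a list.

```python
from enum import Enum


class QuantType(Enum):
    """Hypothetical stand-in for a GGUF quantization type enum."""
    F32 = 0
    F16 = 1
    Q4_0 = 2
    Q8_0 = 8


# Before (assumed shape of the original constant): a set literal.
# NATIVE_QTYPES = {QuantType.F32, QuantType.F16}

# After: a list. The constant is only ever used with `in`, so the
# container type makes no functional difference for so few entries.
NATIVE_QTYPES = [QuantType.F32, QuantType.F16]


def needs_dequantization(qtype: QuantType) -> bool:
    """Return True when the type is not one torch can use directly."""
    return qtype not in NATIVE_QTYPES
```

With only a handful of entries, the linear scan of a list costs about the same as a set lookup, so the change trades nothing measurable in eager mode.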
Related Issues / Discussions
See pytorch/pytorch#145761
QA Instructions
Run a GGUF model.
Merge Plan
Checklist
What's New copy (if doing a release after this PR)