[XLA:GPU] Use DeviceDescription instead of hard-coding warp size as 32 #2938
base: r2.18-rocm-enhanced
Conversation
tensorflow/tf-build-actions@600513b [ROCm] Fix flaky gpu compiler test when building with rocm
tensorflow/tf-build-actions@a35cf48 [XLA:GPU] Use DeviceDescription instead of hard-coding warp size as 32
xla@e849446 [ROCm] Pass correct warp size to Triton pipeline
xla@3e7b0fe cherry-picked warp size passing to triton calls, and globally enabled warpsize=64
xla@750ad89 Fixes.
Compare: c76562c to 5e84717
This PR is very large; it changes 90 files.
This is the original commit tensorflow@a35cf48, which has 75 files modified. Yeah, this is definitely quite a large one for us already. And since the latest XLA is quite different from the one in tensorflow r2.18, the number of modified files increased during backporting. Maybe I can split it into the pure original commit (with conflicts) and the modifications made during backporting (to resolve conflicts and make it compile), but that would still leave us with a commit that modifies at least 75 files, plus a relatively small patch. Do you think that would help in this case?
Probably not too much; I will try to review it as it is.
That would be tough work, sorry.
Is it possible to get that second commit (the one fixing errors) somewhere?
Just looked through my local branches but had no luck. No worries, though; I believe the rework won't take too long. Let me do it, otherwise it might not be well organized for you to review.
Done. Please review the reorganized PR: #2962
We should query the hardware to discover its warp size.
PiperOrigin-RevId: 700787004
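For context, here is a minimal sketch of the change's core idea; it is not taken from this PR's diff. Instead of a hard-coded constant 32, warp-size helpers take the stream_executor DeviceDescription and read the warp width from it. The WarpSize helper shown here is illustrative, and threads_per_warp() is assumed to be the relevant DeviceDescription accessor.

```cpp
// Hedged sketch (assumptions noted): query the warp size from the device
// description rather than baking in NVIDIA's warp width of 32, which is
// wrong on most AMD GPUs, where the wavefront size is 64.
#include <cstdint>

#include "xla/stream_executor/device_description.h"

namespace xla::gpu {

// Before (roughly): `inline constexpr int64_t WarpSize() { return 32; }`
// After: read the value recorded for the actual device.
inline int64_t WarpSize(
    const stream_executor::DeviceDescription& device_description) {
  return device_description.threads_per_warp();
}

}  // namespace xla::gpu
```

Call sites that previously used a zero-argument constant then have to thread the DeviceDescription through, which is one reason a conceptually small change like this touches so many files.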