Very slow 4x4 convolutions on gfx803 #134
Which version of MIOpen do you use?
@huanzhang12 AFAICS you are using 2.2.0. Version 2.3.0 has just been released. It includes c58488b, which should restore gfx8 performance. Please close this issue if it is resolved.
@atamazov I tried the just-released version 2.3.0 and it is amazing! It is great news that the ASM kernels are re-enabled on gfx803. The same 4x4 convolution now runs at 1684 GFLOPs.
My workload involving several 4x4 convolutions runs 10 times faster on v2.3.0. Thank you so much for the hard work; I am closing this issue.
Since the ASM kernels were disabled on gfx803 in commit ce51a4c, 4x4 convolutions on gfx803 fall back to the very slow GEMM algorithm.
Before the ASM kernels were disabled, the same convolution was much faster: performance dropped from 694 GFLOPs to 15 GFLOPs.
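The GFLOPs figures quoted here are throughput numbers: the convolution's floating-point operation count divided by kernel time. Below is a minimal sketch of that arithmetic, assuming the common convention that one multiply-accumulate counts as 2 FLOPs and no dilation. The helper names and the layer shape in the usage example are hypothetical and are not the ones benchmarked in this issue.

```python
# Sketch: deriving a GFLOPs figure for a forward convolution.
# Assumptions: 1 multiply-accumulate = 2 FLOPs, no dilation, NCHW-style shapes.
# Helper names are hypothetical, not MIOpen APIs.


def conv_output_size(in_size: int, kernel: int, pad: int = 0, stride: int = 1) -> int:
    """Spatial output size along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_fwd_flops(n: int, c: int, h: int, w: int, k: int, r: int, s: int,
                   pad: int = 0, stride: int = 1) -> int:
    """FLOPs of a direct forward convolution: 2 * N * K * C * Ho * Wo * R * S."""
    ho = conv_output_size(h, r, pad, stride)
    wo = conv_output_size(w, s, pad, stride)
    return 2 * n * k * c * ho * wo * r * s


def gflops(flops: int, seconds: float) -> float:
    """Throughput in GFLOP/s for a kernel that ran for `seconds`."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical 4x4 layer: batch 64, 64 -> 64 channels, 32x32 input, stride 1, no padding.
    flops = conv_fwd_flops(n=64, c=64, h=32, w=32, k=64, r=4, s=4)
    # Kernel time implied by each throughput figure reported in this thread.
    for rate in (15.0, 694.0, 1684.0):
        print(f"{rate:7.1f} GFLOPs -> {flops / (rate * 1e9) * 1e3:.2f} ms")
```

Because the FLOP count for a given layer is fixed, the ratio of two GFLOPs figures is the inverse ratio of kernel times: the drop from 694 to 15 GFLOPs is roughly a 46x slowdown, and 1684 GFLOPs on 2.3.0 is about 2.4x the pre-regression figure.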
Why were all ASM kernels disabled for gfx803 instead of only the individual problematic ones?
Also, even without an ASM implementation, could a general OpenCL implementation be used in this case rather than relying on the extremely slow GEMM? (It seems that conv_ocl_dir2Dfwd.cpp is not enabled for most 4x4 convolutions.)