hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!") #1106
Comments
I tested rocm-3.7.0 on Ubuntu 20.04; my GPU is gfx803. |
Hmm, I'm afraid I don't understand enough to know how to use your information :/ |
Same problem, different GPU and not in docker, but ArchLinux.
@xuhuisheng: How did you get the list of files tensorflow-rocm loaded? I tried, but it would seem I don't have
|
I find it strange that your Python output doesn't list a device. By the way, when I experimented with TensorFlow in Docker, I used a somewhat different docker run command.
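To see what the Python side actually picks up, a quick check like this should work (a sketch, assuming a TensorFlow 2.x build; an empty list means no ROCm device was found):

```python
import tensorflow as tf

print(tf.__version__)
# An empty list here means tensorflow-rocm did not pick up any ROCm device.
print(tf.config.list_physical_devices('GPU'))
```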
|
I compiled HIP from the rocm-3.7.0 sources and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory. The code_object handling seems to be a new feature in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0; rocBLAS seems to be the key, so I am reading the code around it.
|
Okay, I now have those files as well. That pull request, rocm-arch/rocm-arch#413, fixed it.
Problem still persists, though. |
Please note that in the aforementioned docker container tensorflow-rocm seems to find all it needs. So this must be something ArchLinux related in my case.
|
It would seem librocrand is to blame on Arch; it is missing support for my GPU. I hacked in debug info as well and got a dump of the call stack:
Will report back once I know more. |
Yes, that did the trick. Works for me now, thanks :) |
Hey @oleid, which trick are you referring to? I've submitted a PR to rocm-arch which adds |
@oleid Hm, I think you are onto something. I used both the official docker run command and your version, and inside the container I get the following
whereas on my host (Ubuntu 20.04) it seems to work properly:
However, on my host I still get the same issue when I try to run TensorFlow operations:
TF version:
EDIT: I also ran the following on host & inside container and got the same output:
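The checks were roughly along these lines (a sketch, not the exact script; device_lib is the standard TensorFlow helper for listing local devices):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
for dev in device_lib.list_local_devices():
    # physical_device_desc is only filled in for GPU devices
    print(dev.name, dev.device_type, dev.physical_device_desc)
```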
|
And I cannot find how to generate the Tensile image for gfx1010 under rocBLAS. Maybe you could recompile rocBLAS with BUILD_TENSILE_HOST=false; that will skip the Tensile image. Actually, ROCm does not officially support gfx1010 (Navi 10), so I cannot guarantee we could run gfx1010 on ROCm. Please refer to these issues:
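By the way, to see which gfx targets your installed rocBLAS actually ships Tensile images for, a quick check like this should do (the library path is an assumption and differs between ROCm versions and distro packaging):

```python
import glob
import os

# Assumed install location; adjust for your ROCm version / distro packaging.
lib_dir = "/opt/rocm/rocblas/lib/library"
for path in sorted(glob.glob(os.path.join(lib_dir, "TensileLibrary*"))):
    print(os.path.basename(path))  # e.g. TensileLibrary_gfx900.co
```
|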
@xuhuisheng I solved the lsmod problem; however, the issue still remained. Thanks for the hint and the links, I will look into them. Before I started trying to get TF running with the 5700 XT, I found another GitHub issue that linked to this blog post https://www.preining.info/blog/2020/05/switching-from-nvidia-to-amd-including-tensorflow/ and confirmed it would work, so it seems some people do get it running with the 5700 XT. I already tried to reproduce the steps there but wasn't successful. I also tried the approach here, ROCm/ROCm#887 (comment), and wasn't able to reproduce it either. |
@reinka I am afraid we have read this blog already; unfortunately, the author reported hitting a segmentation fault later in the comments. |
Same problem on Ubuntu 20.04 with gfx1012. Is it just missing from the list of supported GPUs? |
It would seem that GPU is not fully supported yet. I'd expect more to come in the next versions (before CDNA is released). |
I would appreciate a flag that allows me to use what works, even if not everything works and it is untested, instead of not being able to do anything at all on new GPUs.
|
@o8ruza8o which version of ROCm do you use? According to rigtorp's research, you need rocm-3.7 to support gfx10xx. gfx1012 is more complex: Tensile only supports gfx1010 and gfx1011, so you may have to copy the related kernel code objects too. I had two ideas for it: first, copy /opt/rocm/lib/TensileLibrary_gfx900.co to TensileLibrary_gfx1012.co; second, rebuild rocBLAS with BUILD_TENSILE_HOST=FALSE (a rough sketch of the first idea is below). Please refer to this issue:
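A rough sketch of the first idea (untested; it just copies the existing gfx900 code object under the gfx1012 name, using the paths mentioned above):

```python
import shutil

# Idea 1: reuse the gfx900 Tensile code object under the gfx1012 filename.
# This is an unsupported workaround; correct results are not guaranteed.
src = "/opt/rocm/lib/TensileLibrary_gfx900.co"
dst = "/opt/rocm/lib/TensileLibrary_gfx1012.co"
shutil.copyfile(src, dst)
```
|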
I am running rocm 3.8.0. My kernel is 5.7.19. My GPU is gfx1012.
|
I have a 5700 XT. I tried every possible method mentioned to get past this issue; nothing helped.
|
There is a new branch for gfx10 on rocBLAS; it seems it will be released with ROCm-3.10, maybe in late November. |
I'm curious whether the same applies to other rocm packages as well, e.g.: |
@da-phil |
I wonder why the new RDNA2 is even categorized within gfx10; there must be some similarities in the way they work 🤔 Off-topic question: do you or anybody else know of any other recent AMD Radeon GPU, besides gfx803, gfx900, gfx906 and gfx908, that has proved to work well with ROCm and therefore TensorFlow & PyTorch? |
I am also having the same problem. Has anyone found a solution? |
@iamsanjaymalakar please see this issue ROCm/ROCm#1269 |
I am not sure I understood the solution correctly. |
@iamsanjaymalakar |
I am currently at the same point, on Ubuntu 18.04. No idea how to use the workaround. |
@Doev |
I am getting a similar error. I have checked AMDGPU_TARGETS for the same library, i.e. rocSPARSE, and it correctly mentions the GPU I have, which is gfx906. |
Navi 10, or gfx10 chips, are not officially supported by ROCm here. There is nothing we can do without ROCm support. |
Is there any idea how long it will take for support to come? |
@RobertKillick That would be a question for the ROCm guys. Once they have the infrastructure ready, it is trivial to add TF support for it. |
Has anyone had any luck getting tensorflow-rocm running on a gfx1030 device? UPDATE: I was able to get things running on a gfx1030 device by building TF from source; I couldn't get the available binaries to run. |
GPU: 5700xt
When using the following Docker image:
with ROCm installed on the Docker host as explained here: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
I get the following error when executing TensorFlow ops:
and the Python console dies. I started the container with the alias mentioned in the corresponding Docker registry: https://hub.docker.com/r/rocm/tensorflow
I get the same error when I try to run tensorflow ops on the host.
Googling this issue yields only a handful of results so I feel like I might have some misconfiguration but I cannot figure out what it is.
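A minimal script along these lines is enough to hit it here (a sketch; any op that ends up in rocBLAS, such as a matmul, seems to be sufficient, and the shapes are arbitrary):

```python
import tensorflow as tf

a = tf.random.normal([256, 256])
b = tf.random.normal([256, 256])
# On the affected setup this aborts the process with
# "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!"
print(tf.matmul(a, b).numpy().sum())
```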