Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

core/util: Fix missing rocr dlopen function assignments #10812

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

swelch
Copy link
Contributor

@swelch swelch commented Feb 21, 2025

Resolves issue #10797

@j-xiong
Copy link
Contributor

j-xiong commented Feb 21, 2025

Please change the prefix of the commit title from core/util: to core/hmem or just hmem.

@j-xiong
Copy link
Contributor

j-xiong commented Feb 21, 2025

Also please add the description of the issue (instead of the issue #) to the commit message. Leave the issue # in the PR description.

@swelch
Copy link
Contributor Author

swelch commented Feb 21, 2025

Please change the prefix of the commit title from core/util: to core/hmem or just hmem.

Changed to core/hmem, makes sense; I had just followed the label you added to the issue.

@j-xiong
Copy link
Contributor

j-xiong commented Feb 21, 2025

Thanks for the update. The issue label just puts core and util into the same category which is different from the commit title convention.

The updated commit message has the title line repeated in the body. That is not necessary. What I meant was to use the following description from the issue:

When libfabric is configured with "--with-rocr=$ROCM_PATH --enable-rocr-dlopen",
three functions ("hsa_ops.hsa_signal_store_screlease", "hsa_ops.hsa_signal_load_scacquire", 
"hsa_ops.hsa_amd_agents_allow_access") are not mapped to their associated symbols 
from "libhsa-runtime64.so" in the initialization routine "rocr_hmem_dl_init()". This can 
lead to a segmentation violation and application crash.

@swelch
Copy link
Contributor Author

swelch commented Feb 21, 2025

Thanks for the update. The issue label just puts core and util into the same category which is different from the commit title convention.

The updated commit message has the title line repeated in the body. That is not necessary. What I meant was to use the following description from the issue:

When libfabric is configured with "--with-rocr=$ROCM_PATH --enable-rocr-dlopen",
three functions ("hsa_ops.hsa_signal_store_screlease", "hsa_ops.hsa_signal_load_scacquire", 
"hsa_ops.hsa_amd_agents_allow_access") are not mapped to their associated symbols 
from "libhsa-runtime64.so" in the initialization routine "rocr_hmem_dl_init()". This can 
lead to a segmentation violation and application crash.

IMO this is obvious from the patch and adds no real value to the 1-line; but I'll add the requested text.

When libfabric is configured with "--with-rocr=$ROCM_PATH --enable-rocr-dlopen",
three functions ("hsa_ops.hsa_signal_store_screlease",
"hsa_ops.hsa_signal_load_scacquire", "hsa_ops.hsa_amd_agents_allow_access") are
not mapped to their associated symbols from "libhsa-runtime64.so" in the
initialization routine "rocr_hmem_dl_init()". This can lead to a segmentation
violation and application crash.

Resolves issue ofiwg#10797

Signed-off-by: Michael Lough <michael.lough@hpe.com>
Signed-off-by: Steve Welch <welch@hpe.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants