Partitioned communications hang #12969
The inline patch below fixes the issue. I will review it and open a proper PR sometime early this week.

```diff
diff --git a/ompi/mca/part/persist/part_persist.h b/ompi/mca/part/persist/part_persist.h
index eea447c..07043b0 100644
--- a/ompi/mca/part/persist/part_persist.h
+++ b/ompi/mca/part/persist/part_persist.h
@@ -485,9 +485,8 @@ mca_part_persist_start(size_t count, ompi_request_t** requests)
 {
 int err = OMPI_SUCCESS;
 size_t _count = count;
- size_t i;
- for(i = 0; i < _count && OMPI_SUCCESS == err; i++) {
+ for(size_t i = 0; i < _count && OMPI_SUCCESS == err; i++) {
 mca_part_persist_request_t *req = (mca_part_persist_request_t *)(requests[i]);
 /* First use is a special case, to support lazy initialization */
 if(false == req->first_send)
@@ -503,7 +502,7 @@ mca_part_persist_start(size_t count, ompi_request_t** requests)
 } else {
 if(MCA_PART_PERSIST_REQUEST_PSEND == req->req_type) {
 req->done_count = 0;
- for(i = 0; i < req->real_parts && OMPI_SUCCESS == err; i++) {
+ for(size_t i = 0; i < req->real_parts && OMPI_SUCCESS == err; i++) {
 req->flags[i] = -1;
 }
 } else {
```

@mdosanjh I noticed there is a lot of code in a header file, and this is not friendly to debuggers.
Hey @ggouaillardet, I can't see the entire code, but doesn't that create a shadowed variable? Wouldn't it be wiser to rename the second counter to something other than "i"?
Sure, we can also do that!
use a separate loop index for the innermost loop.

Fixes open-mpi#12969

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Thank you for creating the issue and proposing a fix so quickly!
@Jonashar can you please tell me your full name so I can update the commit message and properly credit you?
Oh sorry, of course, my name is Jonas Harlacher.
use a separate loop index for the innermost loop.

Thanks Jonas Harlacher for bringing this to our attention.

Fixes open-mpi#12969

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This issue was initially reported on Stack Overflow at https://stackoverflow.com/questions/79258925/openmpi5-partitioned-communication-with-multiple-neighbors
With the `main` branch, the following program hangs when run on 3 MPI tasks: