README.md: trivial change to force mpi4py testing #4
Conversation
Many thanks to Lisandro Dalcin for 99% of this script (inspired by https://raw.githubusercontent.com/mpi4py/mpi4py-testing/master/.github/workflows/openmpi.yml). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Hey @dalcinl -- I'm testing your mpi4py scripts. Per this PR (on my personal repo, where I'm testing -- this won't be the real/final PR onto the main Open MPI repo), I think it would be great to run these tests on every PR. I'm getting a failure on some MPI_Comm_spawn tests, which may well be expected. Is there a way to tell mpi4py that it's ok for some tests to fail?
@jsquyres The problem with failures is that sometimes they lead to deadlock or abort. If spawn is giving problems, then you can pass ...
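Since the thread keeps coming back to keeping the unstable spawn tests from hanging CI, here is a minimal hedged sketch of the general idea, assuming a plain unittest-based test file and a hypothetical MPI4PY_SKIP_SPAWN environment variable. This is not mpi4py's actual exclusion mechanism, just an illustration of the pattern.

```python
# Illustrative sketch only: skip spawn-based tests when a (hypothetical)
# environment variable is set, so a feature known to deadlock or abort
# cannot take down the whole CI run. mpi4py's own test runner has its own
# way of excluding tests; this shows the general pattern.
import os
import sys
import unittest

from mpi4py import MPI

SKIP_SPAWN = bool(os.environ.get("MPI4PY_SKIP_SPAWN"))  # assumed variable name


@unittest.skipIf(SKIP_SPAWN, "MPI_Comm_spawn known to be unstable on this build")
class TestSpawn(unittest.TestCase):
    def test_spawn_and_disconnect(self):
        # Parent spawns one child process; the child fetches its parent
        # intercommunicator and disconnects, then the parent disconnects too.
        child = MPI.COMM_SELF.Spawn(
            sys.executable,
            args=["-c", "from mpi4py import MPI; MPI.Comm.Get_parent().Disconnect()"],
            maxprocs=1,
        )
        child.Disconnect()


if __name__ == "__main__":
    unittest.main()
```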
@jsquyres You do not really need to commit to use GitHub Actions; note the ...
Yeah, I've been playing with GHAs for a few different projects for the last month or two now. If we can get a stable set, I'd be happy to include these tests to run on every PR. Is there a stable release tag I should use for mpi4py, or is master HEAD considered to be stable?
For the time being I recommend using master HEAD. I consider master HEAD very stable; the CI infrastructure is large and covers Linux/macOS with MPICH/Open MPI and Windows with MSMPI. However, before you go ahead with this PR, perhaps you should wait until I finish mine? That way we can push any required changes/fixes in both mpi4py/mpi4py and open-mpi/ompi to a point where everything is OK.
Works for me. This is just a testing PR on my fork.
Jeff, a quick question. I'm configuring with ... With ...
Sorry for the delay; it took a little time to reproduce this. There are two issues here: ...
About my second comment, I was not clear enough. I'm not complaining that ... See below, and note the zero size in the last line:

```
In [1]: from mpi4py import MPI

In [2]: t = MPI.Datatype.Create_f90_integer(9)

In [3]: t.size, t.name
Out[3]: (4, 'COMBINER MPI_INT')

In [4]: t = MPI.Datatype.Create_f90_real(9, MPI.UNDEFINED)

In [5]: t.size, t.name
Out[5]: (8, 'COMBINER MPI_DOUBLE')

In [6]: t = MPI.Datatype.Create_f90_complex(9, MPI.UNDEFINED)

In [7]: t.size, t.name
Out[7]: (0, 'COMBINER MPI_DOUBLE_COMPLEX')
```
Any idea what's going on with Spawn? I thought it could be related to oversubscription, but I enabled it this way: ...
Spawn: ugh. Frankly, our COMM_SPAWN support has never been rock solid. That being said, v5.0 just had a pretty major overhaul of its runtime system. As I'm typing, Ralph is closing out a lot of Open MPI runtime bugs as "This has been verified fixed for v5.0...". We're (extremely) unlikely to backport most of those fixes to existing release branches.

As for MPI_Type_create_f90_complex -- I tracked this down in the C code and talked to @bosilca about it. You are correct: we are returning a valid datatype (i.e., not MPI_DATATYPE_NULL) that has an extent of 0. The MPI spec does not mandate that MPI_DATATYPE_NULL be returned if there is no compatible datatype. Indeed, MPI-3.1 says that it is erroneous to make such a request -- but I recognize that this is a bit of a chicken-and-egg situation. All of that boils down to: at least as of now, this is expected Open MPI behavior.
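To make the behavior described above concrete, here is a small hedged sketch of the kind of guard a test could use: it treats both MPI_DATATYPE_NULL and a zero-size datatype as "no usable Fortran COMPLEX support" before exercising the type. The check itself is an illustration, not something the Open MPI or mpi4py test suites are known to do.

```python
# Sketch: guard against both possible "unsupported" answers from
# MPI_Type_create_f90_complex -- MPI_DATATYPE_NULL, and the valid-but-zero-size
# datatype that this Open MPI build returns when compiled without a Fortran
# compiler (per the discussion above).
from mpi4py import MPI

t = MPI.Datatype.Create_f90_complex(9, MPI.UNDEFINED)
if t == MPI.DATATYPE_NULL or t.size == 0:
    print("No usable Fortran COMPLEX datatype in this build; skipping")
else:
    print("f90 complex maps to", t.name, "with size", t.size)
```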
Jeff, about my Spawn question and the test failures I linked, let me clarify that the issues are with OMPI master. I understand you do not have the time to backport all the fixes, and I'm not making a request for that. I'm just looking forward to a state where all mpi4py tests pass cleanly with OMPI/master.

About MPI_Type_create_f90_complex, sorry for being so insistent, but I have not managed to make my point clear, and you still do not see the inconsistency in the implementation of the f90 integer/real/complex trio. Please look again at the implementation and compare the code for f90 integer, real, and complex. There is an inconsistency in complex! And that inconsistency is what makes MPI_Type_create_f90_complex behave differently (unavailable, zero-size datatype) when building without a Fortran compiler. A few lines of code are worth a hundred words. IMHO, you have to apply the following fix:

```diff
diff --git a/ompi/mpi/c/type_create_f90_complex.c b/ompi/mpi/c/type_create_f90_complex.c
index 638afc2f26..cb86277a89 100644
--- a/ompi/mpi/c/type_create_f90_complex.c
+++ b/ompi/mpi/c/type_create_f90_complex.c
@@ -84,10 +84,10 @@ int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype)
      */
     if ( (LDBL_DIG < p) || (LDBL_MAX_10_EXP < r) || (-LDBL_MIN_10_EXP < r) ) *newtype = &ompi_mpi_datatype_null.dt;
-    else if( (DBL_DIG < p) || (DBL_MAX_10_EXP < r) || (-DBL_MIN_10_EXP < r) ) *newtype = &ompi_mpi_ldblcplex.dt;
-    else if( (FLT_DIG < p) || (FLT_MAX_10_EXP < r) || (-FLT_MIN_10_EXP < r) ) *newtype = &ompi_mpi_dblcplex.dt;
+    else if( (DBL_DIG < p) || (DBL_MAX_10_EXP < r) || (-DBL_MIN_10_EXP < r) ) *newtype = &ompi_mpi_c_long_double_complex.dt;
+    else if( (FLT_DIG < p) || (FLT_MAX_10_EXP < r) || (-FLT_MIN_10_EXP < r) ) *newtype = &ompi_mpi_c_double_complex.dt;
     else if( ! OMPI_HAVE_FORTRAN_COMPLEX4 ||
-             (sflt_dig < p) || (sflt_max_10_exp < r) || (-sflt_min_10_exp < r) ) *newtype = &ompi_mpi_cplex.dt;
+             (sflt_dig < p) || (sflt_max_10_exp < r) || (-sflt_min_10_exp < r) ) *newtype = &ompi_mpi_c_complex.dt;
     else *newtype = &ompi_mpi_complex4.dt;
```

Note that this change implements ...
@dalcinl your patch assumes that the Fortran and C complex types are identical, which unfortunately is not a correct assumption. I know we are abusing this for real and integer by making the same assumption, but it "holds" because, unlike for complex, up to now we haven't found any compiler where this was not the case. I do not recall which compiler triggered that issue; maybe @jsquyres does.
@dalcinl Let me know when you have a final recipe ready and I'll incorporate it into my nightly MTT testing.
@jsquyres I'm waiting for open-mpi#8756 to provide a fix for open-mpi#8677. I'm also waiting for an already-approved PR to get merged in MPICH. Then I can merge some required changes to the mpi4py test suite (mpi4py/mpi4py#37). After that, I have to reassess the Spawn situation, possibly disabling failing tests under ompi@master.
@dalcinl I'll split out the fix for open-mpi#8677 since it isn't relevant to the main PR, and that may not go in for a few weeks.
Getting that fix first will allow Jeff and me to move forward with this one. Many thanks!
@dalcinl I opened open-mpi#8779 to fix it.
@jsquyres Ideally, issue open-mpi#4135 should also be addressed; see my last comment. For the time being, mpi4py is skipping reduction tests with zero-size buffers.
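For reference, this is roughly what a zero-size reduction looks like in mpi4py; a minimal sketch, not taken from the mpi4py test suite, of the kind of call that issue open-mpi#4135 concerns and that the skipped tests exercise.

```python
# Sketch of a reduction with zero-size buffers, the case mpi4py currently
# skips while open-mpi#4135 remains open. Run under MPI, e.g. (file name
# is hypothetical):
#   mpiexec -n 2 python zero_size_reduce.py
from array import array

from mpi4py import MPI

comm = MPI.COMM_WORLD

sendbuf = array('i', [])   # zero elements
recvbuf = array('i', [])   # zero elements

# A zero-count Allreduce; the buffers simply contribute no data.
comm.Allreduce([sendbuf, MPI.INT], [recvbuf, MPI.INT], op=MPI.SUM)

if comm.rank == 0:
    print("zero-size Allreduce completed")
```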
@jsquyres With current master, is there a way to tell ...
@jsquyres We are almost there: ...
@jsquyres This is finally ready from my side. There is no point in delaying things for the minor issue left (external32 for ...).

PS: Some day we could use a build matrix, for example to test builds with UCX. I would need your help to properly set up MCA parameters so that we trigger interesting code paths within Open MPI. For example, look at the MPICH builds: I'm testing eight different configurations.
ping?