README.md: trivial change to force mpi4py testing #4

Open
wants to merge 4 commits into base: main

Conversation

jsquyres
Owner

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

jsquyres and others added 3 commits March 22, 2021 10:29
Many thanks to Lisandro Dalcin for 99% of this script (inspired by
https://raw.githubusercontent.com/mpi4py/mpi4py-testing/master/.github/workflows/openmpi.yml).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@jsquyres
Owner Author

Hey @dalcinl -- I'm testing your mpi4py scripts. Per this PR (on my personal repo, where I'm testing -- this won't be the real/final PR onto the main Open MPI repo), I think it would be great to run these tests on every PR.

I'm getting failures in some MPI_Comm_spawn tests, which may well be expected. ☹️

Is there a way to tell mpi4py that it's OK for some tests to fail?

@dalcinl

dalcinl commented Mar 22, 2021

@jsquyres The problem with failures is that sometimes they lead to a deadlock or an abort. If spawn is giving you problems, you can pass -e spawn -e dynproc to test/runtests.py to skip all of those tests. If any of the mpi4py.futures tests are not working, just comment out that whole step for now.
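
For reference, a minimal sketch of what that invocation could look like in a CI step. Only the -e spawn -e dynproc exclusions come from the suggestion above; the mpiexec launcher and the -n 2 process count are illustrative assumptions:

# Run the mpi4py test suite under MPI, excluding the spawn and
# dynamic-process tests via the -e flags suggested above.
# (mpiexec and -n 2 are assumptions for illustration only.)
mpiexec -n 2 python test/runtests.py -e spawn -e dynproc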

@dalcinl

dalcinl commented Mar 22, 2021

@jsquyres You do not really need to commit to trigger GitHub Actions; note the workflow_dispatch event in the YAML file. Google the relevant docs: you can trigger a build manually from your browser using GitHub's UI. Oh, but for this to work, the workflow YAML file already needs to be in master.
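
As an aside, the same manual trigger can also be fired from a terminal, assuming the GitHub CLI (gh) is installed; the workflow file name below is a placeholder, not the actual file in this repository:

# Trigger a workflow_dispatch run of the named workflow on the default branch.
# "openmpi.yml" is a placeholder name.
gh workflow run openmpi.yml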

@jsquyres
Owner Author

Yeah, I've been playing with GHAs on a few different projects for the last month or two now. If we can get a stable set of tests, I'd be happy to include them so they run on every PR.

Is there a stable release tag I should use for mpi4py, or is master HEAD considered to be stable?

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@dalcinl

dalcinl commented Mar 22, 2021

For the time being I recommend using master HEAD. I consider master HEAD very stable; the CI infrastructure is large and covers MPICH/Open MPI on Linux/macOS and MSMPI on Windows.

However, before you go ahead with this PR, perhaps you should wait until I finish mine? That way we can push any required changes/fixes in both mpi4py/mpi4py and open-mpi/ompi to a point where everything is OK.

@jsquyres
Owner Author

However, before you go ahead with this PR, perhaps you should wait until I finish mine? That way we can push any required changes/fixes in both mpi4py/mpi4py and open-mpi/ompi to a point where everything is OK.

Works for me. This is just a testing PR on my fork.

@dalcinl

dalcinl commented Mar 22, 2021

Jeff, a quick question. I'm configuring with --disable-mpi-fortran. However, the Fortran compiler wrappers are still installed. Is there any way to completely disable Fortran? FYI, my only interest in this is testing: making sure that the mpi4py test suite can run with MPI implementations (or custom builds of them) lacking Fortran support.

With --disable-mpi-fortran, the calls MPI_Type_create_f90_{integer|real}() seem to return valid datatypes. However, MPI_Type_create_f90_complex() seems to return a non-null handle of size zero (MPI_Type_size() reports zero). Any idea whether this is intentional or just an oversight? I believe that if you can define an F90 real without a Fortran compiler, you can define an F90 complex the same way, right?

@jsquyres
Owner Author

jsquyres commented Mar 23, 2021

Sorry for the delay; it took a little time to go reproduce this. There are two issues here:

  1. It looks like --disable-mpi-fortran does still install mpifort and friends. On the surface, this feels like a bug, if for no other reason than they're trying to -lmpi_mpifh, and that library does not exist in the installed location. Offhand, I can't think of a reason why we'd be installing mpifort (and friends) if there are no Fortran bindings. I'll file an issue about this. EDIT: Filed "Configuring with --disable-mpi-fortran still installs mpifort(1)" as open-mpi/ompi#8682.
  2. Even with --disable-mpi-fortran, we do still return valid datatypes from MPI_Type_create_f90_*(). I need to dig into the MPI standard a bit more, but I think that this is correct behavior.

@dalcinl

dalcinl commented Mar 23, 2021

About my second comment: I was not clear enough.

I'm not complaining that MPI_Type_create_f90_{integer|real}() work without a Fortran compiler; quite the opposite! My complaint is that MPI_Type_create_f90_complex() seems to be broken, returning a datatype of size zero. If you support the first two (integer|real), I fail to see why you wouldn't support the third (complex). It looks like this is a trivial bug in handling a HAVE_FORTRAN preprocessor guard.

See below; note the zero size in the last line:

In [1]: from mpi4py import MPI

In [2]: t = MPI.Datatype.Create_f90_integer(9)
In [3]: t.size, t.name
Out[3]: (4, 'COMBINER MPI_INT')

In [4]: t = MPI.Datatype.Create_f90_real(9, MPI.UNDEFINED)
In [5]: t.size, t.name
Out[5]: (8, 'COMBINER MPI_DOUBLE')

In [6]: t = MPI.Datatype.Create_f90_complex(9, MPI.UNDEFINED)
In [7]: t.size, t.name
Out[7]: (0, 'COMBINER MPI_DOUBLE_COMPLEX')

@dalcinl

dalcinl commented Mar 23, 2021

Any idea what's going on with Spawn?
https://github.com/mpi4py/mpi4py-testing/runs/2178162342?check_suite_focus=true

I thought it could be related to oversubscription, but I enabled it this way:
https://github.com/mpi4py/mpi4py-testing/actions/runs/680630523/workflow#L63

@jsquyres
Owner Author

Spawn: ugh. Frankly, our COMM_SPAWN support has never been rock solid. That being said, v5.0 just had a pretty major overhaul of its runtime system. As I'm typing, Ralph is closing out a lot of Open MPI runtime bugs as "This has been verified fixed for v5.0..." We're (extremely) unlikely to backport most of those fixes to existing release branches.

As for MPI_Type_create_f90_complex -- I tracked this down to the C code and talked to @bosilca about it. You are correct: we are returning a valid datatype (i.e., not MPI_DATATYPE_NULL) that has an extent of 0. The MPI spec does not mandate that MPI_DATATYPE_NULL should be returned if there is no compatible datatype. Indeed, MPI-3.1 says that it is erroneous to make such a request -- but I recognize that this is a bit of a chicken-and-egg situation.

All that boils down to: at least as of now, this is expected Open MPI behavior.

@dalcinl

dalcinl commented Mar 25, 2021

Jeff, about my Spawn question and the test failures I linked, let me clarify that the issues are with OMPI master. I understand you do not have the time to backport all the fixes, and I'm not requesting that. I'm just looking forward to a state where all mpi4py tests pass cleanly with OMPI master.

About MPI_Type_create_f90_complex: sorry for being so insistent, but I have not managed to make my point clear, and you still do not see the inconsistency in the implementation of the f90 integer/real/complex trio.

Please look again at the implementation and compare the code for f90 complex with the code for f90 integer/real. There is an inconsistency in the complex case! And that inconsistency is what makes MPI_Type_create_f90_complex behave differently (unavailable, zero-size datatype) when building without a Fortran compiler.

A few lines of code are worth a hundred words. IMHO, you have to apply the following fix:

diff --git a/ompi/mpi/c/type_create_f90_complex.c b/ompi/mpi/c/type_create_f90_complex.c
index 638afc2f26..cb86277a89 100644
--- a/ompi/mpi/c/type_create_f90_complex.c
+++ b/ompi/mpi/c/type_create_f90_complex.c
@@ -84,10 +84,10 @@ int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype)
      */
 
     if     ( (LDBL_DIG < p) || (LDBL_MAX_10_EXP < r) || (-LDBL_MIN_10_EXP < r) ) *newtype = &ompi_mpi_datatype_null.dt;
-    else if( (DBL_DIG  < p) || (DBL_MAX_10_EXP  < r) || (-DBL_MIN_10_EXP  < r) ) *newtype = &ompi_mpi_ldblcplex.dt;
-    else if( (FLT_DIG  < p) || (FLT_MAX_10_EXP  < r) || (-FLT_MIN_10_EXP  < r) ) *newtype = &ompi_mpi_dblcplex.dt;
+    else if( (DBL_DIG  < p) || (DBL_MAX_10_EXP  < r) || (-DBL_MIN_10_EXP  < r) ) *newtype = &ompi_mpi_c_long_double_complex.dt;
+    else if( (FLT_DIG  < p) || (FLT_MAX_10_EXP  < r) || (-FLT_MIN_10_EXP  < r) ) *newtype = &ompi_mpi_c_double_complex.dt;
     else if( ! OMPI_HAVE_FORTRAN_COMPLEX4 ||
-             (sflt_dig < p) || (sflt_max_10_exp < r) || (-sflt_min_10_exp < r) ) *newtype = &ompi_mpi_cplex.dt;
+             (sflt_dig < p) || (sflt_max_10_exp < r) || (-sflt_min_10_exp < r) ) *newtype = &ompi_mpi_c_complex.dt;
     else                                                                         *newtype = &ompi_mpi_complex4.dt;

Note that this change implements MPI_Type_create_f90_complex in terms of C types, not Fortran types. And that's EXACTLY how things are done in MPI_Type_create_f90_{integer|real}. Do you understand my point now? With my patch, all of the f90 integer/real/complex trio gets implemented in terms of C types. A foolish consistency is the hobgoblin of little minds.

@bosilca

bosilca commented Mar 25, 2021

@dalcinl your patch assumes that the Fortran and C complex types are identical, which unfortunately is not a correct assumption. I know we are abusing this for real and integer by making the same assumption, but it "holds" because, unlike for complex, up to now we haven't found any compiler where it was not the case. I do not recall which compiler triggered that issue; maybe @jsquyres does.

@jsquyres
Owner Author

jsquyres commented Apr 5, 2021

@dalcinl Let me know when you have a final recipe ready and I'll incorporate it into my nightly MTT testing.

@dalcinl

dalcinl commented Apr 5, 2021

@jsquyres I'm waiting for open-mpi#8756 to provide a fix for open-mpi#8677. I'm also waiting for an already-approved PR to be merged in MPICH. Then I can merge some required changes to the mpi4py test suite (mpi4py/mpi4py#37). After that, I have to reassess the Spawn situation, possibly disabling failing tests under ompi@master.

@awlauria

awlauria commented Apr 6, 2021

@dalcinl I'll split out the fix for open-mpi#8677 since it isn't relevant to the main PR, and that may not go in for a few weeks.

@dalcinl

dalcinl commented Apr 6, 2021

@dalcinl I'll split out the fix

Getting that fix in first will allow Jeff and me to move forward with this one. Many thanks!

@awlauria

awlauria commented Apr 6, 2021

@dalcinl I opened open-mpi#8779 to fix it.

@dalcinl

dalcinl commented Apr 7, 2021

@jsquyres Ideally, issue open-mpi#4135 should also be addressed; see my last comment there. For the time being, mpi4py is skipping reduction tests with zero-size buffers.

@dalcinl

dalcinl commented Apr 22, 2021

@jsquyres With current master, is there a way to tell mpiexec that you want to oversubscribe? Before, I could do it via mca-params.conf or an environment variable, but now it looks like the only way to do it is to pass --map-by :oversubscribe.
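
To contrast the two approaches being discussed, a sketch (rmaps_base_oversubscribe is the pre-v5 MCA parameter name, and the command line below is illustrative, not taken from the actual workflow):

# Pre-v5 releases: enable oversubscription via an MCA parameter,
# either in mca-params.conf or as an environment variable.
export OMPI_MCA_rmaps_base_oversubscribe=true

# Current master: pass the mapping qualifier directly to mpiexec.
mpiexec --map-by :oversubscribe -n 5 python test/runtests.py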

@dalcinl

dalcinl commented May 13, 2021

@jsquyres We are almost there.

@dalcinl

dalcinl commented Jun 3, 2021

@jsquyres This is finally ready from my side. There is no point in delaying things for the one minor issue left (external32 for long double). We now have all the spawn tests passing thanks to the recent oversubscription fixes. Please use the updated YAML file. Any tips on additional MCA parameters I could set to increase the level of debugging checks?

PS: Some day we could use a build matrix, for example to test builds with UCX. I would need your help to properly set up MCA parameters such that we trigger interesting code paths within Open MPI. For example, look at the MPICH builds: I'm testing eight different configurations.

@opoplawski

ping?
