Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Attempt to fix threadsafe on skin #39

Closed
wants to merge 1 commit into from

Conversation

jbusecke
Copy link
Contributor

@jbusecke jbusecke commented Jun 24, 2022

As a first step I wanted to verify that the problem is actually related to threadsafe. And indeed when I remove the threadsafe from the 'skin' wrapper only, all tests pass locally.

So there must be something specific to the skin correction that is not threadsafe?

@jbusecke
Copy link
Contributor Author

Ok so running the tests from a script in the tests directory

from test_flux_xr import Test_xarray

test_class = Test_xarray()

shape = (10, 13, 12)
chunks = {"dim_0": 1}
skin_correction = True
test_class.test_chunked(shape, chunks, skin_correction)

gives me these sort of errors:

 ===================================================================
                    ----- AeroBulk_init -----

     *** Bulk parameterization to be used => "coare3p0"
        ==> will use the Cool-skin & Warm-ayer scheme of `coare3p0` !
     *** Computational domain shape: Ni x Nj = 00001 x 00013
     *** Number of time records that will be treated:          12
     *** Number of iterations in bulk algos: nb_iter  =    6
     *** Filling the `mask` array...
         ==> no points need to be masked! :)
     *** Type of prescribed air humidity  `relative humidity [%]`
 ===================================================================
 *** E R R O R :
  COARE3P0_INIT => allocation of Tau_ac, Qnt_ac, dT_wl & Hz_wl failed!

switching the algo shows a similar thing for ecmwf:

 *** E R R O R :
  ECMWF_INIT => allocation of dT_wl & Hz_wl failed!

These errors go back to the aerobulk code here and here, both parts of the codebase seem to allocate some internal optional output variable.

Does anyone have a hint to why this would not sit well with threadsafe (cc @rabernat)?

@jbusecke
Copy link
Contributor Author

@jbusecke jbusecke mentioned this pull request Jun 24, 2022
@jbusecke
Copy link
Contributor Author

Quick update. This only applies when using dask arrays, interestingly. The tests with the numpy only wrappers in tests/test_fortran.py all pass!

@rabernat
Copy link
Contributor

rabernat commented Jun 24, 2022

Is that because Dask uses threads?

Also, what is different about skin that makes it not thread safe vs. noskin.

@jbusecke
Copy link
Contributor Author

Sorry the previous comment was circular:
As far as I can tell the difference is in the allocation of optional output which is only done when chosing the skin correction option (there are slight differences between the actual algos but the failure happens at the same ALLOCATE call in both, see below:

These errors go back to the aerobulk code here and here, both parts of the codebase seem to allocate some internal optional output variable.

@jbusecke
Copy link
Contributor Author

Is that because Dask uses threads?

Any of my dask test cases has used the threaded scheduler (default). Before setting threadsafe, this worked for either skin/noskin but was very slow (basically running in serial). After setting threadsafe the noskin wrapper parallelizes as expected but the skin version actually FAILS.

@paigem paigem mentioned this pull request Jun 27, 2022
@paigem
Copy link
Collaborator

paigem commented Jun 28, 2022

Note that once #40 is merged, the skin code is no longer "threadsafe". The "threadsafe" option was removed in that PR to allow for tests to pass.

@jbusecke
Copy link
Contributor Author

Ill close this for now, since we have decided to move on without threadsafe for the skin option for now. We can reopen if needed

@jbusecke jbusecke closed this Jun 29, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v0.2.5 skin is broken
3 participants