Cache NUMBA kernels between CI runs #279
This looks awesome! I will probably move this to the dev branch before merging.
Cool. Need to prod it a bit to see if it works.
.github/workflows/ci.yaml
- name: Cache Numba Kernels
  uses: actions/cache@v3
  with:
    key: numba-cache-${{ steps.numba-cache-key.outputs.date }}
Constructing the key out of the date may be overkill. I suspect we could just use `numba-cache` and it would propagate and be updated between runs.
I guess the downside is that it might accumulate a bunch of crufty old kernels. Note that, AFAICT, there's a 10GB cache limit per repo and cache entries that go unused for a week are evicted, so it may not be a big deal.
@bennahugo suggested we add the numba version to the cache key. I wonder if numba is clever enough to trigger recompiles on new numba versions.
The Python version may also be relevant, given that a codex `__pycache__` dir looks as follows:
__init__.cpython-310.pyc
__init__.cpython-39.pyc
bda_avg.cpython-310.pyc
bda_avg.cpython-39.pyc
bda_avg.row_average-23.py310.1.nbc
bda_avg.row_average-23.py310.2.nbc
bda_avg.row_average-23.py310.nbi
bda_avg.row_average-23.py39.1.nbc
bda_avg.row_average-23.py39.2.nbc
bda_avg.row_average-23.py39.nbi
bda_avg.row_chan_average-313.py310.1.nbc
bda_avg.row_chan_average-313.py310.2.nbc
bda_avg.row_chan_average-313.py310.3.nbc
bda_avg.row_chan_average-313.py310.4.nbc
bda_avg.row_chan_average-313.py310.5.nbc
bda_avg.row_chan_average-313.py310.nbi
bda_avg.row_chan_average-313.py39.1.nbc
bda_avg.row_chan_average-313.py39.2.nbc
bda_avg.row_chan_average-313.py39.3.nbc
bda_avg.row_chan_average-313.py39.4.nbc
bda_avg.row_chan_average-313.py39.nbi
bda_mapping.bda_mapper-341.py310.1.nbc
bda_mapping.bda_mapper-341.py310.nbi
bda_mapping.bda_mapper-341.py39.1.nbc
bda_mapping.bda_mapper-341.py39.2.nbc
bda_mapping.bda_mapper-341.py39.nbi
bda_mapping.cpython-310.pyc
bda_mapping.cpython-39.pyc
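The listing shows that the cache file names already carry a Python tag (`py39`, `py310`), which is one reason to keep a separate cache per interpreter. A minimal sketch of how a key combining the Python and Numba versions might be built is given below. The function name `numba_cache_key` and the key layout are hypothetical and are not taken from the workflow in this PR; the Numba version is read from package metadata and falls back to `unknown` if Numba is not installed.

```python
import sys
from importlib import metadata


def numba_cache_key(prefix="numba-cache"):
    """Build a cache key string from the Python and Numba versions.

    Hypothetical helper for illustration only; the key layout is an
    assumption, not the one used by the CI workflow under review.
    """
    # Mirror the tag style seen in the cache file names, e.g. "py310".
    py_tag = f"py{sys.version_info.major}{sys.version_info.minor}"

    # Look up the installed Numba version without importing Numba itself.
    try:
        numba_version = metadata.version("numba")
    except metadata.PackageNotFoundError:
        numba_version = "unknown"

    return f"{prefix}-{py_tag}-numba{numba_version}"


key = numba_cache_key()
print(key)
```

A CI step could write such a string to the step outputs and reference it in the `key:` field of `actions/cache`, in the same way the date-based key is referenced above.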
So the kernel caching does not seem to be improving the test suite run time, even though kernel caches are created: https://github.com/ratt-ru/QuartiCal/actions/caches. This would also seem to suggest that NUMBA_CACHE_DIR is respected.
Is respected, or isn't? Might need to rerun the tests a few times - I have muddied the waters by merging in main. I do think that there is probably something which can be done - will take a closer look at the end of the week.
I think it is respected -- the caches are about 11MB. Another thought occurred: the cached kernel modification times are probably earlier than those of the checked-out Python code, which might trigger recompilation: https://numba.readthedocs.io/en/stable/developer/caching.html
Edit: Referenced the main article on caching, rather than the CUDA article.
Unfortunately it looks like it is the case that the timestamp is only the input to the cache key (at least as of Aug 22): https://numba.discourse.group/t/cache-behaviour/1520
So this approach doesn't seem viable.
Ah unfortunate. Perhaps there will be progress upstream at some point.
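The timestamp concern can be illustrated with a small sketch. It assumes that a cached kernel is only reused when a recorded stamp of the source file (modification time and size) still matches the file on disk, which is how the linked caching notes are read here. The helpers `source_stamp` and `cache_is_fresh` are hypothetical and do not call into Numba. A fresh checkout is simulated by bumping the modification time while leaving the contents untouched.

```python
import os
import tempfile


def source_stamp(path):
    """Return a (modification time, size) stamp for a source file."""
    st = os.stat(path)
    return (st.st_mtime, st.st_size)


def cache_is_fresh(recorded_stamp, path):
    """A cached entry counts as fresh only if the stamp is unchanged."""
    return recorded_stamp == source_stamp(path)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "kernel.py")
    with open(src, "w") as f:
        f.write("def f(x):\n    return x + 1\n")

    # Stamp recorded at the time the kernel was compiled and cached.
    recorded = source_stamp(src)
    fresh_before = cache_is_fresh(recorded, src)

    # Simulate a new CI checkout: identical contents, newer mtime.
    new_time = recorded[0] + 60
    os.utime(src, (new_time, new_time))
    fresh_after = cache_is_fresh(recorded, src)

print(fresh_before, fresh_after)
```

Under this assumption, restoring the cache directory alone is not enough: the source files would also need their original modification times for the stamps to match.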
… required. (#285) * Fix version drift. * Bump to 0.2.0 * Use nearest-neighbour interpolation for points requiring extrapolation.
* Fix version drift. * Bump to 0.2.0 * Inspect envvar for scheduler address when one isn't specified. * Encode environment varraible as ascii. * Simplify.
* Fix version drift. * Bump to 0.2.0 * Initial commit of basic plotting functionality. * Change naming convention. * Improve transform argument. * Simplify transform selection. * Add rudimentary time and frequency selection. * Checkpoint ploter changes. Can now handle scans and spws, but is very slow. * More work on plotter - can now plot datasets in parallel. * Some tidying. * Slightly improve plot speed. Dominant cost is still saving the figures. * Commit some minor changes which speed up figure saving. * Lots of tiny fixes. * Tiny cosmetic changes. * Add custom tick formatter so that plots are the same size regardless. * Add matplotlib dependency. * Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC. * Rename variable to avoid confusion. * Fix bug affecting recursive grouping. * Avoid copies in grouping code. * Checkpoint work on extending functionality. * Make plotter more powerful. Add colourization option. Begin simplifying interface. * Allow user specification of colourmap. * Add plotsize parameter.
* Fix version drift. * Bump to 0.2.0 * Fix #293.
* Fix version drift. * Bump to 0.2.0 * Add optional label and single field selection to backup app * remove item instead of pop@index * do not .remove() from xds_list * Simplify using some existing functionality. --------- Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com> Co-authored-by: landmanbester <lbester@ska.ac.za>
* Fix version drift. * Bump to 0.2.0 * Setting MAD threshold to zero will disable flagging on a given statistic.
* Fix version drift. * Bump to 0.2.0 * Disable flagging based on off-diagonal correlations in the mad flagger by default. This should make the mad flagger less agressive on data with unmodelled polarised emission.
* Fix version drift. * Bump to 0.2.0 * Fix a bug afecting the use of non-standard columns in data column input.
* assign to ms to avoid over-writing metadata in restore app * zip datasets in enumerate * add comment to document failure case * use backup_column_name in restore app * Apply OCD. --------- Co-authored-by: landmanbester <lbester@ska.ac.za> Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com>
* Fix version drift. * Bump to 0.2.0 * Make summary correctly report FIELD_ID and SOURCE_ID.
* Drop 3.8. Commit poetry lock file. * Update stimela requirement. * Update docs. * Set min and max versions in pyproject.toml. * Remove python3.8 from test matrix.
* Cache NUMBA kernels between CI runs (#279) * Cache NUMBA kernels between CI runs * Use actions/cache@v3 * Cache per python version * runner.tmp -> runner.temp * Debugging * Fix * Run entire test suite * timestamp needed otherwise cache hit occurs and cache not updated * Fix output * Add revert_me.txt * Use nearest-neighbour interpolation in regions where extrapolation is required. (#285) * Fix version drift. * Bump to 0.2.0 * Use nearest-neighbour interpolation for points requiring extrapolation. * Utilise environment variable when dask.address is unset. (#288) * Fix version drift. * Bump to 0.2.0 * Inspect envvar for scheduler address when one isn't specified. * Encode environment varraible as ascii. * Simplify. * Add plotting functionality (#290) * Fix version drift. * Bump to 0.2.0 * Initial commit of basic plotting functionality. * Change naming convention. * Improve transform argument. * Simplify transform selection. * Add rudimentary time and frequency selection. * Checkpoint ploter changes. Can now handle scans and spws, but is very slow. * More work on plotter - can now plot datasets in parallel. * Some tidying. * Slightly improve plot speed. Dominant cost is still saving the figures. * Commit some minor changes which speed up figure saving. * Lots of tiny fixes. * Tiny cosmetic changes. * Add custom tick formatter so that plots are the same size regardless. * Add matplotlib dependency. * Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC. * Rename variable to avoid confusion. * Fix bug affecting recursive grouping. * Avoid copies in grouping code. * Checkpoint work on extending functionality. * Make plotter more powerful. Add colourization option. Begin simplifying interface. * Allow user specification of colourmap. * Add plotsize parameter. * Fix #293 - OOB access caused by `output.subtract_directions` (#294) * Fix version drift. * Bump to 0.2.0 * Fix #293. 
* Namedbackups (#296) * Fix version drift. * Bump to 0.2.0 * Add optional label and single field selection to backup app * remove item instead of pop@index * do not .remove() from xds_list * Simplify using some existing functionality. --------- Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com> Co-authored-by: landmanbester <lbester@ska.ac.za> * Selectively disable MAD flagging criteria (#298) * Fix version drift. * Bump to 0.2.0 * Setting MAD threshold to zero will disable flagging on a given statistic. * Disable mad flagging on off-diagonals by default (#300) * Fix version drift. * Bump to 0.2.0 * Disable flagging based on off-diagonal correlations in the mad flagger by default. This should make the mad flagger less agressive on data with unmodelled polarised emission. * Fix bug affecting non-standard columns in `input_ms.data_column` (#301) * Fix version drift. * Bump to 0.2.0 * Fix a bug afecting the use of non-standard columns in data column input. * Don't allow restore app to overwrite metadata (#307) * assign to ms to avoid over-writing metadata in restore app * zip datasets in enumerate * add comment to document failure case * use backup_column_name in restore app * Apply OCD. --------- Co-authored-by: landmanbester <lbester@ska.ac.za> Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com> * Fix for summary reporting SOURCE_ID as FIELD_ID (#309) * Fix version drift. * Bump to 0.2.0 * Make summary correctly report FIELD_ID and SOURCE_ID. * Fix receptor summary (#310) * Fix version drift. * Bump to 0.2.0 * Fix incorrect assumption that FEED substable will always have 2 receptors. * Fix similar problem affecting parallactic angle construction. * Update missing column selection for compatibility with upsteam changes. * Fix xarray dims (#318) * Fix version drift. * Bump to 0.2.0 * Move all usage of xds.dims[dim] to xds.sizes[dim] in preparation for change of return type in xds.dims. * Fixes for changes relating to Numba error types. 
(#319) * Move now-deprecated graph metrics function into the scheduler plugin code. (#320) * Make small changes to enable 3.11 compatibilty. Requires changes in stimela + a release. (#321) * Restringify keys in scheduler plugin. (#322) * Attempt very dodgy solution to caching problem. * Look for code in the correct place. * Update pyproject.toml. Add poetry.lock. Update docs. (#323) * Drop 3.8. Commit poetry lock file. * Update stimela requirement. * Update docs. * Set min and max versions in pyproject.toml. * Remove python3.8 from test matrix. * Some debugging. * Fix unsaved file. * More debugging. * Temporarily make test suite much smaller. * Fix path. * Actually fix path. * Attempt at safer caching. * More fiddling with paths. * Fix bad tabbing. * Try to find out where things are failing. * More fiddling. * More fiddling. * More fiddling. * Try restore time action. * Tidy up caching approach. Use action. Restore matrix and test everything. * Remove tmp file. * Reword CI step name. --------- Co-authored-by: JSKenyon <jonosken@gmail.com> Co-authored-by: Landman Bester <lbester@sarao.ac.za> Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com> Co-authored-by: landmanbester <lbester@ska.ac.za> * Bump dask-ms and codex-africanus dependencies. Update lock. --------- Co-authored-by: Simon Perkins <simon.perkins@gmail.com> Co-authored-by: Landman Bester <lbester@sarao.ac.za> Co-authored-by: landmanbester <lbester@ska.ac.za>
Closes #278