ENH: Imviz parser optimizations #2176

Merged: 4 commits, May 3, 2023


bmorris3
Contributor

Description

Imviz, and especially its parser, will do some heavy lifting for Roman files. While profiling, I found that the parser tweaks in this PR speed up the parser by a factor of about 3 on my machine (the machine matters for reasons I'll explain below).

The speedups come from the following corrections:

  • The top-level image parser decides how to parse a given file based on its file extension. This doesn't work on astropy-cached files, since they are all named "contents" and have no extension. I worked around this earlier by building a reverse-lookup dict to find the source URL for each file in the cache (see below), which is quite slow if the cache is heavily used. Since I have 12,000 files in my cache (😱), this lookup took up a lot of time. In this PR, I avoid the lookup unless the filename is "contents" (no extension).

# If file_obj is a path to a cached file from
# astropy.utils.data.download_file, the path has no file extension.
# Here we check if the file is in the download cache, and if it is,
# we look up the file extension from the source URL:
path_to_url_mapping = {v: k for k, v in cache_contents().items()}
if file_obj in path_to_url_mapping:
    source_url = path_to_url_mapping[file_obj]
    # file_obj_lower is only used for checking extensions,
    # file_obj is passed for parsing and is not modified here:
    file_obj_lower = source_url.split('/')[-1].lower()
else:
    file_obj_lower = file_obj.lower()
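
A minimal sketch of the change described in this bullet, assuming file_obj is a string path. The helper name extension_lookup_name and the injected cache_contents argument are hypothetical, chosen so the example runs without astropy; in practice the callable would be astropy.utils.data.cache_contents, which maps source URLs to local paths. The merged code may be organized differently.

```python
import os


def extension_lookup_name(file_obj, cache_contents):
    """Return a lowercase name used only for file-extension checks.

    `cache_contents` is any zero-argument callable returning a
    {source_url: local_path} mapping (an assumption of this sketch).
    """
    # Cached downloads are stored as ".../contents" with no extension, so
    # only those paths need the (potentially slow) reverse lookup.
    if os.path.basename(file_obj) == "contents":
        path_to_url = {path: url for url, path in cache_contents().items()}
        source_url = path_to_url.get(file_obj)
        if source_url is not None:
            return source_url.split("/")[-1].lower()
    # Everything else keeps its own name; the cache is never touched.
    return file_obj.lower()
```

With this shape, a path such as /data/image.fits never triggers the dict construction, so the cost of a large cache is only paid for paths that actually look like cache entries.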

  • The parser did several checks on ASDF files. First it opens the file with asdf.open to check if it looks like a Roman data file. If not, it raises an error, and if it's a Roman file it passes the open file object to roman_datamodels.datamodels.open to extract an ImageModel. This double-open incurs unnecessary overhead and is probably more thorough than it ought to be – I'd prefer a faster parser with a less-informative traceback when it fails to open an ASDF file with unknown provenance. Our docs say we only (partially) support Roman ASDF files.

elif file_obj_lower.endswith('.asdf'):
    # First check if file might be a Roman data product.
    with asdf.open(file_obj) as asdf_file:
        # This is a convention of roman data products.
        if 'roman' in asdf_file:
            if not HAS_ROMAN_DATAMODELS:
                raise ImportError(
                    "Roman ASDF detected but roman-datamodels is not installed.")
            with rdd.open(asdf_file) as pf:
                _parse_image(app, pf, data_label, ext=ext)
        # Not Roman but also not really supported. Might still work though.
        else:  # pragma: no cover
            _parse_image(app, asdf_file, data_label, ext=ext)
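
The single-open idea from this bullet can be sketched as follows. This is an illustration under stated assumptions, not the merged code: parse_asdf_once is a hypothetical name, open_model stands in for roman_datamodels.datamodels.open (assumed here to accept a path and work as a context manager), and parse_image stands in for _parse_image with its other arguments already bound.

```python
def parse_asdf_once(file_obj, open_model, parse_image, has_roman_datamodels=True):
    """Open an ASDF path a single time and hand the model to the parser.

    All arguments other than `file_obj` are stand-ins for this sketch:
    `open_model` plays the role of roman_datamodels.datamodels.open and
    `parse_image` the role of _parse_image.
    """
    if not has_roman_datamodels:
        raise ImportError(
            "ASDF file detected but roman-datamodels is not installed.")
    # No preliminary asdf.open() check: a non-Roman file simply fails here,
    # with a less specific traceback.
    with open_model(file_obj) as model:
        return parse_image(model)
```

The trade-off is the one stated in the bullet: the file is opened once instead of twice, at the cost of a less informative error for ASDF files of unknown provenance.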

Here's a breakdown of the time spent loading a single Roman file, which decreases from about 3 seconds to about 1 second after this PR on my laptop. The first bullet above describes the path_to_url_mapping change; the second bullet results in removing the asdf.open call. I've also included a note on the one call that gets slower as a result of these changes, which is _parse_image. Previously we passed an already-open file stream from asdf.open to _parse_image; with this PR, _parse_image must open the file stream itself, which makes that call slower.

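| Method              | Before  | After   | Reason for difference            |
|---------------------|---------|---------|----------------------------------|
| Total               | 2.75 s  | 0.94 s  | ---                              |
| path_to_url_mapping | 0.716 s | 0.000 s | avoided unless needed            |
| asdf.open           | 0.278 s | 0.000 s | not strictly necessary           |
| _parse_image        | 0.006 s | 0.019 s | now slower by skipping asdf.open |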
Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

@pllim
Contributor

pllim commented Apr 28, 2023

Not sure if I can get to this next week. I am okay with this if two other devs approve. Thanks!

@bmorris3 bmorris3 added this to the 3.4.1 milestone May 1, 2023
@bmorris3 bmorris3 added the 💤 enhancement label May 1, 2023
Collaborator

@rosteen rosteen left a comment

Looks fine to me - just double checking my understanding, this is making the assumption that the only asdf files that Imviz will see are Roman files, right? And we're ok to make that assumption? I guess JWST uses asdf-in-FITS rather than having actual asdf files for images.

Collaborator

@rosteen rosteen left a comment

As discussed offline, I'm ok assuming asdf files are Roman data for now.

Member

@kecnry kecnry left a comment

One small suggestion to the test, but otherwise looks good enough to me now that we've decided on the ASDF assumption question.

jdaviz/configs/imviz/tests/test_parser.py (outdated, resolved)
Co-authored-by: Kyle Conroy <kyleconroy@gmail.com>
@codecov

codecov bot commented May 3, 2023

Codecov Report

Patch coverage: 40.00% and project coverage change: +0.21 🎉

Comparison is base (2953dec) 91.50% compared to head (4497e19) 91.72%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2176      +/-   ##
==========================================
+ Coverage   91.50%   91.72%   +0.21%     
==========================================
  Files         147      147              
  Lines       16142    16182      +40     
==========================================
+ Hits        14771    14843      +72     
+ Misses       1371     1339      -32     
| Impacted Files                            | Coverage Δ                   |
|-------------------------------------------|------------------------------|
| jdaviz/configs/imviz/plugins/parsers.py   | 88.38% <33.33%> (-0.97%) ⬇️ |
| jdaviz/configs/imviz/tests/test_parser.py | 99.19% <50.00%> (-0.81%) ⬇️ |

... and 4 files with indirect coverage changes

@bmorris3
Contributor Author

bmorris3 commented May 3, 2023

One more follow-up. I made a demo notebook to test how efficiently ImageModels from roman_datamodels are loaded into Imviz, assuming they're already loaded from disk and in memory. Even with 18 detectors with images of shape (4088, 4088), making an Imviz instance and loading all detectors with WCS linking takes just under 2 seconds. So for now the bottleneck is definitely in loading ASDF files from disk in roman_datamodels.
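
For anyone wanting to locate a bottleneck like this themselves, here is a small profiling sketch using only the standard library. The PR does not say which profiler was used, so cProfile is an assumption, and both profile_call and the load_detectors stand-in workload are hypothetical names; in practice the profiled function would be the Imviz loading code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def load_detectors(n_detectors=18):
    # Stand-in workload for this sketch, not real Imviz code.
    return [sum(range(10_000)) for _ in range(n_detectors)]


result, report = profile_call(load_detectors)
print(report)
```

Sorting by cumulative time puts the calls that dominate the total at the top of the report, which is the kind of per-call breakdown shown in the timing table in the description.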

@bmorris3 bmorris3 merged commit 9b19324 into spacetelescope:main May 3, 2023
@bmorris3
Contributor Author

bmorris3 commented May 3, 2023

Thanks all! 🎉
