ENH: Imviz parser optimizations #2176

bmorris3 · 2023-04-28T18:18:04Z

Description

Imviz, and especially its parser, will do some heavy lifting for Roman files. I've been doing some profiling and found that the parser tweaks in this PR speed up the parser by a factor of 3 on my machine (the machine matters for reasons I'll explain below).

The speedups come from the following corrections:

The top-level image parser decides how to parse a given file based on its file extension. This doesn't work on astropy-cached files, since every cached file is named `contents`, with no extension. I worked around this earlier by building a reverse-lookup dict to find the source URL for each file in the cache (see below), which is quite slow when the cache is heavily used. Since I have 12,000 files in my cache (😱), this lookup took up a lot of time. In this PR, I avoid the lookup unless the filename is `contents` (no extension).

jdaviz/jdaviz/configs/imviz/plugins/parsers.py

Lines 55 to 66 in aec6ed1

    
    # If file_obj is a path to a cached file from
    # astropy.utils.data.download_file, the path has no file extension.
    # Here we check if the file is in the download cache, and if it is,
    # we look up the file extension from the source URL:
    path_to_url_mapping = {v: k for k, v in cache_contents().items()}
    if file_obj in path_to_url_mapping:
        source_url = path_to_url_mapping[file_obj]
        # file_obj_lower is only used for checking extensions,
        # file_obj is passed for parsing and is not modified here:
        file_obj_lower = source_url.split('/')[-1].lower()
    else:
        file_obj_lower = file_obj.lower()
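
A minimal sketch of the lazy lookup described above, not the code merged in this PR. The helper name `extension_source` and the injected `cache_lookup` callable are assumptions made for illustration; `cache_lookup` stands in for astropy's `cache_contents()`, which maps source URLs to local paths. The point is that the expensive dict inversion only runs when the file's base name is `contents`.

```python
import os


def extension_source(file_obj, cache_lookup):
    """Return the lowercased name that is used only for extension checks.

    `cache_lookup` is a hypothetical zero-argument callable returning a
    {url: local_path} dict, standing in for astropy's cache_contents().
    """
    # Files in the astropy download cache are all named "contents", so
    # that is the only case where the source URL has to be recovered.
    if os.path.basename(file_obj) == "contents":
        path_to_url = {path: url for url, path in cache_lookup().items()}
        source_url = path_to_url.get(file_obj)
        if source_url is not None:
            return source_url.split("/")[-1].lower()
    # Ordinary paths (and unknown "contents" files) skip the lookup result.
    return file_obj.lower()
```

With a cache of thousands of entries, a path such as `image.fits` never triggers the inversion at all, which is where the savings in this sketch would come from.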

The parser did several checks on ASDF files. First it opens the file with asdf.open to check whether it looks like a Roman data file. If it does but roman-datamodels is not installed, it raises an error; otherwise it passes the open file object to roman_datamodels.datamodels.open to extract an ImageModel. This double-open incurs unnecessary overhead and is probably more thorough than it ought to be. I'd prefer a faster parser with a less-informative traceback when it fails to open an ASDF file of unknown provenance. Our docs say we only (partially) support Roman ASDF files.

jdaviz/jdaviz/configs/imviz/plugins/parsers.py

Lines 79 to 91 in aec6ed1

    
    elif file_obj_lower.endswith('.asdf'):
        # First check if file might be a Roman data product.
        with asdf.open(file_obj) as asdf_file:
            # This is a convention of roman data products.
            if 'roman' in asdf_file:
                if not HAS_ROMAN_DATAMODELS:
                    raise ImportError(
                        "Roman ASDF detected but roman-datamodels is not installed.")
                with rdd.open(asdf_file) as pf:
                    _parse_image(app, pf, data_label, ext=ext)
            # Not Roman but also not really supported. Might still work though.
            else:  # pragma: no cover
                _parse_image(app, asdf_file, data_label, ext=ext)
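
For illustration, the single-open approach might look roughly like the sketch below. It is not the code in this PR: `parse_asdf` is a hypothetical name, and `open_model` and `parse_image` are injected stand-ins for `roman_datamodels.datamodels.open` and `_parse_image`, so the sketch runs without either package installed.

```python
def parse_asdf(path, open_model, parse_image):
    """Open `path` exactly once and hand the resulting model to the parser.

    `open_model` is assumed to return a context manager (as a datamodel
    opener would), and `parse_image` is any callable taking the model.
    """
    # There is no preliminary asdf.open() call to look for a 'roman' key.
    # A file that is not a Roman product simply fails inside open_model,
    # with whatever traceback that opener produces.
    with open_model(path) as model:
        return parse_image(model)
```

The trade-off is the one stated above: one fewer file open per load, in exchange for a less specific error when the file is not a Roman data product.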

Here's a breakdown of the time spent loading a single Roman file, which decreases from about 3 seconds to about 1 second after this PR on my laptop. The first bullet above describes the path_to_url_mapping change, and the second bullet results in removing the asdf.open call. I've also included a note on the one call that gets slower as a result of these changes, which is _parse_image. Previously we passed an already-open file stream from asdf.open to _parse_image; after this PR, _parse_image must open the file itself, so that call takes longer.

| Method | Before | After | Reason for difference |
|---|---|---|---|
| Total | 2.75 s | 0.94 s | --- |
| `path_to_url_mapping` | 0.716 s | 0.000 s | avoided unless needed |
| `asdf.open` | 0.278 s | 0.000 s | not strictly necessary |
| `_parse_image` | 0.006 s | 0.019 s | now slower by skipping `asdf.open` |
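
A per-call breakdown like this can be collected with the standard-library profiler. The sketch below is one possible way to do it and is not necessarily how the numbers above were measured; `profile_call` is a hypothetical helper name. The last lines check the speedup implied by the "Total" row.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run `func` under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the slowest call chains are listed first,
    # and limit the report to the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Speedup implied by the "Total" row of the table.
before_s, after_s = 2.75, 0.94
speedup = before_s / after_s  # a little under 3
```

In practice `func` would be whatever call loads the file, and the report would be compared before and after the change.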

Change log entry

Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
Do the proposed changes follow the STScI Style Guides?
Are tests added/updated as required? If so, do they follow the STScI Style Guides?
Are docs added/updated as required? If so, do they follow the STScI Style Guides?
Did the CI pass? If not, are the failures related?
Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

pllim · 2023-04-28T22:31:50Z

Not sure if I can get to this next week. I am okay with this if two other devs approve. Thanks!

rosteen

Looks fine to me - just double checking my understanding, this is making the assumption that the only asdf files that Imviz will see are Roman files, right? And we're ok to make that assumption? I guess JWST uses asdf-in-FITS rather than having actual asdf files for images.

rosteen

As discussed offline, I'm ok assuming asdf files are Roman data for now.

kecnry

One small suggestion to the test, but otherwise looks good enough to me now that we've decided on the ASDF assumption question.

jdaviz/configs/imviz/tests/test_parser.py

Co-authored-by: Kyle Conroy <kyleconroy@gmail.com>

codecov · 2023-05-03T23:39:01Z

Codecov Report

Patch coverage: 40.00% and project coverage change: +0.21 🎉

Comparison is base (2953dec) 91.50% compared to head (4497e19) 91.72%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2176      +/-   ##
==========================================
+ Coverage   91.50%   91.72%   +0.21%     
==========================================
  Files         147      147              
  Lines       16142    16182      +40     
==========================================
+ Hits        14771    14843      +72     
+ Misses       1371     1339      -32

Impacted Files	Coverage Δ
jdaviz/configs/imviz/plugins/parsers.py	`88.38% <33.33%> (-0.97%)`	⬇️
jdaviz/configs/imviz/tests/test_parser.py	`99.19% <50.00%> (-0.81%)`	⬇️

... and 4 files with indirect coverage changes


bmorris3 · 2023-05-03T23:40:34Z

One more follow-up. I made a demo notebook to test how efficiently ImageModels from roman_datamodels are loaded into Imviz, assuming they have already been read from disk and are in memory. Even with 18 detectors, each with an image of shape (4088, 4088), making an Imviz instance and loading all detectors with WCS linking takes just under 2 seconds. So for now the bottleneck is definitely in loading ASDF files from disk in roman_datamodels.

bmorris3 · 2023-05-03T23:49:31Z

Thanks all! 🎉

optimizations from profiling

133f02b

bmorris3 requested review from duytnguyendtn, rosteen, javerbukh, pllim, kecnry, haticekaratay and cshanahan1 as code owners April 28, 2023 18:18

github-actions bot added the imviz label Apr 28, 2023

more general test for improved parser

80dcaa2

bmorris3 added this to the 3.4.1 milestone May 1, 2023

bmorris3 added the 💤 enhancement New feature or request label May 1, 2023

adding changelog entry

06d2611

rosteen reviewed May 1, 2023


rosteen approved these changes May 2, 2023


kecnry approved these changes May 2, 2023



kecnry added the Ready for final review label May 2, 2023

Update jdaviz/configs/imviz/tests/test_parser.py

4497e19

Co-authored-by: Kyle Conroy <kyleconroy@gmail.com>

bmorris3 merged commit 9b19324 into spacetelescope:main May 3, 2023
