Add support for load_data from URI/URL #2875

Merged: 26 commits, Jun 3, 2024

Conversation

Contributor

@bmorris3 bmorris3 commented May 15, 2024

This PR implements parser support for viz.load_data(uri) where uri is either a MAST URI or a web URL, for all *viz other than Mosviz.

Download and cache a remote FITS image on load into Imviz:

from jdaviz import Imviz

url = "https://www.astropy.org/astropy-data/tutorials/FITS-images/HorseHead.fits"

imviz = Imviz()
imviz.load_data(url, cache=True)
imviz.show()

Generic URLs are downloaded with astropy.utils.data.download_file.

Download a FITS spectrum from MAST on load to Specviz:

from jdaviz import Specviz

uri = "mast:JWST/product/jw02732-o004_t004_miri_ch1-shortmediumlong_x1d.fits"

specviz = Specviz()
specviz.load_data(uri)
specviz.show()

MAST queries are done via astroquery.mast.Observations. A Cubeviz example:

from jdaviz import Cubeviz

uri = "mast:JWST/product/jw01373-o031_t007_miri_ch1-shortmediumlong_s3d.fits"

cubeviz = Cubeviz()
cubeviz.load_data(uri)
cubeviz.show()

and a Specviz2d example:

from jdaviz import Specviz2d

uri = "mast:JWST/product/jw01324-o006_s00005_nirspec_f100lp-g140h_s2d.fits"

specviz2d = Specviz2d()
specviz2d.load_data(uri)
specviz2d.show()
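
The examples above pass three kinds of string to load_data: a MAST URI, a web URL, or (as before) a local path. As a rough illustration only, and not the code in this PR, the distinction could be sketched with the standard library as follows. The names classify_source, remote_filename, and WEB_SCHEMES are hypothetical, and the set of accepted web schemes is an assumption.

```python
import posixpath
from urllib.parse import urlparse

# Schemes treated as downloadable web URLs in this sketch (an assumption).
WEB_SCHEMES = {"http", "https", "ftp"}


def classify_source(possible_uri):
    """Return 'mast', 'url', or 'local' for a string passed to load_data."""
    parsed = urlparse(str(possible_uri))
    scheme = parsed.scheme.lower()
    if scheme == "mast":
        return "mast"
    if scheme in WEB_SCHEMES and parsed.netloc:
        return "url"
    # Anything else, including Windows drive letters such as "C:\\data",
    # is treated as a local path.
    return "local"


def remote_filename(possible_uri):
    """File name component of a remote URI, split on '/' regardless of OS."""
    return posixpath.basename(urlparse(str(possible_uri)).path)
```

Splitting the URI path with posixpath rather than the platform path separator keeps the derived file name the same on every operating system, since URIs always use forward slashes.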

cc @havok2063.

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone. Bugfix milestone also needs an accompanying backport label.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

@github-actions github-actions bot added the cubeviz, specviz, mosviz, testing, imviz, and plugin labels May 15, 2024
@bmorris3 bmorris3 added this to the 3.11 milestone May 15, 2024
@bmorris3 bmorris3 force-pushed the url-uri branch 2 times, most recently from 6ddccb5 to 770e738 Compare May 15, 2024 16:32
@bmorris3 bmorris3 marked this pull request as ready for review May 15, 2024 16:37

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.76%. Comparing base (0afaafc) to head (fc28c86).
Report is 172 commits behind head on main.

Files with missing lines    Patch %    Lines
jdaviz/utils.py             92.30%     3 Missing ⚠️
jdaviz/core/launcher.py     50.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2875      +/-   ##
==========================================
+ Coverage   88.70%   88.76%   +0.05%     
==========================================
  Files         111      111              
  Lines       17134    17172      +38     
==========================================
+ Hits        15199    15243      +44     
+ Misses       1935     1929       -6     


jdaviz/utils.py Outdated
@@ -385,3 +390,52 @@ def total_masked_first_data(self):
     def __setgluestate__(cls, rec, context):
         masks = {key: context.object(value) for key, value in rec['masks'].items()}
         return cls(masks=masks)
+
+
+def download_uri_to_path(possible_uri, cache=False, local_path=None):
Contributor

Maybe default should be to always use cache, if available?

Suggested change
-def download_uri_to_path(possible_uri, cache=False, local_path=None):
+def download_uri_to_path(possible_uri, cache=True, local_path=None):

Contributor Author

I prefer to keep it false by default – for users who cache a lot, it can be surprising and really frustrating to have significant disk space used by files without human readable names, making it hard to clean up without deleting all.

Contributor

But the other side is that you keep downloading the same thing over and over again, which can get expensive via AWS.

Contributor Author

If our files were guaranteed to be small, I wouldn't blink. But I have cached a bunch of 5 GB NIRCam images, and regret that I did.

Contributor

I also have regretted accidentally re-downloading GB of data I already have. So, I don't think there is a perfect solution.

Now I think we should have a user doc about this new feature and document all the gotchas there.

Collaborator

I don't have strong feelings either way, but here's a third option: what if, since there are good arguments both ways, we require the user to explicitly set the cache argument in the case of loading from a URL? We could default to cache=None rather than True or False and then check to see if it's None in the appropriate case and raise an error telling the user to set it explicitly.

Contributor

@rosteen , but you still need a default for "standalone app", do you not?

Collaborator

Hmm yes that's true 😭
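
For reference, the "third option" discussed in this thread (default to cache=None and require an explicit choice for remote files, while still letting the standalone app supply its own default) might look roughly like the sketch below. This is only an illustration of the proposal, not code from this PR; resolve_cache, is_remote, and standalone_default are hypothetical names.

```python
def resolve_cache(cache, is_remote, standalone_default=None):
    """Decide whether to cache a download, insisting on an explicit choice.

    cache: True, False, or None (meaning the caller did not choose).
    is_remote: whether the data comes from a URL or MAST URI.
    standalone_default: fallback for contexts with no user-facing argument,
        such as a standalone app (hypothetical parameter).
    """
    if not is_remote:
        # Nothing is downloaded, so caching does not apply.
        return False
    if cache is None:
        cache = standalone_default
    if cache is None:
        raise ValueError(
            "Loading from a URL or URI requires cache=True or cache=False."
        )
    return bool(cache)
```

With this shape, notebook users would have to state their caching preference for remote files, and the standalone app would pass its own standalone_default so that it never hits the error.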

@@ -43,7 +43,7 @@ def prep_data_layer_as_dq(data):


 @data_parser_registry("imviz-data-parser")
-def parse_data(app, file_obj, ext=None, data_label=None, parent=None):
+def parse_data(app, file_obj, ext=None, data_label=None, parent=None, cache=False):


__all__ = ["specviz_spectrum1d_parser"]


@data_parser_registry("specviz-spectrum1d-parser")
 def specviz_spectrum1d_parser(app, data, data_label=None, format=None, show_in_viewer=True,
-                              concat_by_file=False):
+                              concat_by_file=False, cache=False):

Collaborator

@rosteen rosteen left a comment

I left one more suggestion and a comment on the caching default debate.

docs/conf.py Outdated
@@ -263,6 +263,7 @@
 # Extra intersphinx in addition to what is already in sphinx-astropy
 intersphinx_mapping.update({ # noqa: F405
     'glueviz': ('https://docs.glueviz.org/en/stable/', None),
+    'astroquery': ('https://astroquery.readthedocs.io/en/stable/', None),
Contributor

Given upstream service provider stuff can change faster than release, would latest be safer here? I dunno.

Contributor Author

I don't know either, I chose stable since all other intersphinx links were stable.

Contributor

glue points to latest 🤷‍♀️

Co-authored-by: P. L. Lim <2090236+pllim@users.noreply.github.com>
jdaviz/utils.py Outdated
local_path = os.path.join(local_path, parsed_uri.path.split(os.path.sep)[-1])

if timeout is None:
    timeout = Conf.timeout.defaultvalue
Contributor

Suggested change
-timeout = Conf.timeout.defaultvalue
+timeout = conf.timeout.defaultvalue

jdaviz/utils.py Outdated
if timeout is None:
    timeout = Conf.timeout.defaultvalue

with Conf.timeout.set_temp(timeout):
Contributor

Suggested change
-with Conf.timeout.set_temp(timeout):
+with conf.timeout.set_temp(timeout):

Contributor Author

Thanks, just caught these too in 77df5ff.

Contributor

If we want to be more pedantic, we could from astroquery.mast import conf as mast_conf but since there isn't any name clash now, doesn't matter.

Contributor

@pllim pllim left a comment

LGTM now. Thanks!

Contributor

pllim commented Jun 3, 2024

Ah, by changing from Conf to conf, the syntax to access default value might be different now. See the last example in https://docs.astropy.org/en/stable/config/index.html#exploring-configuration
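
To illustrate the concern, here is a toy model of a descriptor-based configuration item. It is a stand-in written for this note, not astropy's or astroquery's implementation, and the default of 600 is made up. It assumes, as the astropy configuration docs describe, that accessing an item on the class returns the item object while accessing it on an instance returns the current value. ToyConfigItem, ToyConf, and toy_conf are hypothetical names.

```python
from contextlib import contextmanager


class ToyConfigItem:
    """Toy stand-in for a config item; not astropy's implementation."""

    def __init__(self, defaultvalue):
        self.defaultvalue = defaultvalue
        self._value = defaultvalue

    def __get__(self, obj, objtype=None):
        # Class access returns the item itself; instance access returns the value.
        if obj is None:
            return self
        return self._value

    @contextmanager
    def set_temp(self, value):
        # Temporarily override the value, restoring it on exit.
        previous = self._value
        self._value = value
        try:
            yield
        finally:
            self._value = previous


class ToyConf:
    timeout = ToyConfigItem(600)  # made-up default for illustration

    def set_temp(self, attr, value):
        # Look the item up on the class so it is not resolved to a plain value.
        return type(self).__dict__[attr].set_temp(value)


toy_conf = ToyConf()
```

In this model, ToyConf.timeout.defaultvalue works, but toy_conf.timeout is a plain number with no defaultvalue or set_temp attribute, so the instance has to go through toy_conf.set_temp("timeout", value) instead.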

@bmorris3 bmorris3 force-pushed the url-uri branch 2 times, most recently from b6eadbd to 1296986 Compare June 3, 2024 16:51
Contributor Author

bmorris3 commented Jun 3, 2024

Remaining failures are unrelated. Thanks @rosteen and @pllim!

@bmorris3 bmorris3 merged commit 07cc0f3 into spacetelescope:main Jun 3, 2024
18 of 19 checks passed
@kecnry kecnry mentioned this pull request Jun 25, 2024
9 tasks
Labels
cubeviz, imviz, mosviz, plugin, specviz, testing
4 participants