-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Add support for load_data from URI/URL #2875
Conversation
6ddccb5
to
770e738
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2875 +/- ##
==========================================
+ Coverage 88.70% 88.76% +0.05%
==========================================
Files 111 111
Lines 17134 17172 +38
==========================================
+ Hits 15199 15243 +44
+ Misses 1935 1929 -6 ☔ View full report in Codecov by Sentry. |
jdaviz/utils.py
Outdated
@@ -385,3 +390,52 @@ def total_masked_first_data(self): | |||
def __setgluestate__(cls, rec, context): | |||
masks = {key: context.object(value) for key, value in rec['masks'].items()} | |||
return cls(masks=masks) | |||
|
|||
|
|||
def download_uri_to_path(possible_uri, cache=False, local_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe default should be to always use cache, if available?
def download_uri_to_path(possible_uri, cache=False, local_path=None): | |
def download_uri_to_path(possible_uri, cache=True, local_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep it false by default – for users who cache a lot, it can be surprising and really frustrating to have significant disk space used by files without human readable names, making it hard to clean up without deleting all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the other side is that you keep downloading the same thing over and over again, which can get expensive via AWS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our files were guaranteed to be small, I wouldn't blink. But I have cached a bunch of 5 GB NIRCam images, and regret that I did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have regretted accidentally re-downloading GB of data I already have. So, I don't think there is a perfect solution.
Now I think we should have a user doc about this new feature and document all the gotchas there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong feelings either way, but here's a third option: what if, since there are good arguments both ways, we require the user to explicitly set the cache
argument in the case of loading from a URL? We could default to cache=None
rather than True
or False
and then check to see if it's None
in the appropriate case and raise an error telling the user to set it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rosteen , but you still need a default for "standalone app", do you not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yes that's true 😭
@@ -43,7 +43,7 @@ def prep_data_layer_as_dq(data): | |||
|
|||
|
|||
@data_parser_registry("imviz-data-parser") | |||
def parse_data(app, file_obj, ext=None, data_label=None, parent=None): | |||
def parse_data(app, file_obj, ext=None, data_label=None, parent=None, cache=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
__all__ = ["specviz_spectrum1d_parser"] | ||
|
||
|
||
@data_parser_registry("specviz-spectrum1d-parser") | ||
def specviz_spectrum1d_parser(app, data, data_label=None, format=None, show_in_viewer=True, | ||
concat_by_file=False): | ||
concat_by_file=False, cache=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one more suggestion and a comment on the caching default debate.
jdaviz/utils.py
Outdated
@@ -385,3 +390,52 @@ def total_masked_first_data(self): | |||
def __setgluestate__(cls, rec, context): | |||
masks = {key: context.object(value) for key, value in rec['masks'].items()} | |||
return cls(masks=masks) | |||
|
|||
|
|||
def download_uri_to_path(possible_uri, cache=False, local_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong feelings either way, but here's a third option: what if, since there are good arguments both ways, we require the user to explicitly set the cache
argument in the case of loading from a URL? We could default to cache=None
rather than True
or False
and then check to see if it's None
in the appropriate case and raise an error telling the user to set it explicitly.
docs/conf.py
Outdated
@@ -263,6 +263,7 @@ | |||
# Extra intersphinx in addition to what is already in sphinx-astropy | |||
intersphinx_mapping.update({ # noqa: F405 | |||
'glueviz': ('https://docs.glueviz.org/en/stable/', None), | |||
'astroquery': ('https://astroquery.readthedocs.io/en/stable/', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given upstream service provider stuff can change faster than release, would latest
be safer here? I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know either, I chose stable since all other intersphinx links were stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glue points to latest 🤷♀️
Co-authored-by: P. L. Lim <2090236+pllim@users.noreply.github.com>
Co-authored-by: P. L. Lim <2090236+pllim@users.noreply.github.com>
jdaviz/utils.py
Outdated
local_path = os.path.join(local_path, parsed_uri.path.split(os.path.sep)[-1]) | ||
|
||
if timeout is None: | ||
timeout = Conf.timeout.defaultvalue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout = Conf.timeout.defaultvalue | |
timeout = conf.timeout.defaultvalue |
jdaviz/utils.py
Outdated
if timeout is None: | ||
timeout = Conf.timeout.defaultvalue | ||
|
||
with Conf.timeout.set_temp(timeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with Conf.timeout.set_temp(timeout): | |
with conf.timeout.set_temp(timeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just caught these too in 77df5ff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to be more pedantic, we could from astroquery.mast import conf as mast_conf
but since there isn't any name clash now, doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thanks!
Hmm I think the test failures are real: https://github.com/spacetelescope/jdaviz/actions/runs/9353171515/job/25743015163?pr=2875 |
Ah, by changing from |
b6eadbd
to
1296986
Compare
This PR implements parser support for
viz.load_data(uri)
whereuri
is either a MAST URI or a web URL, for all*viz
other than Mosviz.Download and cache a remote FITS image on load into Imviz:
Generic URLs are downloaded with
astropy.utils.data.download_file
Download a FITS spectrum from MAST on load to Specviz:
MAST queries are done via
astroquery.mast.Observations
. A Cubeviz example:and a Specviz2d example:
cc @havok2063.
Change log entry
CHANGES.rst
? If you want to avoid merge conflicts,list the proposed change log here for review and add to
CHANGES.rst
before merge. If no, maintainershould add a
no-changelog-entry-needed
label.Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
trivial
label.