Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

BUG: Always downsample histogram for Imviz in Plot Options #2735

Merged

Conversation

pllim
Copy link
Contributor

@pllim pllim commented Feb 27, 2024

Description

This pull request is to address serious performance issues in Imviz when large image is loaded and someone opens up Plot Options and everything waits for Histogram to compute.

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone. Bugfix milestone also needs an accompanying backport label.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

@pllim pllim added bug Something isn't working performance Performance related 💤backport-v3.8.x on-merge: backport to v3.8.x labels Feb 27, 2024
@pllim pllim added this to the 3.8.3 milestone Feb 27, 2024
@pllim pllim requested a review from camipacifici February 27, 2024 22:36
@github-actions github-actions bot added the plugin Label for plugins common to multiple configurations label Feb 27, 2024
@pllim pllim added the imviz label Feb 27, 2024
Copy link

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 55.55556% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 88.83%. Comparing base (0336005) to head (0d23edc).
Report is 7 commits behind head on main.

Files Patch % Lines
...nfigs/default/plugins/plot_options/plot_options.py 55.55% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2735      +/-   ##
==========================================
+ Coverage   88.67%   88.83%   +0.15%     
==========================================
  Files         108      108              
  Lines       15886    15928      +42     
==========================================
+ Hits        14087    14149      +62     
+ Misses       1799     1779      -20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

arr = comp.data[y_min:y_max, x_min:x_max]
if self.config == "imviz":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this is only for imviz?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It assumes 2D, so yes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it have to assume 2D (can we do this for large cubes as well)? And if so, maybe we can just check the data shape so that this will be available to image viewers in non-imviz configurations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is large cube a problem? The histogram only compute for the current slice, right? It is very seldom to have very large slice dimension, at least for the use cases we have so far?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The histogram should show whatever is being used to compute stretch percentiles, in my opinion. We're soon allowing toggling between current slice and full cube, so I'd expect the histogram to follow that same choice in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am torn on this because it sounds like the ultimate goal is you want true random sampling from glue-core (see #2735 (comment)). This means my fix here would be temporary and the immediate problem I am fixing is large image in Imviz, so the more I localize this now, the easier it is to revert in the future when the actual fix you want is implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fair, but I meant it should at least represent the same underlying data (even if we don't do exactly the same right now). I think this will eventually need to be generalized to the cube case, so if that's easy to do now, great, otherwise I guess we can do it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline and agreed to confine this to Imviz for now since there isn't any reported problem for Cubeviz yet.

because really there is no reason to sample everything.
@pllim pllim force-pushed the scotty-can-we-go-faster-than-warp-6 branch from b790011 to aad7319 Compare February 29, 2024 18:22
"source": [
"fig2, ax2 = plt.subplots()\n",
"_ = ax2.hist(subarr, bins=10)\n",
"_ = ax2.set_title(\"Gridded\")"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I see what @camipacifici is saying about the "frame" being wiped out. But then again, are the percentile vmin/vmax calculated using all the data, randomly sampled data, or gridded downsampled data in Imviz? Maybe @astrofrog knows.

Screenshot 2024-02-29 140608 Screenshot 2024-02-29 140657

Copy link
Contributor Author

@pllim pllim Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the cons of gridding matters in the end if it is the vmin/vmax that you are worried about. This is what Plot Options actually look like with the framed data in this notebook.

imviz = Imviz()
imviz.load_data(arr, data_label="Original")
imviz.show()
No gridding (main) With gridding (this PR)
hist_original hist_downsampled
vminmax_original vminmax_downsampled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the Imviz vmin/vmax when I changed the fill value to 99999 as Cami suggested: 🤷‍♀️

Screenshot 2024-02-29 170435 Screenshot 2024-02-29 170237

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To repeat what I said on Slack here: As things currently stands on main, gridding improves performance and does not seem to affect significantly how vmin/vmax is computed. If the real goal is to accurately have histogram and vmin/vmax reflect the real (full) data via true random sampling, I think the fix is actually in glue-core and would be quite involved. (Though my understanding of glue internals is very limited so feel free to correct me.)

@pllim pllim marked this pull request as ready for review March 1, 2024 16:38
Copy link
Contributor

@cshanahan1 cshanahan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me so far, just a few questions:

arr = comp.data[y_min:y_max, x_min:x_max]
if self.config == "imviz":
# Downsample input data to about 400px (as per compass.vue) for performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't matter that this can change the aspect ratio of the image, right? It's just being used for statistics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is flattened anyway (using ravel()) for stats, so no, I don't think aspect ratio is important here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Another thought: this is basically making the maximum image size roughly 600x600 (which is when round(size/400) becomes greater than 1). If you have a 1000 x 50 array or something like that, the y dimension will be downsampled even though the image area is less than the maximum 600x600. Should the constraint be on area rather than an individual dimension size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the constraint be on area rather than an individual dimension size?

I am not sure. In the case of 1000 x 50 (nx x ny), it would become 500 x 50. This means X is sampled every other column, but Y is fully sampled. So if this is a star, imagine taking every other vertical strip off it and then doing stats on them.

Does it really affect the final histogram or vmin/vmax in a meaningful way? Not sure. If you have a science image like this, you can see how the histogram and vmin/vmax behave with and without this patch.

Also, if Tom R is going to do it properly upstream soon, then maybe we shouldn't waste too much time on this. I don't remember if he gave a concrete timeline this morning or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good questions though! Do you want to investigate the science image case with 1000 x 50? If not, I can try to do it tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I did a quick check and posting my results here. You can decide whether these are good or bad.

I make a 1000 x 50 using one of the data in ImvizDitheredExample notebook and displays it:

from astropy import units as u
from astropy.io import fits
from astropy.nddata import NDData
from astropy.utils.data import download_file
from astropy.wcs import WCS

from jdaviz import Imviz

imviz = Imviz()
imviz.show()

acs_47tuc_1 = download_file(
    'https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03gjq_flc.fits', cache=True)
pf = fits.open(acs_47tuc_1)
a = NDData(pf[1].data[:50, :1000] * u.electron, wcs=WCS(pf[1].header, pf))
imviz.load_data(a, data_label="long_image")
On main This PR
full_hist_long_im subsampled_hist_long_im
full_vminmax_long_im subsampled_vminmax_long_im

p.s. The aspect ratio thingy does affect the Compass visualization though given we have plan to refactor that to use something completely different, I am not too worried about it.

arr = image[image.main_components[0]][::ystep, ::xstep]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the test image it doesn't really impact the final histogram, and if this is being changed upstream then i agree not to waste time.

@cshanahan1
Copy link
Contributor

does this need a test? some lines are not covered

@pllim
Copy link
Contributor Author

pllim commented Mar 6, 2024

does this need a test?

I am not sure how to properly test this. Any idea?

@cshanahan1
Copy link
Contributor

Hmmm im not sure. Does this code get called as soon as you open plot options? if so, the you could write a test that opens the plugin which would then run all the code, and then if the histogram bins or vmin/vmax are accessible just verify they are what you expected? If its not possible then thats fine it's only a few lines of coverage

@pllim
Copy link
Contributor Author

pllim commented Mar 6, 2024

just verify they are what you expected?

The main problem is that I don't know what is expected here, or to what degree of tolerance is acceptable. If you have an input and answers, I can probably write something up.

Copy link
Member

@kecnry kecnry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully we can replace this with a more robust upstream solution in the near future, but until then this works for now and is an important performance improvement. Thanks!

@pllim pllim merged commit fbef31f into spacetelescope:main Mar 6, 2024
15 of 16 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/jdaviz that referenced this pull request Mar 6, 2024
@pllim
Copy link
Contributor Author

pllim commented Mar 6, 2024

Thanks for the thorough reviews. Much appreciated!

@pllim pllim deleted the scotty-can-we-go-faster-than-warp-6 branch March 6, 2024 18:50
pllim added a commit that referenced this pull request Mar 6, 2024
…5-on-v3.8.x

Backport PR #2735 on branch v3.8.x (BUG: Always downsample histogram for Imviz in Plot Options)
pllim added a commit to pllim/jdaviz that referenced this pull request Mar 20, 2024
astrofrog added a commit to astrofrog/jdaviz that referenced this pull request Mar 28, 2024
astrofrog added a commit to astrofrog/jdaviz that referenced this pull request Apr 9, 2024
astrofrog added a commit to astrofrog/jdaviz that referenced this pull request Apr 24, 2024
astrofrog added a commit to astrofrog/jdaviz that referenced this pull request May 7, 2024
pllim pushed a commit that referenced this pull request May 8, 2024
* Revert "BUG: Always downsample histogram for Imviz in Plot Options (#2735)"

This reverts commit fbef31f.

* Use glue's ability to randomly sample values when computing histograms and image statistics, and remove code to handle NaN values that caused the whole array to be loaded into memory.

* Use order='K' in ravel() to avoid copy

* Avoid calling ravel() since this causes a copy for cutouts

* Add comment to explain percentile values

* Fix issue when array len() match but .shape doesn't

* Adjust reference values for test

* Bumped minimum required version of glue-core and glue-jupyter

* Fix case where data is not a Numpy array

* Add change log
duytnguyendtn pushed a commit to duytnguyendtn/jdaviz that referenced this pull request Jul 23, 2024
… (spacetelescope#2771)

* Revert "BUG: Always downsample histogram for Imviz in Plot Options (spacetelescope#2735)"

This reverts commit fbef31f.

* Use glue's ability to randomly sample values when computing histograms and image statistics, and remove code to handle NaN values that caused the whole array to be loaded into memory.

* Use order='K' in ravel() to avoid copy

* Avoid calling ravel() since this causes a copy for cutouts

* Add comment to explain percentile values

* Fix issue when array len() match but .shape doesn't

* Adjust reference values for test

* Bumped minimum required version of glue-core and glue-jupyter

* Fix case where data is not a Numpy array

* Add change log
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working imviz performance Performance related plugin Label for plugins common to multiple configurations 💤backport-v3.8.x on-merge: backport to v3.8.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants