
Add lazy backend ASV test #7426


Merged · 12 commits · Jan 11, 2023

Conversation

@Illviljan (Contributor) commented Jan 6, 2023

This benchmarks xr.open_dataset without any slow file reading, which can otherwise quickly become the majority of the measured time.

Related to #7374.
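For illustration, a minimal sketch of the idea (the names, e.g. DummyBackendEntrypoint, are hypothetical and this is not the exact code added in this PR): a custom backend builds its dataset in memory, so xr.open_dataset can be timed without touching the filesystem.

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class DummyBackendEntrypoint(BackendEntrypoint):
    """Hypothetical backend returning an in-memory dataset, so no file is read."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # The filename is ignored; build a dataset with many small variables
        # so the per-variable overhead of open_dataset becomes measurable.
        return xr.Dataset(
            {f"var{i}": ("x", np.arange(10)) for i in range(2000)}
        )


# A BackendEntrypoint subclass can be passed directly as ``engine``.
# The dummy backend ignores the filename, so None is passed.
# chunks=None opens eagerly; chunks={} wraps every variable in dask
# (and therefore requires dask to be installed).
ds = xr.open_dataset(None, engine=DummyBackendEntrypoint, chunks={})

In the timings below, chunks=None opens the dataset eagerly, while chunks={} additionally wraps every variable in a dask array.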

Timings for the new ASV tests:


[ 50.85%] ··· dataset_io.IOReadCustomEngine.time_open_dataset                 ok
[ 50.85%] ··· ======== ============
               chunks              
              -------- ------------
                None     265±4ms   
                 {}     1.17±0.02s 
              ======== ============
[ 54.69%] ··· dataset_io.IOReadSingleFile.time_read_dataset                   ok
[ 54.69%] ··· ========= ============= =============
              --                   chunks          
              --------- ---------------------------
                engine       None           {}     
              ========= ============= =============
                scipy     4.81±0.1ms   6.65±0.01ms 
               netcdf4   8.41±0.08ms    10.9±0.2ms 
              ========= ============= =============

From the IOReadCustomEngine test we can see that chunking a dataset with many variables (2000+) is considerably slower than opening it without chunking (1.17 s vs. 265 ms).
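As a rough illustration of how such a result is produced, an ASV benchmark in the spirit of IOReadCustomEngine can be parameterised over the chunks argument. This is a sketch only, not the code merged in this PR; the class and variable names are illustrative.

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class IOReadCustomEngineSketch:
    # ASV parameterisation: chunks=None (no dask) vs chunks={} (dask-wrapped;
    # requires dask to be installed).
    params = [None, {}]
    param_names = ["chunks"]

    def setup(self, chunks):
        class InMemoryBackend(BackendEntrypoint):
            def open_dataset(self, filename_or_obj, *, drop_variables=None):
                # ~2000 small variables: per-variable overhead dominates.
                return xr.Dataset(
                    {f"var{i}": ("x", np.arange(10)) for i in range(2000)}
                )

        self.engine = InMemoryBackend

    def time_open_dataset(self, chunks):
        # The in-memory backend ignores the filename, so None is passed.
        xr.open_dataset(None, engine=self.engine, chunks=chunks)

With chunks={}, each of the 2000+ variables is wrapped in its own dask array, which is where the per-variable overhead observed in the table above comes from.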

@github-actions bot added the run-benchmark (Run the ASV benchmark workflow) and topic-performance labels (Jan 6, 2023)
@Illviljan added and removed the run-benchmark (Run the ASV benchmark workflow) label (Jan 6, 2023)
@Illviljan marked this pull request as draft (Jan 6, 2023, 23:56)
@Illviljan marked this pull request as ready for review (Jan 9, 2023, 20:52)
@Illviljan (Contributor, Author) commented Jan 9, 2023 with timings for the new ASV tests (the same results shown in the description above).
@Illviljan added the plan to merge (Final call for comments) label (Jan 9, 2023)
@Illviljan removed the topic-performance and run-benchmark (Run the ASV benchmark workflow) labels (Jan 10, 2023)
@Illviljan closed this pull request (Jan 10, 2023)
@Illviljan reopened this pull request (Jan 10, 2023)
@Illviljan added the run-benchmark (Run the ASV benchmark workflow) label (Jan 10, 2023)
xr.open_dataset(self.filepaths[engine], engine=engine, chunks=chunks)


class IOReadCustomEngine:
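    # Benchmarks xr.open_dataset with a custom in-memory backend so the timing
    # is not dominated by file reading; parameterised over chunks=None and
    # chunks={} (see the ASV results in the description above).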
@dcherian (Contributor) commented on the IOReadCustomEngine snippet above, Jan 12, 2023:
Thanks, this is a great benchmark.

Just a minor question: Shall we stick this in xarray.tests instead? I'm not sure if we have something similar for our tests already.

dcherian added a commit to dcherian/xarray that referenced this pull request on Jan 18, 2023:
* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...
Labels: plan to merge (Final call for comments), run-benchmark (Run the ASV benchmark workflow), topic-backends, topic-performance
Projects: None yet
2 participants