
Add lazy backend ASV test #7426


Merged · 12 commits · Jan 11, 2023

Conversation

@Illviljan (Contributor) commented Jan 6, 2023

This benchmarks xr.open_dataset without any slow file reading, which can otherwise quickly become the majority of the measured time.

Related to #7374.
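For illustration, a minimal sketch of the idea (the names, e.g. DummyBackendEntrypoint, are hypothetical and this is not the exact code added in this PR): a custom backend builds its dataset in memory, so xr.open_dataset can be timed without touching the filesystem.

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class DummyBackendEntrypoint(BackendEntrypoint):
    """Hypothetical backend returning an in-memory dataset, so no file is read."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # The filename is ignored; build a dataset with many small variables
        # so the per-variable overhead of open_dataset becomes measurable.
        return xr.Dataset(
            {f"var{i}": ("x", np.arange(10)) for i in range(2000)}
        )


# A BackendEntrypoint subclass can be passed directly as ``engine``.
# The dummy backend ignores the filename, so None is passed.
# chunks=None opens eagerly; chunks={} wraps every variable in dask
# (and therefore requires dask to be installed).
ds = xr.open_dataset(None, engine=DummyBackendEntrypoint, chunks={})

In the timings below, chunks=None opens the dataset eagerly, while chunks={} additionally wraps every variable in a dask array.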

Timings for the new ASV tests:


[ 50.85%] ··· dataset_io.IOReadCustomEngine.time_open_dataset                 ok
[ 50.85%] ··· ======== ============
               chunks              
              -------- ------------
                None     265±4ms   
                 {}     1.17±0.02s 
              ======== ============
[ 54.69%] ··· dataset_io.IOReadSingleFile.time_read_dataset                   ok
[ 54.69%] ··· ========= ============= =============
              --                   chunks          
              --------- ---------------------------
                engine       None           {}     
              ========= ============= =============
                scipy     4.81±0.1ms   6.65±0.01ms 
               netcdf4   8.41±0.08ms    10.9±0.2ms 
              ========= ============= =============

From the IOReadCustomEngine test we can see that chunking a dataset with many variables (2000+) is considerably slower than opening it without chunking (1.17 s vs. 265 ms).
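As a rough illustration of how such a result is produced, an ASV benchmark in the spirit of IOReadCustomEngine can be parameterised over the chunks argument. This is a sketch only, not the code merged in this PR; the class and variable names are illustrative.

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class IOReadCustomEngineSketch:
    # ASV parameterisation: chunks=None (no dask) vs chunks={} (dask-wrapped;
    # requires dask to be installed).
    params = [None, {}]
    param_names = ["chunks"]

    def setup(self, chunks):
        class InMemoryBackend(BackendEntrypoint):
            def open_dataset(self, filename_or_obj, *, drop_variables=None):
                # ~2000 small variables: per-variable overhead dominates.
                return xr.Dataset(
                    {f"var{i}": ("x", np.arange(10)) for i in range(2000)}
                )

        self.engine = InMemoryBackend

    def time_open_dataset(self, chunks):
        # The in-memory backend ignores the filename, so None is passed.
        xr.open_dataset(None, engine=self.engine, chunks=chunks)

With chunks={}, each of the 2000+ variables is wrapped in its own dask array, which is where the per-variable overhead observed in the table above comes from.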

@github-actions bot added the run-benchmark (Run the ASV benchmark workflow) and topic-performance labels (Jan 6, 2023)
@Illviljan added and removed the run-benchmark (Run the ASV benchmark workflow) label (Jan 6, 2023)
@Illviljan marked this pull request as draft (Jan 6, 2023, 23:56)
@Illviljan marked this pull request as ready for review (Jan 9, 2023, 20:52)
@Illviljan (Contributor, Author) commented Jan 9, 2023 with timings for the new ASV tests (the same results shown in the description above).
@Illviljan added the plan to merge (Final call for comments) label (Jan 9, 2023)
@Illviljan removed the topic-performance and run-benchmark (Run the ASV benchmark workflow) labels (Jan 10, 2023)
@Illviljan closed this pull request (Jan 10, 2023)
@Illviljan reopened this pull request (Jan 10, 2023)
@Illviljan added the run-benchmark (Run the ASV benchmark workflow) label (Jan 10, 2023)
xr.open_dataset(self.filepaths[engine], engine=engine, chunks=chunks)


class IOReadCustomEngine:
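    # Benchmarks xr.open_dataset with a custom in-memory backend so the timing
    # is not dominated by file reading; parameterised over chunks=None and
    # chunks={} (see the ASV results in the description above).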
@dcherian (Contributor) commented on the IOReadCustomEngine snippet above, Jan 12, 2023:
Thanks, this is a great benchmark.

Just a minor question: Shall we stick this in xarray.tests instead? I'm not sure if we have something similar for our tests already.

dcherian added a commit to dcherian/xarray that referenced this pull request on Jan 18, 2023:
* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...
Labels: plan to merge (Final call for comments), run-benchmark (Run the ASV benchmark workflow), topic-backends, topic-performance
Projects: None yet
2 participants