From f0812f8c6893a72da2843e68ded9584de4a49cc5 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 7 Aug 2023 15:57:54 +0200 Subject: [PATCH 1/4] add outline --- src/posts/flexible-indexes/index.md | 62 +++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 src/posts/flexible-indexes/index.md diff --git a/src/posts/flexible-indexes/index.md b/src/posts/flexible-indexes/index.md new file mode 100644 index 00000000..cc355d53 --- /dev/null +++ b/src/posts/flexible-indexes/index.md @@ -0,0 +1,62 @@ +--- +title: 'Xarray indexes: unleash the power of coordinates' +date: '2023-08-07' +authors: + - name: Benoît Bovy + github: benbovy +summary: 'It is now possible to take full advantage of coordinate data via Xarray explicit and flexible indexes' +--- + +_TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment (almost) fully customizable, via built-in and/or 3rd party indexes. It also addresses a good amount of long-standing issues with "dimension coordinates" implicitly backed by pandas (multi-)indexes._ + +## Introduction + +[link to Joe's CZI blog post] + +## The concept of "dimension coordinate" and its shortcomings + +Some datasets could not be loaded with Xarray (dimension name and coordinate with same name but different dimensions) + +Complicated workarounds (swap_dims, etc.) + +Limited and/or challenging for data cubes representing arbitrary grids (curvilinear grids, unstructured meshes, etc.). + +## Better index vs. coordinate separation + +Refactor index logic in `Index` classes. More easily maintainable. May help Pandas become optional dependency in the future? (cf. Xarray-lite). + +Also allowed to solve lots of issues with multi-indexes, for which each level has now its own real coordinate. + +Dataset / DataArray section has now an "indexes" section. + +## Selection using non-dimension, 1-d coordinates + +Set an index for non-dimension coordinates! 
(No more swap_dims or coordinate renaming) + +```python +ds.set_xindex("non_dim_coord").sel(non_dim_coord="something") +``` + +## Alternatives to pandas.Index + +E.g., Numpy index (much faster to build, much more expensive to query), Geometry index (xvec) + +Out-of-core index, etc. + +...or no index at all! (Create dataset with no default index, ``drop_indexes``) + +## Create custom indexes from arbitrary coordinates and dimensions + +Not limited to 1-dimensional coordinates, even more flexible! + +RasterIndex, FunctionalIndex, etc. + +See xarray discussion for examples + +## What’s next + +Still unfinished [link: indexes next steps GH issue], extension entry points, etc. + +## Acknowledgments + +CZI, Xarray core developers, etc. From 646ddb8b7324938e637fb2ac83cd157bbfe3be13 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon, 7 Aug 2023 14:16:48 +0000 Subject: [PATCH 2/4] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- src/posts/flexible-indexes/index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/posts/flexible-indexes/index.md b/src/posts/flexible-indexes/index.md index cc355d53..b589d973 100644 --- a/src/posts/flexible-indexes/index.md +++ b/src/posts/flexible-indexes/index.md @@ -13,7 +13,7 @@ _TLDR: Xarray has been through a major refactoring of its internals that makes c [link to Joe's CZI blog post] -## The concept of "dimension coordinate" and its shortcomings +## The concept of "dimension coordinate" and its shortcomings Some datasets could not be loaded with Xarray (dimension name and coordinate with same name but different dimensions) @@ -27,7 +27,7 @@ Refactor index logic in `Index` classes. More easily maintainable. May help Pand Also allowed to solve lots of issues with multi-indexes, for which each level has now its own real coordinate.
-Dataset / DataArray section has now an "indexes" section. +Dataset / DataArray section has now an "indexes" section. ## Selection using non-dimension, 1-d coordinates @@ -43,7 +43,7 @@ E.g., Numpy index (much faster to build, much more expensive to query), Geometry Out-of-core index, etc. -...or no index at all! (Create dataset with no default index, ``drop_indexes``) +...or no index at all! (Create dataset with no default index, `drop_indexes`) ## Create custom indexes from arbitrary coordinates and dimensions @@ -57,6 +57,6 @@ See xarray discussion for examples Still unfinished [link: indexes next steps GH issue], extension entry points, etc. -## Acknowledgments +## Acknowledgments CZI, Xarray core developers, etc. From db5ab03d7325bda7b480c74f741dee3f1d1441e0 Mon Sep 17 00:00:00 2001 From: Scott Henderson Date: Thu, 5 Jun 2025 20:13:38 +0200 Subject: [PATCH 3/4] revamp blog post --- .gitignore | 5 + src/posts/flexible-indexes/index.md | 55 +-- .../flexible-indexes/rangeindex-repr.html | 451 ++++++++++++++++++ 3 files changed, 484 insertions(+), 27 deletions(-) create mode 100644 src/posts/flexible-indexes/rangeindex-repr.html diff --git a/.gitignore b/.gitignore index f65d00b8..3729e734 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,8 @@ +public/atom.xml +public/rss.json +public/rss.html +public/rss.xml + yarn.lock package-lock.json diff --git a/src/posts/flexible-indexes/index.md b/src/posts/flexible-indexes/index.md index b589d973..01bf8fe3 100644 --- a/src/posts/flexible-indexes/index.md +++ b/src/posts/flexible-indexes/index.md @@ -1,62 +1,63 @@ --- title: 'Xarray indexes: unleash the power of coordinates' -date: '2023-08-07' +date: '2025-06-05' authors: - name: Benoît Bovy github: benbovy + - name: Scott Henderson + github: scottyhq summary: 'It is now possible to take full advantage of coordinate data via Xarray explicit and flexible indexes' --- -_TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data 
selection and alignment (almost) fully customizable, via built-in and/or 3rd party indexes. It also addresses a good amount of long-standing issues with "dimension coordinates" implicitly backed by pandas (multi-)indexes._ +_TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment more customizable, via built-in and/or third-party indexes! In this post, we highlight a few examples that take advantage of this new superpower._ ## Introduction -[link to Joe's CZI blog post] +Xarray is a large project that is constantly evolving to meet the needs of its users and to keep up with novel data formats and use cases. One area of improvement identified in the [Development Roadmap](https://docs.xarray.dev/en/stable/roadmap.html#flexible-indexes) is the ability to add new coordinate indexing capabilities beyond the original `pandas.Index`. Let's look at a few examples to understand what is now possible! -## The concept of "dimension coordinate" and its shortcomings +TODO: Insert Benoit's awesome schematic from indexing sprint :) -Some datasets could not be loaded with Xarray (dimension name and coordinate with same name but different dimensions) +## Alternatives to pandas.Index -Complicated workarounds (swap_dims, etc.) +Generally useful index alternatives are already part of Xarray! -Limited and/or challenging for data cubes representing arbitrary grids (curvilinear grids, unstructured meshes, etc.). +### RangeIndex -## Better index vs. coordinate separation +By default, a `pandas.Index` stores every coordinate value explicitly in memory. For evenly spaced 1-D coordinates, it is often more efficient to store only the start, stop, and step, and to compute specific coordinate values on the fly. This is what `RangeIndex` accomplishes: -Refactor index logic in `Index` classes. More easily maintainable. May help Pandas become optional dependency in the future? (cf. Xarray-lite).
+```python +import xarray as xr +from xarray.indexes import RangeIndex -Also allowed to solve lots of issues with multi-indexes, for which each level has now its own real coordinate. +index = RangeIndex.arange(0.0, 100_000, 0.1, dim='x')  # start, stop (exclusive), step along dimension 'x' +ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index)) +ds +``` -Dataset / DataArray section has now an "indexes" section. + -## Selection using non-dimension, 1-d coordinates -Set an index for non-dimension coordinates! (No more swap_dims or coordinate renaming) +### IntervalIndex -```python -ds.set_xindex("non_dim_coord").sel(non_dim_coord="something") -``` - -## Alternatives to pandas.Index +TODO: Not sure if this one is ready to highlight(https://github.com/pydata/xarray/pull/10296) -E.g., Numpy index (much faster to build, much more expensive to query), Geometry index (xvec) -Out-of-core index, etc. +## Third-party custom Indexes -...or no index at all! (Create dataset with no default index, `drop_indexes`) -## Create custom indexes from arbitrary coordinates and dimensions +### Xvec GeometryIndex -Not limited to 1-dimensional coordinates, even more flexible! +TODO: Highlight https://xvec.readthedocs.io/en/v0.2.0/generated/xvec.GeometryIndex.html -RasterIndex, FunctionalIndex, etc. +### RasterIndex -See xarray discussion for examples +TODO: Highlight https://github.com/dcherian/rasterix ## What’s next -Still unfinished [link: indexes next steps GH issue], extension entry points, etc. + While we're extremely excited about what can *already* be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. If you're interested in getting involved we recommend following [this GitHub Issue](https://github.com/pydata/xarray/issues/6293)! ## Acknowledgments -CZI, Xarray core developers, etc. +This work would not have been possible without technical input from the Xarray core team and community!
+Several developers received essential funding from a [CZI Essential Open Source Software for Science (EOSS) grant](https://xarray.dev/blog/czi-eoss-grant-conclusion) as well as NASA's Open Source Tools, Frameworks, and Libraries (OSTFL) grant 80NSSC22K0345. diff --git a/src/posts/flexible-indexes/rangeindex-repr.html b/src/posts/flexible-indexes/rangeindex-repr.html new file mode 100644 index 00000000..dbe1f4ba --- /dev/null +++ b/src/posts/flexible-indexes/rangeindex-repr.html @@ -0,0 +1,451 @@ +
<xarray.Dataset> Size: 8MB
+Dimensions:  (x: 1000000)
+Coordinates:
+  * x        (x) float64 8MB 0.0 0.1 0.2 0.3 0.4 ... 1e+05 1e+05 1e+05 1e+05
+Data variables:
+    *empty*
+Indexes:
+    x        RangeIndex (start=0, stop=1e+05, step=0.1)
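The `RangeIndex` repr above only records `start`, `stop` and `step`, yet the coordinate has one million values. As a rough illustration of why that is cheap, here is a minimal pure-Python sketch, assuming evenly spaced values with a positive step. `ToyRangeCoord` and its methods are hypothetical and are not Xarray's `RangeIndex` implementation or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRangeCoord:
    """Hypothetical toy: an evenly spaced 1-D coordinate kept as (start, step, size).

    Values are computed on demand instead of being stored in an array.
    Illustration only; this is not Xarray's RangeIndex.
    """

    start: float
    step: float
    size: int

    @classmethod
    def arange(cls, start: float, stop: float, step: float) -> "ToyRangeCoord":
        # Half-open [start, stop), same element-count rule as numpy.arange.
        size = max(0, math.ceil((stop - start) / step))
        return cls(start, step, size)

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, i: int) -> float:
        # The i-th value is computed on the fly: constant memory for any size.
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.start + i * self.step

    def nearest_position(self, label: float) -> int:
        # Label -> position by arithmetic instead of searching stored values,
        # clamped to the valid range (assumes step > 0).
        pos = round((label - self.start) / self.step)
        return min(max(pos, 0), self.size - 1)


coord = ToyRangeCoord.arange(0.0, 100_000, 0.1)
print(len(coord))  # 1000000, while only three numbers are stored
print(coord.nearest_position(12.34))
```

Label lookup here is a division and a rounding on `start` and `step` rather than a search over a million stored floats. A real index also has to deal with slices, alignment between objects and floating-point tolerance, which this sketch ignores.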
From 76c9d1572a84e5cb15d11087ae5afb9833f298ed Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu, 5 Jun 2025 18:13:53 +0000 Subject: [PATCH 4/4] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- src/posts/flexible-indexes/index.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/src/posts/flexible-indexes/index.md b/src/posts/flexible-indexes/index.md index 01bf8fe3..9a4cb625 100644 --- a/src/posts/flexible-indexes/index.md +++ b/src/posts/flexible-indexes/index.md @@ -36,15 +36,12 @@ ds - ### IntervalIndex TODO: Not sure if this one is ready to highlight(https://github.com/pydata/xarray/pull/10296) - ## Third-party custom Indexes - ### Xvec GeometryIndex TODO: Highlight https://xvec.readthedocs.io/en/v0.2.0/generated/xvec.GeometryIndex.html @@ -55,7 +52,7 @@ TODO: Highlight https://github.com/dcherian/rasterix ## What’s next - While we're extremely excited about what can *already* be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. If you're interested in getting involved we recommend following [this GitHub Issue](https://github.com/pydata/xarray/issues/6293)! +While we're extremely excited about what can _already_ be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. If you're interested in getting involved we recommend following [this GitHub Issue](https://github.com/pydata/xarray/issues/6293)! ## Acknowledgments
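The first outline in this patch series mentions setting an index on a non-dimension coordinate with `set_xindex` (instead of `swap_dims`) and creating coordinates with no default index at all. A minimal sketch of both ideas, assuming a recent Xarray release that provides `Dataset.set_xindex`, `Dataset.drop_indexes` and the `xr.Coordinates(..., indexes={})` constructor; the dataset, the `station_name` coordinate and its values are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: "station_name" is a 1-D coordinate along the
# "station" dimension, but it is not a dimension coordinate, so Xarray
# does not create an index for it by default.
ds = xr.Dataset(
    {"temperature": ("station", np.array([12.1, 15.3, 9.8]))},
    coords={"station_name": ("station", ["paris", "lyon", "lille"])},
)

# Attach the default (pandas-backed) index to the non-dimension coordinate,
# then select by label directly: no swap_dims or renaming needed.
indexed = ds.set_xindex("station_name")
lyon = indexed.sel(station_name="lyon")
print(float(lyon["temperature"]))

# The index can be dropped again; the coordinate values are kept.
unindexed = indexed.drop_indexes("station_name")

# Coordinates can also be created without any default index:
# passing an empty mapping for `indexes` skips the default pandas index.
coords = xr.Coordinates({"x": [10, 20, 30]}, indexes={})
ds_no_index = xr.Dataset(coords=coords)
```

Indexes are tracked separately from coordinate variables (see `.xindexes`), which is what makes it possible to add or drop an index without touching the coordinate data itself.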