map_over_datasets: skip empty nodes #10042

mathause · 2025-02-10T17:36:33Z

Closes map_over_datasets throws error on nodes without datasets #9693
Closes datatree gets dis-aligned in binary op #10013
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

Tests and docs are still missing, but I'd like to get some feedback first.
Needs some additional logic to check the output only on non-empty nodes and to ensure multi-output functions are handled correctly.
There is no good way to do a proper deprecation without adding a keyword.
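
To show the intended behaviour in isolation, here is a minimal pure-Python sketch. It is a toy model and not xarray's implementation: the tree is a plain dict from path to a dict of variables, and `map_over_nodes`, `skip_empty` and `first_temperature` are hypothetical names made up for illustration.

```python
# Toy stand-in for a DataTree: path -> {variable name: list of values}.
# An empty dict models a node without any data (e.g. a pure grouping node).


def map_over_nodes(func, tree, skip_empty=True):
    """Apply func to every node; optionally pass empty nodes through untouched."""
    result = {}
    for path, node in tree.items():
        if skip_empty and not node:
            # Empty node: do not call func, keep an empty node in the result.
            result[path] = {}
        else:
            result[path] = func(node)
    return result


def first_temperature(node):
    # Raises KeyError on a node that has no "temperature" variable.
    return {"first": node["temperature"][:1]}


tree = {
    "/": {},
    "/file_0": {},
    "/file_0/group_0": {"temperature": [1.0, 2.0, 3.0]},
}

# With skipping enabled, the function only ever sees the populated node.
mapped = map_over_nodes(first_temperature, tree)
```

In this sketch, calling `map_over_nodes(first_temperature, tree, skip_empty=False)` raises a `KeyError` on the root node, which mirrors the kind of failure reported in #9693.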

Illviljan · 2025-02-16T09:44:57Z

An interpolation use case that doesn't crash with this PR:

import numpy as np

import xarray as xr

number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create synthetic data (deterministic, varies per file and group)
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g

        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()

        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds
dt = xr.DataTree.from_dict(datasets)

# %% Interpolate to same time coordinate
def ds_interp(ds, *args, **kwargs):
    return ds.interp(*args, **kwargs)


new_time = np.linspace(0, 100, 50)
dt_interp = dt.map_over_datasets(
    ds_interp, kwargs=dict(time=new_time, assume_sorted=True)
)

mathause · 2025-02-17T05:52:37Z

Thanks for the example. This PR would also close #10013. This would be a huge plus for me. Not being able to subtract a ds from a datatree makes it extremely cumbersome. However, this implies that the binary ops are implemented using map_over_datasets and means there is a considerable behavior change.

mathause · 2025-02-27T17:34:10Z

@TomNicholas do you see any chance this PR might get merged (after adding tests etc., obviously)? Are there discussions besides #9693 that I am missing?

TomNicholas · 2025-02-27T18:20:47Z

Hey @mathause - sorry for forgetting about this - I've been busy.

I think something like this should get merged, but there are various small and fairly arbitrary choices to quibble over. They are basically all already mentioned in #9693 though.

this implies that the binary ops are implemented using map_over_datasets and means there is a considerable behavior change.

I don't understand this statement though - aren't binary ops already implemented using map_over_datasets?

xarray/xarray/core/datatree.py

Line 1590 in 5ea1e81

return map_over_datasets(ds_binop, self, other)

We're changing the behaviour, but changing it to be closer to the old datatree, which is what a lot of users expect anyway.

mathause · 2025-03-07T05:00:45Z

I think this is ready for review

Hey @mathause - sorry for forgetting about this - I've been busy.

No worries! Thanks for considering this PR!

I think something like this should get merged, but there are various small and fairly arbitrary choices to quibble over. They are basically all already mentioned in #9693 though.

The one unclear choice from #9693 was the comment by @shoyer #9693 (comment):

I'm not sure whether or not to call the mapped over function for nodes that only define coordinates. Certainly I would not blindly copy coordinates from otherwise empty nodes onto the result, because those coordinates may no longer be relevant on the result.

I currently use DataTree.has_data; this includes nodes that only have coords (although I think nodes that inherit coords are excluded). I don't see a way to be clever about these nodes.

this implies that the binary ops are implemented using map_over_datasets and means there is a considerable behavior change.

I don't understand this statement though - aren't binary ops already implemented using map_over_datasets?

Yes sorry that was not clear. I just wanted to say that binary ops are also affected.
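
To make the link to binary ops concrete, here is a toy sketch that routes a subtraction through a node-wise map, in the spirit of the `map_over_datasets(ds_binop, self, other)` line quoted above. It is a pure-Python stand-in under stated assumptions: `tree_binop` and the dict-based tree are hypothetical, and the sketch only mimics the idea of leaving empty nodes alone.

```python
import operator


def tree_binop(op, tree, other):
    """Apply op element-wise between each non-empty node and `other`.

    `tree` maps path -> {variable name: list of values}; `other` is a single
    {variable name: list of values} mapping playing the role of a dataset.
    """
    result = {}
    for path, node in tree.items():
        if not node:
            # Empty node: nothing to combine, so keep it empty.
            result[path] = {}
            continue
        result[path] = {
            name: [op(a, b) for a, b in zip(values, other[name])]
            for name, values in node.items()
        }
    return result


tree = {"/": {}, "/a": {"x": [1.0, 2.0]}, "/b": {"x": [10.0, 20.0]}}
other = {"x": [1.0, 1.0]}

# Subtract the single "dataset" from every populated node of the tree.
diff = tree_binop(operator.sub, tree, other)
```

Because the empty root is passed through instead of being combined with `other`, the result keeps the same node layout as the input tree.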

mathause · 2025-03-13T09:59:05Z

gentle ping @TomNicholas

(apologies for bothering you again - I am unfortunately currently blocked by this in another project. Or is there someone else who could potentially review this?)

TomNicholas · 2025-03-14T16:01:48Z

no worries, replied here #9693 (comment)

map_over_datasets: skip empty nodes

0104039

mathause marked this pull request as draft February 10, 2025 17:37

mathause mentioned this pull request Feb 10, 2025

compatibility with xr.DataTree MESMER-group/mesmer#607

Merged

3 tasks

fix typing

a23fb44

mathause mentioned this pull request Feb 10, 2025

map_over_datasets throws error on nodes without datasets #9693

Open

Merge branch 'main' into map_over_datasets_skip_empty_nodes

9b1755e

Merge branch 'main' into map_over_datasets_skip_empty_nodes

8b6a816

TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Feb 27, 2025

mathause added 6 commits March 3, 2025 09:30

Merge branch 'main' into map_over_datasets_skip_empty_nodes

2553a8e

changelog

f787a76

update docstring & comments

f2a924d

Merge branch 'main' into map_over_datasets_skip_empty_nodes

a3b735a

more comments

82de573

tests

0e34067

mathause closed this Mar 6, 2025

mathause reopened this Mar 6, 2025

mathause added 4 commits March 7, 2025 04:55

remove unnecessary test

bbfedf2

add binary op test

4ba1d27

mention binary ops

6e18bd7

clean test

65181a4

mathause marked this pull request as ready for review March 7, 2025 04:42

Merge branch 'main' into map_over_datasets_skip_empty_nodes

492379e
