
map_over_datasets: skip empty nodes #10042

Draft: wants to merge 4 commits into base: main
Conversation

@mathause (Collaborator) commented Feb 10, 2025


  • is missing tests and docs, but I'd like to get some feedback first
  • needs some additional logic to only check the output on non-empty nodes and to ensure multi-output functions are handled correctly
  • there is no good way to do a proper deprecation without a keyword

@mathause mathause marked this pull request as draft February 10, 2025 17:37
@Illviljan (Contributor) commented:

An interpolation use case that doesn't crash with this PR:

import numpy as np
import xarray as xr
number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create random data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g

        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()

        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds
dt = xr.DataTree.from_dict(datasets)

# %% Interpolate to same time coordinate
def ds_interp(ds, *args, **kwargs):
    return ds.interp(*args, **kwargs)


new_time = np.linspace(0, 100, 50)
dt_interp = dt.map_over_datasets(
    ds_interp, kwargs=dict(time=new_time, assume_sorted=True)
)

@mathause (Collaborator, Author) commented:

Thanks for the example. This PR would also close #10013, which would be a huge plus for me: not being able to subtract a Dataset from a DataTree is extremely cumbersome. However, this implies that the binary ops are implemented using map_over_datasets, which means a considerable behavior change.

Successfully merging this pull request may close these issues.

  • datatree gets dis-aligned in binary op
  • map_over_datasets throws error on nodes without datasets