New defaults for concat, merge, combine_* #10062

Draft
wants to merge 7 commits into
base: main


jsignell
Contributor

Replaces #10051

  • Towards Stricter defaults for concat, combine, open_mfdataset, merge #8778
    • after a sufficiently long deprecation cycle there should be another PR that:
      • removes the option (but doesn't throw if people are still setting it)
      • removes all the FutureWarnings
    • after even more time one last PR that:
      • encodes the new defaults right into the function signature
      • removes the extra warnings that throw on failure
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

This PR attempts to emit warnings if and only if the output would change with the new kwarg defaults. To exercise the new defaults, I am toggling the option back and forth and running the existing test suite.
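A minimal sketch of the "warn if and only if the output would change" rule, using plain dicts rather than xarray objects. `toy_merge` and its simplified `no_conflicts`/`override` semantics are hypothetical stand-ins, not the code in this PR, and treating `override` as the new default is an assumption made for illustration. The toy also compares values eagerly, which may not be affordable for lazily loaded data.

```python
import warnings


def toy_merge(first, second, compat=None, use_new_defaults=False):
    """Merge two dicts; compat=None means the caller did not pick a value."""
    explicit = compat is not None
    if not explicit:
        # Assumed defaults, for illustration only.
        compat = "override" if use_new_defaults else "no_conflicts"
    if compat not in ("override", "no_conflicts"):
        raise ValueError(f"unsupported compat: {compat!r}")

    # "override": the value from `first` wins for every shared key.
    overridden = {**second, **first}
    if compat == "override":
        return overridden

    # "no_conflicts" (toy version): a None in `first` is filled from
    # `second`; two different non-None values are an error.
    merged = dict(second)
    for key, value in first.items():
        other = second.get(key)
        if value is not None and other is not None and value != other:
            raise ValueError(f"conflicting values for {key!r}")
        merged[key] = other if value is None else value

    # Warn only when the old default was used implicitly AND the new
    # default would have produced a different result.
    if not explicit and merged != overridden:
        warnings.warn(
            "the default for compat will change from 'no_conflicts' to "
            "'override'; pass compat explicitly to keep the current result",
            FutureWarning,
            stacklevel=2,
        )
    return merged
```

Passing `compat` explicitly, or passing inputs for which both settings agree, stays silent in this sketch; only the implicit old default with a diverging result warns.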


Run all the tests with use_new_combine_kwarg_defaults=True
270 failed

Change to use_new_combine_kwarg_defaults=False and run the last failed:
268 failed, 2 passed

Those 2 are missed alarms: behavior differs with the new kwargs and we are not warning. I am not totally sure we need to handle them, though, because they test conditions that used to raise an error and no longer do with the new defaults.

FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_do_not_promote - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_error - Failed: DID NOT RAISE <class 'xarray.core.merge.MergeError'>
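The bookkeeping behind this comparison can be sketched with set operations. This assumes that a failure with the new defaults signals changed behavior and that a failure with the old defaults signals an emitted warning; `classify_alarms` and the test IDs in the usage example are hypothetical. The same difference taken in the other direction gives the false alarms.

```python
def classify_alarms(failed_with_new, failed_with_old):
    """Split test IDs by whether behavior changed and whether we warned.

    failed_with_new: IDs failing with use_new_combine_kwarg_defaults=True
                     (assumed to mean the behavior changed).
    failed_with_old: IDs failing with use_new_combine_kwarg_defaults=False
                     (assumed to mean a warning was emitted).
    """
    changed = set(failed_with_new)
    warned = set(failed_with_old)
    return {
        "true_alarms": changed & warned,    # changed and warned
        "missed_alarms": changed - warned,  # changed, but no warning
        "false_alarms": warned - changed,   # warned, but nothing changed
    }


# Usage with made-up test IDs:
result = classify_alarms(
    failed_with_new={"test_a", "test_b", "test_c"},
    failed_with_old={"test_b", "test_c", "test_d"},
)
```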

Running all the tests with use_new_combine_kwarg_defaults=False
352 failed

Change to use_new_combine_kwarg_defaults=True and run the last failed:
268 failed, 86 passed

Those 86 are false alarms

Here is a list of them
xarray/tests/test_backends.py::test_open_mfdataset_list_attr
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop]
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[override]
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[no_conflicts]
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[identical]
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop_conflicts]
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_attr_by_coords
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataarray_attr_by_coords
xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_invalid_data_vars_value_should_fail
xarray/tests/test_backends.py::TestDask::test_open_mfdataset
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_2d
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_pathlib
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_2d_pathlib
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_2
xarray/tests/test_backends.py::TestDask::test_attrs_mfdataset
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_attrs_file
xarray/tests/test_backends.py::TestDask::test_open_mfdataset_attrs_file_path
xarray/tests/test_backends.py::TestDask::test_save_mfdataset_roundtrip
xarray/tests/test_backends.py::TestDask::test_save_mfdataset_pathlib_roundtrip
xarray/tests/test_backends.py::TestDask::test_save_mfdataset_compute_false_roundtrip
xarray/tests/test_backends.py::test_h5netcdf_storage_options
xarray/tests/test_combine.py::TestCombineND::test_concat_once[dim1]
xarray/tests/test_combine.py::TestCombineND::test_concat_only_first_dim
xarray/tests/test_combine.py::TestCombineND::test_concat_twice[dim1]
xarray/tests/test_combine.py::TestNestedCombine::test_nested_concat
xarray/tests/test_combine.py::TestNestedCombine::test_concat_multiple_dims
xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d
xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d_combine_attrs_kwarg
xarray/tests/test_combine.py::TestNestedCombine::test_merge_one_dim_concat_another
xarray/tests/test_combine.py::TestCombineDatasetsbyCoords::test_infer_order_from_coords
xarray/tests/test_concat.py::test_concat_categorical
xarray/tests/test_concat.py::test_concat_all_empty
xarray/tests/test_concat.py::TestConcatDataset::test_concat_simple[dim1-True-minimal]
xarray/tests/test_concat.py::TestConcatDataset::test_concat_2[False]
xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-minimal]
xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-all]
xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim2-minimal]
xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim2-all]
xarray/tests/test_concat.py::TestConcatDataset::test_concat
xarray/tests/test_concat.py::TestConcatDataset::test_concat_dim_precedence
xarray/tests/test_concat.py::TestConcatDataset::test_concat_size0
xarray/tests/test_concat.py::TestConcatDataset::test_concat_along_new_dim_multiindex
xarray/tests/test_concat.py::TestConcatDataArray::test_concat_coord_name
xarray/tests/test_dask.py::test_map_blocks_error
xarray/tests/test_dask.py::test_map_blocks[obj0]
xarray/tests/test_dask.py::test_map_blocks[obj1]
xarray/tests/test_dask.py::test_map_blocks_mixed_type_inputs[obj0]
xarray/tests/test_dask.py::test_map_blocks_mixed_type_inputs[obj1]
xarray/tests/test_dask.py::test_map_blocks_convert_args_to_list[obj0]
xarray/tests/test_dask.py::test_map_blocks_convert_args_to_list[obj1]
xarray/tests/test_dask.py::test_map_blocks_add_attrs[obj0]
xarray/tests/test_dask.py::test_map_blocks_add_attrs[obj1]
xarray/tests/test_dask.py::test_map_blocks_change_name
xarray/tests/test_dask.py::test_map_blocks_kwargs[obj0]
xarray/tests/test_dask.py::test_map_blocks_kwargs[obj1]
xarray/tests/test_dask.py::test_map_blocks_to_dataarray
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>0]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>1]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>2]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>3]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>4]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>6]
xarray/tests/test_dask.py::test_map_blocks_da_transformations[<lambda>7]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>0]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>1]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>2]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>3]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>4]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>5]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>6]
xarray/tests/test_dask.py::test_map_blocks_ds_transformations[<lambda>7]
xarray/tests/test_dask.py::test_map_blocks_da_ds_with_template[obj1]
xarray/tests/test_dask.py::test_map_blocks_errors_bad_template[obj1]
xarray/tests/test_dask.py::test_map_blocks_object_method[obj0]
xarray/tests/test_dask.py::test_map_blocks_object_method[obj1]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs10-attrs20-expected_attrs0-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs11-attrs21-expected_attrs1-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs12-attrs22-expected_attrs2-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs13-attrs23-expected_attrs3-True]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop-attrs14-attrs24-expected_attrs4-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs15-attrs25-expected_attrs5-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs16-attrs26-expected_attrs6-True]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[override-attrs17-attrs27-expected_attrs7-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop_conflicts-attrs18-attrs28-expected_attrs8-False]
xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[<lambda>-attrs19-attrs29-expected_attrs9-False]
xarray/tests/test_options.py::TestAttrRetention::test_concat_attr_retention

About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables, where compat='no_conflicts' might give different results than compat='override'. There might be a better way, but we want to be careful to avoid calling compute.
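To illustrate why a check like this over-warns, here is a rough sketch of a conservative, metadata-only test. It treats each dataset as a plain mapping from variable name to data and never looks at the values, so nothing is computed. The function names are hypothetical and this is not the check implemented in the PR.

```python
from collections import Counter


def overlapping_names(datasets):
    """Return the variable names that appear in more than one dataset."""
    counts = Counter(name for ds in datasets for name in ds)
    return sorted(name for name, n in counts.items() if n > 1)


def may_differ_between_compat_settings(datasets):
    """Conservative guess: any shared name *might* merge differently.

    Values are never inspected, so this can report True even when the
    shared variables are identical (a false alarm).
    """
    return bool(overlapping_names(datasets))


# Usage: the shared variable "t" is identical, yet the check still fires.
a = {"t": [1, 2, 3], "p": [4, 5, 6]}
b = {"t": [1, 2, 3], "q": [7, 8, 9]}
shared = overlapping_names([a, b])
```

The trade-off is that avoiding value comparison keeps the check cheap, at the cost of warning when the overlapping variables happen to agree.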

TODO:
3) Alter existing tests to make sure they still test what they were meant to test by passing in any required kwargs
4) Add new tests that explicitly ensure that for a bunch of different inputs, using old defaults throws a warning OR output is the same with new and old defaults.
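The invariant in item 4 could be checked with a small helper along these lines. This is only a sketch: `warns_or_unchanged`, `_outcome`, and `toy_pick` are hypothetical names, and the helper assumes the function under test accepts a `use_new_defaults` keyword, which is not how the option is toggled in this PR.

```python
import warnings


def _outcome(func, *args, **kwargs):
    """Capture either the return value or the type of the raised error."""
    try:
        return ("ok", func(*args, **kwargs))
    except Exception as err:
        return ("error", type(err))


def warns_or_unchanged(func, *args):
    """True if old defaults emit a FutureWarning OR both defaults agree."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old = _outcome(func, *args, use_new_defaults=False)
    warned = any(issubclass(w.category, FutureWarning) for w in caught)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        new = _outcome(func, *args, use_new_defaults=True)

    return warned or old == new


def toy_pick(values, use_new_defaults=False):
    """Toy function: old default returns the last item, new the first."""
    if use_new_defaults:
        return values[0]
    if values[0] != values[-1]:
        warnings.warn("default will change", FutureWarning, stacklevel=2)
    return values[-1]
```

In a real test suite this helper would typically be run over many parametrized inputs, with each case asserting that `warns_or_unchanged` returns True.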

Notes

  • I used None to indicate a kwarg value that the user has not explicitly set, but it might be preferable to instead use a special indicator value so you can't actually set the kwargs to None. The benefit of using that approach is that it makes it more obvious that people should not be setting these kwargs to None in their own code. Also with that approach it might be possible to encode both the old and the new default value in the function signature so we don't need to pass around as much context in other kwargs.
  • I added a second set of warnings that tries to catch any errors that might be related to the new default kwargs. The thinking is that these warnings might provide people with some context on any new errors that crop up after they opt in to the new defaults. These only pop up when there is already an error, so they don't catch places where the code succeeds but the result is different than before.

coords="different",
compat="equals",
join="outer",
)
Contributor Author

I hard-coded these to the old defaults since there is no way for the user to set them.

@jsignell force-pushed the concat_default_kwargs branch from 0e65034 to 5461a9f on February 24, 2025 20:07
@jsignell
Contributor Author

The last test file that I need to work on is test_concat.py
