Open mfdataset enhancement #9955

Open
wants to merge 11 commits into
base: main

Conversation

pratiman-91

Added a new argument to open_mfdataset to better handle invalid files.

errors : {'ignore', 'raise', 'warn'}, default 'raise'
        - If 'raise', then invalid dataset will raise an exception.
        - If 'ignore', then invalid dataset will be ignored.
        - If 'warn', then a warning will be issued for each invalid dataset.


welcome bot commented Jan 16, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty
Collaborator

I'm not the expert, but this looks reasonable! Any other thoughts?

Assuming no one thinks it's a bad idea, we would need tests.

Collaborator

@headtr1ck headtr1ck left a comment

I think it is a good idea.

But the way it is implemented here seems overly complicated and repetitive.
I would suggest inverting the logic: first build up the list inside a single try, and then handle the three cases in the except block.
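
The restructuring suggested above could look roughly like the following sketch. This is not the PR's code: `open_all`, `opener`, and the choice of `OSError` as the caught exception are hypothetical stand-ins for illustration, and xarray's actual loop may catch different exception types.

```python
import warnings


def open_all(paths, opener, errors="raise"):
    """Open every path with ``opener``; handle failures according to ``errors``.

    Hypothetical helper for illustration only, not part of xarray.
    """
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(f"unexpected value for errors: {errors!r}")

    datasets = []
    for path in paths:
        try:
            # Single try: the happy path just appends to the list.
            datasets.append(opener(path))
        except OSError as exc:  # assumption: invalid files surface as OSError
            if errors == "raise":
                raise
            if errors == "warn":
                warnings.warn(f"Skipping invalid file {path!r}: {exc}", stacklevel=2)
            # errors == "ignore": skip the file silently
    return datasets
```

Keeping the three cases together in the except block means the code that opens a file appears only once, which is the repetition the comment objects to.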

pratiman-91 and others added 2 commits January 17, 2025 10:53
Co-authored-by: Michael Niklas  <mick.niklas@gmail.com>
Collaborator

@headtr1ck headtr1ck left a comment

Almost there.

Also, we should add tests for this.

@pratiman-91
Author

@headtr1ck Thanks for the suggestions. I have added two tests (ignore and warn). While testing, I also found that the new argument broke combine="nested" because of invalid ids. I have now modified it so that the ids are correct, and it passes the tests. Please review the tests and the latest version.

@pratiman-91
Author

Hi @headtr1ck, I have been thinking about the handling of ids. The current version looks like patchwork, and I am not happy with it. I think we can create the ids after removing all the invalid datasets from path1d, within the combine == "nested" block. Please let me know what you think.
Thanks!
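
The idea of generating ids only after the invalid entries are gone can be sketched as below. Both `drop_invalid` and `tile_ids` are hypothetical helpers written for this illustration; they are not xarray functions, and the sketch only shows the id bookkeeping, not whether the remaining grid can still be combined.

```python
def drop_invalid(nested, is_valid):
    """Return a copy of a nested list of paths with invalid leaves removed."""
    if isinstance(nested, list):
        return [
            drop_invalid(item, is_valid)
            for item in nested
            if isinstance(item, list) or is_valid(item)
        ]
    return nested


def tile_ids(nested, prefix=()):
    """Yield (position_tuple, leaf) pairs for a nested list."""
    if isinstance(nested, list):
        for index, item in enumerate(nested):
            yield from tile_ids(item, prefix + (index,))
    else:
        yield prefix, nested


paths = [["a0.nc", "broken.nc", "a2.nc"], ["b0.nc", "b1.nc", "b2.nc"]]
clean = drop_invalid(paths, lambda p: p != "broken.nc")
ids = dict(tile_ids(clean))
```

Because the ids are derived from the filtered list, every id maps to a file that actually opened, and no position refers to a dropped entry.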

@pratiman-91
Author

@max-sixty Could you please review the PR? Thanks!

@max-sixty
Collaborator

I'm admittedly much less familiar with this section of the code. Nothing seems wrong though!

I think we should bias towards merging, so if no one has concerns then I'd vote to merge.

Could we fix the errors in the docs?

@pratiman-91
Author

It seems that one test failed: test_sparse_dask_dataset_repr (xarray.tests.test_sparse.TestSparseDataArrayAndDataset). It is not related to this PR.

Successfully merging this pull request may close these issues.

better handling of invalid files in open_mfdataset
3 participants