Added more descriptive error message if indexing with all False bool … #1197

tammasloughran · 2022-10-04T00:39:31Z

If one tries to load data using an all-False bool array, it returns an unhelpful error message that doesn't quite describe the problem.

import numpy as np
import netCDF4 as nc
ncfile = nc.Dataset('example.nc', 'r')
get_this = np.zeros(ncfile.variables['pastr'].shape[0]).astype(bool)
data =  ncfile.variables['pastr'][get_this]

Traceback (most recent call last):
  File "testing.py", line 5, in <module>
    data =  ncfile.variables['pastr'][get_this]
  File "netCDF4/_netCDF4.pyx", line 4385, in netCDF4._netCDF4.Variable.__getitem__
  File "/usr/lib/python3/dist-packages/netCDF4/utils.py", line 467, in _out_array_shape
    c = count[..., i].ravel()[0] # All elements should be identical.
IndexError: index 0 is out of bounds for axis 0 with size 0

This pull request catches the case where an invalid all-False bool array is used to load data and adds a more descriptive error message.

Cheers,
Tam

…array.

CLAassistant · 2022-10-04T00:39:36Z

All committers have signed the CLA.

jswhit · 2022-10-04T15:44:11Z

Seems like an empty array should be returned, instead of raising an error.

tammasloughran · 2022-10-04T22:01:24Z

In my experience, doing this is an error 10 times out of 10, since loading nothing from a file is nonsensical. I would rather it fail fast than fail silently; on the rare occasion someone does want to load nothing from a file, they can handle the exception.

jswhit · 2022-10-06T19:15:15Z

> In my experience, doing this is an error 10 times out of 10, since loading nothing from a file is nonsensical. I would rather it fail fast than fail silently; on the rare occasion someone does want to load nothing from a file, they can handle the exception.

That's OK with me. I only say that because that's what numpy does, and it might be what many expect.
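
For reference, the NumPy behaviour mentioned here can be checked on an in-memory array. This is a small sketch using only NumPy; the array `a` and its shape are made up for illustration and merely stand in for a multi-dimensional variable such as `pastr`.

```python
import numpy as np

# A made-up 3x4 array standing in for a multi-dimensional variable.
a = np.arange(12).reshape(3, 4)

# Boolean index along the first axis with every element False.
mask = np.zeros(a.shape[0], dtype=bool)

# NumPy returns an empty array rather than raising; the trailing
# dimension is preserved.
selected = a[mask]
print(selected.shape)  # (0, 4)
```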

jswhit

Seems like elem can still be an iterable here; in that case, the individual elements of elem need to be checked to see if they are all-False boolean arrays.

jswhit · 2022-10-07T15:21:44Z

We are already checking for boolean index arrays (in order to convert them to integer index arrays), so how about this:

diff --git a/src/netCDF4/utils.py b/src/netCDF4/utils.py
index c96cc757..dcfeca85 100644
--- a/src/netCDF4/utils.py
+++ b/src/netCDF4/utils.py
@@ -238,6 +238,10 @@ def _StartCountStride(elem, shape, dimensions=None, grp=None, datashape=None,\
             unlim = False
         # convert boolean index to integer array.
         if np.iterable(ea) and ea.dtype.kind =='b':
+            # check that boolean array is not all False.
+            if not ea.any():
+                msg='Boolean index array is all False, at least one element must be True'
+                raise IndexError(msg)
             # check that boolean array not too long
             if not unlim and shape[i] != len(ea):
                 msg="""

Also, please add a test and a Changelog entry

jswhit · 2022-10-10T00:56:40Z

Tests are failing when an all False boolean index array is used on assignment, which should do nothing (but not raise an exception). The fix is to only do the check when put=False.
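
The NumPy analogue of the assignment case is a no-op, which may help show why the check should apply only to reads. The sketch below uses a made-up in-memory array and says nothing about netCDF4 internals.

```python
import numpy as np

# Made-up data; any array would do.
a = np.arange(6)
before = a.copy()

# An all-False mask selects no elements, so the assignment changes nothing
# and no exception is raised.
mask = np.zeros(a.shape, dtype=bool)
a[mask] = 99

unchanged = np.array_equal(a, before)
print(unchanged)  # True
```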

jswhit · 2022-10-10T22:21:38Z

this should fix it

diff --git a/src/netCDF4/utils.py b/src/netCDF4/utils.py
index c96cc757..dcfeca85 100644
--- a/src/netCDF4/utils.py
+++ b/src/netCDF4/utils.py
@@ -238,6 +238,10 @@ def _StartCountStride(elem, shape, dimensions=None, grp=None, datashape=None,\
             unlim = False
         # convert boolean index to integer array.
         if np.iterable(ea) and ea.dtype.kind =='b':
+            # check that boolean array is not all False when reading.
+            if not put and not ea.any():
+                msg='Boolean index array is all False, at least one element must be True'
+                raise IndexError(msg)
             # check that boolean array not too long
             if not unlim and shape[i] != len(ea):
                 msg="""

jswhit · 2022-10-11T13:02:24Z

I discovered that slicing a 1d variable with an all False boolean index array does return an empty array (consistent with numpy). An exception is raised only when slicing multi-dimensional arrays. PR #1198 ensures that empty arrays are always returned. Given that this is the current behavior for 1d vars, and that's what numpy does, I think this is the preferred solution.

tammasloughran · 2022-10-12T10:09:57Z

There should at least be a warning.

jswhit · 2022-10-12T14:53:13Z

> There should at least be a warning.

I am not convinced - curious what others think. Seems to me if you slice with an all False boolean array, you are getting back exactly what you asked for (an empty array).

jswhit · 2022-10-12T16:04:51Z

Plus, the fact that this is what numpy does is a pretty big precedent.

jswhit · 2022-10-16T21:04:48Z

Decided to go with PR #1198 instead, based on the discussion at issue #1200

tammasloughran · 2022-10-18T13:17:44Z

Sorry I'm so late returning to this, although it seems I'm too late.

I think we were talking about different problems. I wasn't so much concerned with whether an error should occur or something should be returned. My concern was that enough information was provided to users (both developers and application users) that potential errors aren't obscured and made difficult to diagnose. The boolean array can be determined programmatically, and users may not know exactly what they are asking for or even what's in the .nc file (e.g. when iterating over many files). In this case, a returned empty array would be propagated to far removed parts of user code. A warning would help diagnose indexing errors on that empty array that would eventually occur, and the fundamental behavior would still be consistent with numpy. If users want behavior like this to be erroneous, they can elevate warnings to errors themselves. So it's the best of both worlds.

Lastly, I don't think the warning would be a nuisance to anyone, since until now it had been an error. In fact, returning an empty array silently may break exception handling code for people that relied on this being an error in the past.
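
A minimal sketch of the warning approach argued for above, using only the standard library. `select_with_warning` is a hypothetical helper, not part of netCDF4, and plain lists stand in for variable data; the warning text and category are assumptions as well.

```python
import warnings


def select_with_warning(values, mask):
    """Return the values where mask is True; warn if the mask is all False.

    Hypothetical helper for illustration only, not netCDF4 API.
    """
    if not any(mask):
        warnings.warn(
            "Boolean index array is all False; returning empty result",
            UserWarning,
            stacklevel=2,
        )
    return [v for v, keep in zip(values, mask) if keep]


values = [1.0, 2.0, 3.0]
all_false = [False, False, False]

# Default: the call succeeds and returns an empty result, and the warning
# is recorded so the origin of the empty result can be diagnosed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = select_with_warning(values, all_false)

# Users who prefer fail-fast behaviour can elevate the warning to an error.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        select_with_warning(values, all_false)
    except UserWarning:
        raised = True
```

With this shape, the return value stays consistent with numpy, and the choice between a warning and an exception is left to the caller's warning filters.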

Added more descriptive error message if indexing with all False bool …

c30cb71

…array.

jswhit reviewed Oct 6, 2022

Moved error message

5e4e903

return empty array if boolean index array is all False

ab787f1

jswhit mentioned this pull request Oct 11, 2022

return empty array if boolean index array is all False #1198

Merged

update

d70c336

jswhit and others added 3 commits October 11, 2022 07:28

add test for all False boolean index slicing multi-dim array

60702d1

Merge remote-tracking branch 'upstream/boolFalse_getitem'

78a93b0

Changed to warning

4f37a43

jswhit mentioned this pull request Oct 12, 2022

what to do when a variable is sliced with an all False Boolean index array? #1200

Closed

jswhit closed this Oct 16, 2022
