If a function (f), which is applied via core.groupby.GroupBy.apply, raises a TypeError, the grouping column (A) is dropped and f is executed again. However, if A is accessed in f, this leads to a confusing stack trace, which can mask the actual error:
Traceback (most recent call last):
  ...
  File "sample.py", line 6, in f
    raise TypeError
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  ...
  File ".../pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'A'
Of course, if you read it top to bottom, you notice the actual TypeError and take care of it first. I, however, read the KeyError trace first and was really confused about how the column had been dropped while I was debugging (the stack trace is very long in this case). If any other error occurs, the exception is simply raised, so this was unexpected. I'm not sure whether this behavior at least warrants a note in the documentation. However, I assume it is an actual bug, because of the code responsible for it, which is in pandas/core/groupby/groupby.py, lines 1413 - 1425:
try:
    result = self._python_apply_general(f, self._selected_obj)
except TypeError:
    # gh-20949
    # try again, with .apply acting as a filtering
    # operation, by excluding the grouping column
    # This would normally not be triggered
    # except if the udf is trying an operation that
    # fails on *some* columns, e.g. a numeric operation
    # on a string grouper column
    with self._group_selection_context():
        return self._python_apply_general(f, self._selected_obj)
I assume this references issue #20949, based on the tag, but I couldn't find anything in that issue that directly addresses this behavior. The comment assumes that this handling is only triggered "if the udf is trying an operation that fails on some columns", which is not the case: any TypeError raised in the udf triggers it. I'm not sure this code works as intended. Maybe a better option would be to catch the TypeError more specifically, or to add an option to GroupBy.apply that specifies whether the grouping column should be passed to the applied function.
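To make the masking effect concrete, here is a minimal pandas-free model of the retry. It is only a sketch: `apply_with_retry` and the dict standing in for a group are hypothetical, and only the `try` / `except TypeError` shape mirrors the snippet above.

```python
# Pandas-free model of the retry. All names are hypothetical; a plain dict
# stands in for one group of the DataFrame.

def apply_with_retry(udf, group, grouping_key):
    """Call udf on group; on TypeError, retry without the grouping key."""
    try:
        return udf(group)
    except TypeError:
        # Mirrors the fallback: drop the grouping column and call again.
        reduced = {k: v for k, v in group.items() if k != grouping_key}
        return udf(reduced)


def f(group):
    _ = group["A"]  # the UDF reads the grouping column
    raise TypeError("the actual bug in the UDF")


try:
    apply_with_retry(f, {"A": 1, "B": 2}, grouping_key="A")
except KeyError as exc:
    surfaced = exc              # the exception the caller sees last
    original = exc.__context__  # the TypeError that triggered the retry

print(type(surfaced).__name__, "masks", type(original).__name__)
# prints: KeyError masks TypeError
```

The second call fails on `group["A"]` before it ever reaches the `raise TypeError`, so the exception that propagates is the KeyError, and the real error only survives as its context.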
Expected Behavior
Simply raise the TypeError that occurred, as is done with other exceptions.
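Until that changes, one possible user-side workaround is to convert a TypeError raised inside the function into a different exception type, so that the `except TypeError` retry is never entered. This is a sketch under assumptions: `UDFTypeError` and `reraise_type_errors` are hypothetical names, not part of pandas.

```python
import functools


class UDFTypeError(RuntimeError):
    """Hypothetical wrapper type; deliberately not a TypeError subclass."""


def reraise_type_errors(udf):
    """Decorator (hypothetical) that re-raises TypeError as UDFTypeError."""
    @functools.wraps(udf)
    def wrapper(*args, **kwargs):
        try:
            return udf(*args, **kwargs)
        except TypeError as err:
            # Chain explicitly so the original TypeError stays in the trace.
            raise UDFTypeError(
                f"TypeError raised inside {udf.__name__}: {err}"
            ) from err
    return wrapper


@reraise_type_errors
def f(group):
    raise TypeError("the actual bug in the UDF")
```

Passing the decorated `f` to GroupBy.apply should then surface a single UDFTypeError with the original TypeError as its cause, although I have not verified this across pandas versions. Note that it also disables the fallback in the case it was written for, where an operation fails only on the grouping column.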
Installed Versions
pd.show_versions() failed, so here is the output of pd.__version__ and sys.version:
pd.__version__: 1.4.1
sys.version: 3.10.2 ...
simonjayhawkins changed the title from "BUG: core.groupby.GroupBy.apply unexpected behavior with TypeError" to "BUG: core.groupby.GroupBy.apply unexpected behavior with TypeError raised in UDF" on Jun 10, 2022
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.