BUG: core.groupby.GroupBy.apply unexpected behavior with TypeError raised in UDF #46324

Closed
2 of 3 tasks
panda-byte opened this issue Mar 11, 2022 · 1 comment
Labels
Apply Apply, Aggregate, Transform, Map Bug Groupby

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

def f(group):
    print(group)
    print(group['A'])
    raise TypeError

if __name__ == '__main__':
    pd.DataFrame.from_dict({'A': [1], 'B': [3]}).groupby('A').apply(f)

Issue Description

If a function (f) applied via core.groupby.GroupBy.apply raises a TypeError, the grouping column (A) is dropped and f is executed a second time. If f accesses A, that second call fails with a KeyError, which produces a confusing stack trace that can mask the actual error:

Traceback (most recent call last):
  ...
  File "sample.py", line 6, in f
    raise TypeError
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  ...
  File ".../pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'A'

Of course, if you read the trace from top to bottom, you notice the actual TypeError and take care of it first. However, I read the KeyError trace first and was really confused about how the column had been dropped while I was debugging (the stack trace is very long in this case). Other exceptions are simply raised, so this behavior was unexpected. I'm not sure whether it at least warrants a note in the documentation. However, I assume it is an actual bug because of the code responsible for it, which is in pandas/core/groupby/groupby.py, lines 1413-1425:

try:
    result = self._python_apply_general(f, self._selected_obj)
except TypeError:
    # gh-20949
    # try again, with .apply acting as a filtering
    # operation, by excluding the grouping column
    # This would normally not be triggered
    # except if the udf is trying an operation that
    # fails on *some* columns, e.g. a numeric operation
    # on a string grouper column

    with self._group_selection_context():
        return self._python_apply_general(f, self._selected_obj)
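
The effect of the block quoted above can be imitated in plain Python, without pandas. This is only a minimal sketch: `apply_with_retry` is a hypothetical stand-in for the retry logic, and a dict stands in for the group, so it is not how pandas implements the call. It shows why the KeyError ends up as the last exception while the original TypeError is only reachable through the exception context.

```python
def apply_with_retry(func, group, grouping_key):
    """Hypothetical stand-in for the quoted retry logic; not pandas code."""
    try:
        return func(group)
    except TypeError:
        # Retry without the grouping column, mirroring the quoted block.
        reduced = {k: v for k, v in group.items() if k != grouping_key}
        return func(reduced)


def f(group):
    group["A"]  # succeeds on the first call, fails on the retry
    raise TypeError("the real problem")


def run_demo():
    """Return the exception that finally escapes apply_with_retry."""
    try:
        apply_with_retry(f, {"A": [1], "B": [3]}, "A")
    except KeyError as err:
        return err
    return None


caught = run_demo()
print(type(caught).__name__)              # KeyError
print(type(caught.__context__).__name__)  # TypeError
```

The TypeError is never raised on the retry because `f` fails earlier, on the missing key, so the error that reaches the caller is the KeyError.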

I assume this references issue #20949, based on the tag, but I couldn't find anything in that issue that directly addresses this behavior. The comment assumes that this handling is only triggered "if the udf is trying an operation that fails on some columns", which is not the case: any TypeError raised in the UDF triggers it. I'm not sure this code works as intended. A better option might be to catch the TypeError more specifically, or to add an option to GroupBy.apply that specifies whether the grouping column should be passed to the applied function.
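
Until the handling changes, one possible user-side workaround is to make sure a TypeError never leaves the UDF as a TypeError. The sketch below assumes that the retry is triggered only by `except TypeError`, as in the code quoted above. The names `UDFTypeError` and `reraise_type_error` are hypothetical and not part of pandas.

```python
import functools


class UDFTypeError(RuntimeError):
    """Hypothetical wrapper exception; deliberately not a TypeError subclass."""


def reraise_type_error(func):
    """Wrap func so that a TypeError is re-raised as UDFTypeError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except TypeError as err:
            # Chain the original error so its traceback stays visible.
            raise UDFTypeError(f"TypeError in {func.__name__}: {err}") from err
    return wrapper
```

Under that assumption, calling `.groupby('A').apply(reraise_type_error(f))` should let the wrapped error propagate on the first call instead of triggering the second call without column A.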

Expected Behavior

Simply raise the TypeError that occurred, as is done with other exceptions.

Installed Versions

pd.show_versions() failed, so here is the output of pd.__version__ and sys.version:

  • pd.__version__: 1.4.1
  • sys.version: 3.10.2 ...
@panda-byte panda-byte added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 11, 2022
@mroeschke mroeschke added Groupby Apply Apply, Aggregate, Transform, Map and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 17, 2022
@simonjayhawkins simonjayhawkins changed the title BUG: core.groupby.GroupBy.apply unexpected behavior with TypeError BUG: core.groupby.GroupBy.apply unexpected behavior with TypeError raised in UDF Jun 10, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 10, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@topper-123
Contributor

This is fixed/resolved in #52477. Closing.

5 participants