DEPR: Disallow dtype inference when setting Index into DataFrame #56102

Merged 3 commits into pandas-dev:main on Dec 9, 2023

@phofl (Member) commented Nov 21, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jbrockmendel I think we chatted about this. Series keeps the dtype, so this is mostly for consistency and to make our lives a little easier.
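
A minimal sketch of the inconsistency, reading "Series keeps the dtype" as referring to setting a Series into a DataFrame. It assumes pandas is installed; the column names and the sample value are illustrative and not taken from the PR. Whether the Index column is inferred to a datetime dtype, and whether a FutureWarning is emitted, depends on the installed pandas version, so the code only reports what it observes.

```python
import warnings

import pandas as pd

ts = pd.Timestamp("2019-12-31")
df = pd.DataFrame({"a": [1]})

# Setting a Series with object dtype keeps the dtype as-is.
df["from_series"] = pd.Series([ts], dtype=object)

# Setting an Index with object dtype is the case this PR deprecates.
# Record warnings instead of assuming a particular pandas version.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df["from_index"] = pd.Index([ts], dtype=object)

saw_future_warning = any(issubclass(w.category, FutureWarning) for w in caught)

print(df.dtypes)
print("FutureWarning emitted:", saw_future_warning)
```

If the two columns end up with different dtypes, that difference is the behavior the deprecation is meant to remove.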

@phofl requested a review from jbrockmendel November 21, 2023 22:00
@phofl added the Deprecate (Functionality to remove in pandas) label Nov 21, 2023
@mroeschke added this to the 2.2 milestone Dec 9, 2023
@mroeschke merged commit ee6a062 into pandas-dev:main Dec 9, 2023
@mroeschke
Member

Thanks @phofl

@phofl deleted the deprecate_setitem_coercing branch December 9, 2023 19:38
# TODO: Remove kludge in sanitize_array for string mode when enforcing
# this deprecation
warnings.warn(
"Setting an Index with object dtype into a DataFrame will no longer "
Member

Is "will no longer" clear enough that this refers to a future version/deprecation?

Member Author

Good point, I'll put up a PR to clarify.
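
One way the wording concern could be addressed is sketched below, using only the standard library. The helper name `warn_index_inference_deprecated` and the message text are hypothetical illustrations, not the wording from the PR or its follow-up; the point is only that the message names "a future version" explicitly and says what to do now.

```python
import warnings


def warn_index_inference_deprecated() -> None:
    # Hypothetical wording: states explicitly that the change lands in a
    # future version and tells the user how to prepare for it.
    warnings.warn(
        "Setting an Index with object dtype into a DataFrame will stop "
        "inferring another dtype in a future version. Cast the Index "
        "explicitly before setting it into the DataFrame.",
        FutureWarning,
        stacklevel=2,
    )


# Capture the warning so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_index_inference_deprecated()

message = str(caught[0].message)
print(message)
```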

return sanitize_array(value, self.index, copy=True, allow_2d=True), None
arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
if (
isinstance(value, Index)
Member

Might be more performant to do the Index check before the sanitize_array call?

Member Author

Why? We have to sanitise anyway?

Member

if you know you have an Index (and no dtype), sanitize_array boils down to:

if len(value) != len(index): raise ...
return value._values

so a lot of sanitize_array may be made unnecessary in this case

Member Author

That's what I want in the future, but at the moment we go through maybe_infer_to_datetimelike, which changes the dtype, so we can't shortcut this before we enforce the deprecation.
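
To illustrate why the shortcut cannot be taken yet, the sketch below compares the raw values of an object-dtype Index with the result of running inference on them. It assumes pandas is installed and uses the public `infer_objects` method as a stand-in for the internal `maybe_infer_to_datetimelike` step; the two are not claimed to be identical, and the sample value is illustrative.

```python
import pandas as pd

idx = pd.Index([pd.Timestamp("2019-12-31")], dtype=object)

# What the proposed fast path would hand back: the values untouched.
raw = idx.to_numpy()

# Stand-in for the inference step that currently runs on object data.
inferred = pd.Series(raw, dtype=object).infer_objects()

print(raw.dtype)       # object
print(inferred.dtype)  # a datetime64 dtype; the unit depends on the pandas version
```

Because the two results differ, returning the raw values early would silently change the dtype users get today, which is what the deprecation period is there to signal first.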
