REGR: change in output of groupby.apply in 1.3.2 -> 1.3.3 #43568

Closed
@jorisvandenbossche

From dask/dask#8137

One of the corner cases in groupby.apply (discussed in other issues like #34998) has changed behaviour between 1.3.2 and 1.3.3:

import pandas as pd
df = pd._testing.makeTimeDataFrame()
df.groupby(df.index.month).apply(lambda x: x.drop_duplicates())

In pandas 1.3.2 this gives:

                     A         B         C         D
1 2000-01-03 -1.261522 -0.200411 -0.746305 -0.661842
  2000-01-04  1.248556  0.256573 -1.401839  0.508941
  2000-01-05 -0.036682  0.109758 -0.759474 -0.601479
  2000-01-06 -0.778714 -0.932903 -0.544291 -0.986584
  2000-01-07 -0.567276  0.744036  1.981160  0.222988
  2000-01-10 -0.448542  0.221418  0.485706 -1.142561
  2000-01-11 -0.125143 -1.393151  0.879428  0.011297
  2000-01-12  0.550527 -1.373356  1.149654 -0.575065
  2000-01-13  0.269579 -0.017307 -1.023269  0.738274
  2000-01-14  0.757235 -1.875451 -0.751026  0.812741
  2000-01-17  2.456978  0.992319  0.945757 -2.468437
  2000-01-18 -2.132953  1.210491  0.150581 -1.861079
  2000-01-19  0.500947 -0.861651  0.412729 -1.274573
  2000-01-20  0.215823 -0.502341 -0.060564  0.439930
  2000-01-21  0.454649  1.188960  1.167487 -0.087031
  2000-01-24 -1.194599  0.709980  1.927664 -1.868195
  2000-01-25 -1.465017  1.187098  0.262209  0.312123
  2000-01-26 -0.010187 -0.624253 -0.186090 -0.126192
  2000-01-27 -0.520074 -0.189463 -0.379236 -0.259591
  2000-01-28 -1.179406 -0.169766 -1.731189  0.583444
  2000-01-31  0.677903  0.845305 -0.282444  0.807889
2 2000-02-01 -1.561435  2.068383 -0.500742 -0.040578
  2000-02-02 -1.357882  1.302612  1.105816 -0.688315
  2000-02-03  0.604455  0.637055 -0.296199  0.699753
  2000-02-04  0.102784 -0.786359 -0.598806  0.604410
  2000-02-07  0.977317  0.530884 -0.880909 -0.963008
  2000-02-08  1.948681  0.065753 -0.530815 -1.688043
  2000-02-09  1.461333  1.105021  1.039801 -0.144059
  2000-02-10  2.105116 -1.121452  0.076824 -0.334885
  2000-02-11  0.984281  0.858620  1.602277 -0.421881

while in pandas 1.3.3 it gives:

                   A         B         C         D
2000-01-03  0.131780  0.079102 -2.631289  0.969882
2000-01-04  0.381887  0.177194 -0.031367 -1.062184
2000-01-05 -1.299994  0.951530  0.806066  1.043698
2000-01-06 -0.669137  1.036442  0.762052 -0.475059
2000-01-07  0.498415 -0.511591 -0.500675  0.098846
2000-01-10 -1.313268  0.511975 -0.935800 -0.371694
2000-01-11  1.812837 -0.017126 -0.748976  1.217975
2000-01-12  0.236695  0.012316  0.319136  0.743945
2000-01-13 -1.128511  0.367611 -0.240936 -0.847221
2000-01-14 -2.170718  1.349021 -1.205040  1.210471
2000-01-17  0.220773  1.238868 -0.208188 -0.240763
2000-01-18 -0.949992  0.273480  0.863710  2.446306
2000-01-19  0.622379  1.386699 -1.181249  0.188620
2000-01-20 -1.340407 -0.523331 -1.794468  0.877138
2000-01-21  0.029993 -0.115333  0.358685  1.652006
2000-01-24  1.209907 -1.354522  0.883701  0.686492
2000-01-25 -0.840201  1.415816  0.396826  1.342700
2000-01-26  1.206150 -0.114443  0.011106  0.995629
2000-01-27 -0.505894  0.500736  0.004411  0.807632
2000-01-28  0.117852 -0.411066  1.315072  0.731249
2000-01-31 -0.329046 -1.921455  2.564603 -0.222591
2000-02-01  0.295899 -0.169977  0.162310 -0.554688
2000-02-02 -1.144224  0.530313 -0.530216  0.287826
2000-02-03  1.491748 -1.051309  1.414135 -0.332648
2000-02-04 -0.452243 -0.087787 -0.308278  0.681506
2000-02-07 -1.041728 -0.202066  0.044722  0.665914
2000-02-08  0.712994  1.547563  2.557823 -0.801031
2000-02-09  0.396298 -1.325411 -0.926420  0.738052
2000-02-10  0.460109  0.734418  0.416767 -1.199427
2000-02-11 -0.104655 -0.440354 -0.787402  0.357853

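(Note the difference in the index: in 1.3.2 the group key, here the month, is prepended as an outer index level, while in 1.3.3 the result keeps only the original DatetimeIndex.)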
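
For anyone comparing the two versions, the following is a minimal sketch of how the index shape could be checked and normalized so downstream code sees the same result either way. It is an illustration, not a proposed fix: the frame is built with NumPy instead of the private pd._testing.makeTimeDataFrame helper (which may not be available in every pandas version), and strip_group_level is a hypothetical helper name that does not exist in pandas.

```python
import numpy as np
import pandas as pd

# Stand-in for pd._testing.makeTimeDataFrame(): 30 business days, 4 random columns.
idx = pd.bdate_range("2000-01-03", periods=30)
df = pd.DataFrame(
    np.random.default_rng(0).standard_normal((30, 4)),
    index=idx,
    columns=list("ABCD"),
)

result = df.groupby(df.index.month).apply(lambda x: x.drop_duplicates())

# 2 levels: group key was prepended (1.3.2 output above).
# 1 level: only the original index is kept (1.3.3 output above).
print(pd.__version__, result.index.nlevels)


def strip_group_level(res, original_index):
    """Hypothetical helper: drop the outer group-key level if apply added one."""
    if res.index.nlevels > original_index.nlevels:
        return res.droplevel(0)
    return res


normalized = strip_group_level(result, df.index)
```

With either index shape, normalized ends up indexed by the original dates only, which makes it easier to diff results produced by different pandas versions.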

Labels: Bug, Groupby, Regression (functionality that used to work in a prior pandas version)