BUG: Unexpected behaviour of rolling with apply on DataFrame #34965

@oXwvdrbbj8S4wo9k8lSN

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


When called on a DataFrame, rolling appears to process only some of the columns. To demonstrate, I created a DataFrame with three columns (A, B, and C): A contains Timedeltas, and B and C contain numbers. With rolling followed by sum, for example, only the numeric columns appear in the result.
Even stranger, in combination with apply, only the first numeric column is passed to the function, as a Series, whereas I would have expected the corresponding slice of the DataFrame.

Code Sample, a copy-pastable example

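import pandas as pd

columns = ["A", "B", "C"]
index = list(range(10))
# Every row is identical: 10**10, 2, 3
data = [[10**10, 2, 3]] * len(index)
df = pd.DataFrame(columns=columns, index=index, data=data)
# Integers are interpreted as nanoseconds, so 10**10 becomes 10 seconds
df["A"] = df["A"].apply(pd.to_timedelta)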

The resulting df will look like this:

         A  B  C
0 00:00:10  2  3
1 00:00:10  2  3
2 00:00:10  2  3
3 00:00:10  2  3
4 00:00:10  2  3
5 00:00:10  2  3
6 00:00:10  2  3
7 00:00:10  2  3
8 00:00:10  2  3
9 00:00:10  2  3

Applying rolling with sum like this:

df.rolling(window=2).sum()

will result in the following output, in which the first column is missing:

     B    C
0  NaN  NaN
1  4.0  6.0
2  4.0  6.0
3  4.0  6.0
4  4.0  6.0
5  4.0  6.0
6  4.0  6.0
7  4.0  6.0
8  4.0  6.0
9  4.0  6.0

To demonstrate the problem with apply, I created a custom function that simply returns the number of columns (since I expected a DataFrame to be passed to the function):

def get_num_columns(sub_df):
    # Print the window to show what rolling actually passes in
    print(sub_df)
    return len(sub_df.columns)

df.rolling(window=2).apply(get_num_columns, raw=False)

This produces the exception "AttributeError: 'Series' object has no attribute 'columns'" and the following printout:

0    2.0
1    2.0
dtype: float64
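
The printout suggests that apply hands the function one column's window at a time rather than a slice of the whole DataFrame. The following is a minimal sketch of how this can be checked on a purely numeric frame; the helper `record` and the list `seen` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Numeric-only frame, so no column is dropped or rejected
df = pd.DataFrame({"B": [2.0] * 4, "C": [3.0] * 4})

seen = []  # type names of the objects passed to the function


def record(window):
    # With raw=False each window arrives as a pandas object
    seen.append(type(window).__name__)
    return window.sum()


result = df.rolling(window=2).apply(record, raw=False)
print(set(seen))
print(result)
```

Every recorded type is a Series, which is consistent with the AttributeError above: the function never sees an object that has a `columns` attribute.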

Problem description

In both cases, I would expect the function (either sum or get_num_columns) to receive the windowed DataFrame with all columns.

Expected Output

In the case of sum, I would expect either an exception telling the user that only floats are accepted or, preferably, the following output:

         A    B    C
0      NaT  NaN  NaN
1 00:00:20  4.0  6.0
2 00:00:20  4.0  6.0
3 00:00:20  4.0  6.0
4 00:00:20  4.0  6.0
5 00:00:20  4.0  6.0
6 00:00:20  4.0  6.0
7 00:00:20  4.0  6.0
8 00:00:20  4.0  6.0
9 00:00:20  4.0  6.0
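
Until rolling handles Timedelta columns itself, one possible workaround (a sketch, not a fix in pandas) is to convert the Timedelta column to float seconds, apply the rolling sum, and convert the result back. The variable names `numeric` and `rolled` are illustrative, and the frame is rebuilt here so that the example is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.to_timedelta([10] * 10, unit="s"),
        "B": [2] * 10,
        "C": [3] * 10,
    }
)

# Represent the Timedelta column as float seconds so every column is numeric
numeric = df.assign(A=df["A"].dt.total_seconds())

rolled = numeric.rolling(window=2).sum()

# Convert the summed seconds back to Timedelta; NaN becomes NaT
rolled["A"] = pd.to_timedelta(rolled["A"], unit="s")
print(rolled)
```

This keeps all three columns in the result, at the cost of a round trip through floating-point seconds.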

In the case of apply, I would have expected a DataFrame as input to the function. Therefore, the output of the function (without the prints) should be:

0    3
1    3
2    3
3    3
4    3
5    3
6    3
7    3
8    3
9    3
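
A function that needs the whole windowed DataFrame can be served by slicing the frame by position instead of going through rolling(...).apply. This is a minimal sketch under that assumption; the name `results` is illustrative, and the first window is shorter than the others because there are no preceding rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.to_timedelta([10] * 10, unit="s"),
        "B": [2] * 10,
        "C": [3] * 10,
    }
)


def get_num_columns(sub_df):
    return len(sub_df.columns)


window = 2

# For each row, take the slice ending at that row and spanning up to `window` rows
results = pd.Series(
    [
        get_num_columns(df.iloc[max(0, i - window + 1) : i + 1])
        for i in range(len(df))
    ],
    index=df.index,
)
print(results)
```

Because each slice is a DataFrame with all columns, the function returns 3 for every row. A Python-level loop like this is slower than the built-in rolling aggregations, so it is best suited to small frames.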

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1.post20200616
Cython : 0.29.20
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.1
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0

Labels: Bug, Duplicate Report, Needs Triage
