DOC: Closed parameter not intuitively documented in DataFrame.rolling #60485
Comments
Thanks for the report! Agreed there can be many improvements here.
I don't think this is the most natural way to think of the rolling operation. Instead, I recommend using interval notation. In your sum example, consider computing the values for the row indexed by 3 (so the 4th row, with the first being indexed by 0). The window size is 3 and
In each case, I think you'll see that pandas is including the points that fall in this interval. This includes the case of neither, where the points are Note this is where "closed" comes from - the mathematical terminology of having an interval be closed on the left or right. Also note that for each of the intervals above, the size of the interval is indeed 3.
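A minimal sketch of the interval view for the row indexed by 3, assuming a five-row Series of powers of two (illustrative data, not taken from the report) so that each sum identifies exactly which rows fell into the window. min_periods=1 is set here only so that windows holding fewer than 3 rows still produce a number:

```python
import pandas as pd

# Powers of two: every sum decodes to a unique set of rows (illustrative data).
s = pd.Series([1, 2, 4, 8, 16])

row3 = {}
for closed in ["right", "left", "both", "neither"]:
    # min_periods=1 so that short windows are not masked as NaN
    rolled = s.rolling(window=3, closed=closed, min_periods=1).sum()
    row3[closed] = rolled.iloc[3]

print(row3)
```

With this data, the row indexed by 3 should give 14 for right (rows 1, 2, 3), 7 for left (rows 0, 1, 2), 15 for both (rows 0 to 3) and 6 for neither (rows 1 and 2), which corresponds to the intervals (0, 3], [0, 3), [0, 3] and (0, 3).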
Hi @rhshadrach, thanks for the added mathematical rigor and the explanation of the term 'closed' 😊. However, I disagree about the meaning of the window parameter. I agree with your usage of "interval", but the docs state that window is a size:
And if you think about size, in the So maybe some clarification should be added to avoid this confusion.
Agreed that the docs need to be updated. But the size (length) of the interval
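The difference between the length of an interval and the number of rows it contains can be sketched in plain Python. The helper window_members below is hypothetical (it is not a pandas function) and assumes integer row positions, with the window for row i having endpoints i - window and i:

```python
def window_members(i, window, closed):
    """Row positions inside the interval with endpoints i - window and i.

    Hypothetical helper for illustration only; not part of pandas.
    """
    lo, hi = i - window, i
    first = lo if closed in ("left", "both") else lo + 1
    last = hi if closed in ("right", "both") else hi - 1
    # Positions before the start of the data do not exist.
    return [p for p in range(first, last + 1) if p >= 0]


for closed in ["right", "left", "both", "neither"]:
    members = window_members(3, 3, closed)
    # The interval length is always hi - lo = 3; only the row count changes.
    print(closed, members, len(members))
```

Under these assumptions the interval length stays at 3 for every option, while the number of rows is 3 for right and left, 4 for both, and 2 for neither.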
take
take
Reopening since #60832 did not address the reason for the confusion, namely that
@rhshadrach I readdressed the missing information about the sizing of the window. Let me know if it needs more detail on how the closed configurations affect the size of the window.
Pandas version checks
main
Location of the documentation
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.rolling.html
Documentation problem
I believe the parameter closed is not very intuitively documented. (I'm using pandas 2.2.2 on macOS Sequoia.)
Window size used for closed should be window+1
For this parameter to work, the actual window size should be thought of as window+1. So, for instance, when window=3, this is how closed should be thought of:
closed='right': from a window size of 4 (window=3+1), take the current element and the 2 (4-2) elements just before the current one. Totals 3 elements.
closed='left': from a window size of 4 (window=3+1), don't take the current element but take the 3 (4-1) elements just before the current one. Totals 3 elements.
closed='both': from a window size of 4 (window=3+1), take the current element and the 3 (4-1) elements just before the current one. Totals 4 elements.
closed='neither'
Either 'neither' isn't working or what it does isn't straightforward to me. See the examples below; both return NaN in every position.
Intuitively, I would guess this parameter would shrink the window by two from a window size of window+1. So if window=3, the actual calculation would be done on a window of 3+1-2=2 elements, but as you can see below I only get NaN.
Example 1: mean()
Example 2: sum()
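One likely explanation for the all-NaN output, offered as an assumption since the original examples are not reproduced here: for an integer window, min_periods defaults to the window size, and with closed='neither' a window of 3 never holds more than 2 rows. A minimal sketch with illustrative data:

```python
import pandas as pd

s = pd.Series([1, 2, 4, 8, 16])  # illustrative data, not from the report

# Default min_periods equals the integer window (3), but with closed='neither'
# each window holds at most 2 rows, so every result is NaN.
default = s.rolling(window=3, closed="neither").sum()

# Lowering min_periods to 2 lets the two-row windows produce a value.
relaxed = s.rolling(window=3, closed="neither", min_periods=2).sum()

print(default)
print(relaxed)
```

With min_periods=2, the first two rows should still be NaN (their windows hold 0 and 1 rows), and the remaining rows should be 3, 6 and 12, the sums of the two rows just before the current one.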
Suggested fix for documentation
I would suggest stating that the window size taken into consideration for closed is actually the parameter window + 1; then what's stated in the docs would make sense. Or use the window parameter itself, which would make much more sense to me. From the current docs:
Maybe even add an image example like the ones I posted above.
As for 'neither', I don't have suggestions as I don't fully understand it from my testing.
Finally, I don't like the name closed for the parameter; it doesn't mean much to me. I would maybe prefer something like ends or ends_used. I believe it would be more intuitive.