
Resample convention='start' not functioning properly #15432


Closed
GrierPhillips opened this issue Feb 16, 2017 · 1 comment · Fixed by #16965
Labels
Docs Resample resample method

Comments

@GrierPhillips

GrierPhillips commented Feb 16, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'values': [2, 3]}, index=pd.to_datetime(['1986-01-31', '1986-02-28']))
df.resample('M', convention='s').sum()

Problem description

The convention argument does not seem to have any effect, whether set to start or end, when a monthly frequency is passed as the resample period for a datetime index.

Expected Output

values
1986-01-01 2
1986-02-01 3

Actual Output

values
1986-01-31 2
1986-02-28 3

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.9.2
apiclient: 1.6.1
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@jorisvandenbossche
Member

The biggest problem with the convention keyword is that it is completely undocumented (also mentioned in #5023 as the general issue about resample's docs).

As a result, I am also not sure what the convention keyword should actually do, but I think it is meant for the case where you upsample a period-based time series (based on some tests we have). For example, in the following case you can see the effect:

In [140]: pts = pd.Series([1, 2], index=pd.period_range('2012-01-01', freq='A', periods=2))

In [142]: pts.resample('M', convention='start').asfreq()
Out[142]: 
2012-01    1.0
2012-02    NaN
2012-03    NaN
2012-04    NaN
2012-05    NaN
2012-06    NaN
2012-07    NaN
2012-08    NaN
2012-09    NaN
2012-10    NaN
2012-11    NaN
2012-12    NaN
2013-01    2.0
2013-02    NaN
2013-03    NaN
2013-04    NaN
2013-05    NaN
2013-06    NaN
2013-07    NaN
2013-08    NaN
2013-09    NaN
2013-10    NaN
2013-11    NaN
2013-12    NaN
Freq: M, dtype: float64

In [143]: pts.resample('M', convention='end').asfreq()
Out[143]: 
2012-12    1.0
2013-01    NaN
2013-02    NaN
2013-03    NaN
2013-04    NaN
2013-05    NaN
2013-06    NaN
2013-07    NaN
2013-08    NaN
2013-09    NaN
2013-10    NaN
2013-11    NaN
2013-12    2.0
Freq: M, dtype: float64
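
If the goal is month-start labels for a datetime index, as in the expected output above, one possible workaround is to resample with a month-start frequency instead of relying on convention. This is only a minimal sketch: it assumes the 'MS' (month start) frequency alias and the to_period / to_timestamp round trip behave the same way in your pandas version, and the variable names are illustrative.

```python
import pandas as pd

# Same data as in the report, with an explicit DatetimeIndex.
df = pd.DataFrame(
    {'values': [2, 3]},
    index=pd.to_datetime(['1986-01-31', '1986-02-28']),
)

# Option 1: bin by month and label each bin with the first day of the month.
month_start = df.resample('MS').sum()

# Option 2: convert the index to monthly periods, then back to timestamps
# anchored at the start of each period (no aggregation is performed here).
relabelled = df.to_period('M').to_timestamp(how='start')

print(month_start)
print(relabelled)
```

Both results are labelled 1986-01-01 and 1986-02-01. Option 1 aggregates rows that fall in the same month, while option 2 only relabels the existing rows.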
