BUG/API: Clarify the behaviour of fillna downcasting #11537

Closed
1 task
sinhrks opened this issue Nov 7, 2015 · 3 comments · Fixed by #54261
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@sinhrks
Member

sinhrks commented Nov 7, 2015

From #11497. Found an inconsistency regarding fillna downcasting.

As I understand it, fillna(downcast=None) should downcast values where appropriate, but it does not do so for float dtype.

# NG
pd.Series([1, 2, np.nan, 4]).fillna(3)
#0    1
#1    2
#2    3
#3    4
# dtype: float64

# OK
pd.Series([1, 2, np.nan, 4], dtype=object).fillna(3)
#0    1
#1    2
#2    3
#3    4
# dtype: int64

Based on the internal logic, downcast can accept a dtype. This works for float dtype, but not for object dtype.

# OK
pd.Series([1, 2, np.nan, 4]).fillna(3, downcast='int')
#0    1
#1    2
#2    3
#3    4
# dtype: int64

# NG
pd.Series([1, 2, np.nan, 4], dtype=object).fillna(3, downcast='int')
#0    1
#1    2
#2    3
#3    4
# dtype: object
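
For comparison, an integer result can also be requested outside of fillna. The following is only a sketch of that alternative, not something proposed in this issue; it assumes a pandas version that provides `Series.astype` and `pd.to_numeric(..., downcast="integer")`, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

float_ser = pd.Series([1, 2, np.nan, 4])  # float64 because of the NaN

# Fill first, then cast explicitly; astype raises if NaN values remain.
as_int64 = float_ser.fillna(3).astype("int64")

# pd.to_numeric with downcast="integer" picks the smallest integer dtype
# that can hold the values, so the exact width depends on the data.
as_small_int = pd.to_numeric(float_ser.fillna(3), downcast="integer")

print(as_int64.dtype)
print(as_small_int.dtype)
```

Both calls are explicit about the cast, so they do not depend on what fillna decides to do by default.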

I understood the expected behavior as below:

  • all dtypes should be downcast by default (downcast=None)
  • if any dtype is passed via downcast kw, downcast to the specified dtype if possible.
  • downcast=False will not perform downcast.
  • Add Index.fillna to API (follow-up of ENH: Add Index.fillna #11343)
@sinhrks sinhrks added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate API Design labels Nov 7, 2015
@sinhrks sinhrks added this to the 0.17.1 milestone Nov 7, 2015
@jreback
Contributor

jreback commented Nov 7, 2015

floats are never downcast by default
it's not cheap to do this

@jreback jreback removed the Bug label Nov 7, 2015
@jreback jreback removed this from the 0.17.1 milestone Nov 7, 2015
@sinhrks
Member Author

sinhrks commented Nov 7, 2015

Thanks. Is the following correct? If so, I'll include it in the documentation in #11497.

  • By default (downcast=None), only object dtype is downcast
  • Downcasting can be controlled explicitly by:
    • Specifying downcast=False prohibits the downcast
    • Specifying a dtype via downcast forces the downcast to the specified type. If the downcast is not possible, the dtype is kept as it is and no error is raised.

Based on the last condition, this should result in int64?

pd.Series([1, 2, np.nan, 4], dtype=object).fillna(3, downcast='int')
# 0    1
# 1    2
# 2    3
# 3    4
# dtype: object
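
The rules listed in this comment can be sketched as a standalone helper. This is an illustration of the described semantics under stated assumptions, not the pandas implementation: `fill_then_downcast` is a hypothetical name, and the helper fills through a NumPy copy so that its result does not depend on how a given pandas version downcasts inside fillna.

```python
import numpy as np
import pandas as pd


def fill_then_downcast(ser, value, downcast=None):
    """Hypothetical helper mirroring the three rules in the comment above."""
    # Fill on a NumPy copy so no implicit downcasting happens here.
    values = ser.to_numpy(copy=True)
    values[pd.isna(values)] = value
    filled = pd.Series(values, index=ser.index, name=ser.name)

    if downcast is False:
        # Rule: downcast=False prohibits the downcast.
        return filled
    if downcast is None:
        # Rule: by default only object dtype is downcast.
        if filled.dtype == object:
            return filled.infer_objects()
        return filled
    # Rule: an explicit dtype forces the downcast when it is lossless;
    # otherwise the dtype is kept and no error is raised.
    try:
        converted = filled.astype(downcast)
    except (ValueError, TypeError):
        return filled
    if (converted == filled).all():
        return converted
    return filled


obj_ser = pd.Series([1, 2, np.nan, 4], dtype=object)
print(fill_then_downcast(obj_ser, 3, downcast="int64").dtype)
```

Under these rules the object-dtype case asked about above would come back as int64, which is the behaviour the question expects.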

@jreback
Contributor

jreback commented Nov 7, 2015

@sinhrks IIRC that looks right. yeah it is meant for explicit use.

though to be honest maybe we should just take this out and always downcast object dtypes. We downcast for all other operations; the problem is I didn't have a performant way to downcast floats->ints.
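
To illustrate why a float-to-int downcast is not free, here is a minimal NumPy sketch of a lossless check. It is an assumption about how such a check could look, not the code pandas uses, and `maybe_downcast_to_int` is a hypothetical name. Every step scans or copies the whole array.

```python
import numpy as np


def maybe_downcast_to_int(values):
    """Return an int64 copy if every float is an exact integer, else the input."""
    if values.dtype.kind != "f" or values.size == 0:
        return values
    # Pass 1: NaN and inf cannot be represented as integers.
    if not np.isfinite(values).all():
        return values
    # Pass 2: values outside the int64 range would overflow on casting.
    if np.abs(values).max() >= 2.0 ** 63:
        return values
    # Pass 3: cast (a full copy), then compare to confirm nothing was truncated.
    as_int = values.astype(np.int64)
    if np.array_equal(as_int, values):
        return as_int
    return values


print(maybe_downcast_to_int(np.array([1.0, 2.0, 3.0, 4.0])).dtype)
print(maybe_downcast_to_int(np.array([1.5, 2.0])).dtype)
```

The check needs several full passes plus a temporary copy, which is the kind of overhead that makes doing it by default on every fillna call unattractive.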

@sinhrks sinhrks added this to the 0.17.1 milestone Nov 13, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@mroeschke mroeschke added Bug and removed API Design labels Apr 20, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022