API deprecate date_parser, add date_format #51019
…s into deprecate-date-parser
Thinking aloud: We may want to keep
It would still be faster to do that with
In [2]: timestamp_format = '%Y-%m-%d %H:%M:%S'
...:
...: date_index = pd.date_range(start='1900', end='2000')
...:
...: dates_df = date_index.strftime(timestamp_format).to_frame(name='ts_col')
...: data = dates_df.to_csv()
In [3]: %%timeit
...: df = pd.read_csv(io.StringIO(data), parse_dates=['ts_col'], date_parser=du_parse)
...:
...:
1.1 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: %%timeit
...: df = pd.read_csv(io.StringIO(data))
...: df['ts_col'] = df['ts_col'].apply(du_parse)
...:
...:
1.06 s ± 9.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
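For reference, the read-then-convert approach from the session above can be written as a self-contained sketch. The import of `du_parse` from `dateutil.parser` is an assumption (the session does not show where it comes from), and the date range is shortened so the example runs quickly. It also converts the same column in one vectorized call with an explicit format, to check that both routes produce the same values.

```python
import io

import pandas as pd
from dateutil.parser import parse as du_parse  # assumed origin of du_parse

timestamp_format = "%Y-%m-%d %H:%M:%S"

# Shorter range than in the session above, to keep the example quick.
date_index = pd.date_range(start="1900-01-01", end="1900-03-31")
dates_df = date_index.strftime(timestamp_format).to_frame(name="ts_col")
data = dates_df.to_csv()

# Read the column as plain strings, then parse row by row with dateutil.
df_apply = pd.read_csv(io.StringIO(data))
df_apply["ts_col"] = df_apply["ts_col"].apply(du_parse)

# Read again and convert the whole column at once with an explicit format.
df_vec = pd.read_csv(io.StringIO(data))
df_vec["ts_col"] = pd.to_datetime(df_vec["ts_col"], format=timestamp_format)

# Both routes should agree on every row.
same = bool((df_apply["ts_col"] == df_vec["ts_col"]).all())
```

Wrapping either conversion in `%%timeit` gives a like-for-like comparison on your own data.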
@@ -250,6 +250,16 @@
and pass that; and 3) call `date_parser` once for each row using one or
more strings (corresponding to the columns defined by `parse_dates`) as
arguments.

.. deprecated:: 2.0.0
This is missing in the whatsnew
Can you add read_excel to the whatsnew?
Also needs tests
I'd like to have date_format accept a dictionary (not necessarily in this PR, but as a follow-up). I think it is a common use case and would also be similar to how we treat other configuration options.
In general, I think deprecating this makes sense. Could add a note in the user guide on how to use the dateutil parser
Thanks for reviewing - sure, will update the whatsnew, and can address the "accept dict" part in a separate PR.
I don't think we want to encourage
2. If you have a really non-standard format, use a custom ``date_parser`` function.
For optimal performance, this should be vectorized, i.e., it should accept arrays
as arguments.
2. If you have different formats for different columns, or want to pass any extra options (such
Have to remember to update this when dict support is implemented
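The case in the docs line under review (different formats per column, or extra conversion options) can be handled by reading the columns as strings and converting each one afterwards. A minimal sketch follows; the column names and sample data are hypothetical, and `utc=True` stands in for an "extra option" only as an assumption, since the quoted line is cut off before it names one.

```python
import io

import pandas as pd

# Hypothetical data: two date columns written in different formats.
csv_text = (
    "start,end\n"
    "2023-01-31,31/01/2023 18:30\n"
    "2023-02-01,01/02/2023 09:00\n"
)

# Keep both columns as plain strings while reading.
df = pd.read_csv(io.StringIO(csv_text), dtype={"start": "object", "end": "object"})

# Convert each column with its own format; utc=True is an illustrative extra option.
df["start"] = pd.to_datetime(df["start"], format="%Y-%m-%d")
df["end"] = pd.to_datetime(df["end"], format="%d/%m/%Y %H:%M", utc=True)
```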
small comment
Sure - got the deprecation showing up with
Got tests
Have opened #51240 as a follow-up to let
small comments
Made a small change to the whatsnew, otherwise lgtm. cc @mroeschke for another look
Any objections to getting this in?
doc/source/whatsnew/v2.0.0.rst
@@ -825,7 +826,9 @@ Deprecations
- :meth:`Index.is_numeric` has been deprecated. Use :func:`pandas.api.types.is_any_real_numeric_dtype` instead (:issue:`50042`, :issue:`51152`)
- :meth:`Index.is_categorical` has been deprecated. Use :func:`pandas.api.types.is_categorical_dtype` instead (:issue:`50042`)
- :meth:`Index.is_object` has been deprecated. Use :func:`pandas.api.types.is_object_dtype` instead (:issue:`50042`)
- :meth:`Index.is_interval` has been deprecated. Use :func:`pandas.api.types.is_interval_dtype` instead (:issue:`50042`)
I think this needs to be removed
yup, apologies for having missed that 🤦 thanks for your review
One comment otherwise LGTM
Thanks for your reviews! Merging then
not a blocker for 2.0, right? don't know if I can get that in fast enough
Yeah would be okay for 2.1 IMO but also fine to accept during the RC period
Cool thanks - I'll see how far I get tomorrow morning, maybe it won't be too bad
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

To summarise:
- date_parser is only ever a performance hindrance, and never speeds anything up
- date_format would be more useful to users, and could actually offer a speedup

So, this PR deprecates date_parser in favour of date_format
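As an illustration of the replacement described in this summary, here is a minimal sketch of passing `date_format` to `read_csv`. It assumes pandas 2.0 or later, where the parameter is available; the sample data and the `value` column are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV with one timestamp column in a single, known format.
data = "ts_col,value\n1900-01-01 00:00:00,1\n1900-01-02 12:30:00,2\n"

# Give the format string directly instead of a per-row parser function.
df = pd.read_csv(
    io.StringIO(data),
    parse_dates=["ts_col"],
    date_format="%Y-%m-%d %H:%M:%S",
)
```

Because the format is known up front, the column can be converted in one pass rather than by calling a Python function once per row.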