Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

API deprecate date_parser, add date_format #51019

Merged
merged 31 commits into from
Feb 15, 2023

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Jan 27, 2023

To summarise:

  • date_parser is only ever a performance hindrance, and never speeds anything up
  • date_format would be more useful to users, and could actually offer a speedup

So, this PR deprecates date_parser in favour of date_format

@mroeschke
Copy link
Member

Thinking aloud: We may want to keep date_parser around as it could provide a way to not depend on the dateutil parser internally in the future while allowing user to still get similar behavior by passing date_parser=dateutil.parse

@MarcoGorelli
Copy link
Member Author

It would still be faster to do that with apply:

In [2]: timestamp_format = '%Y-%m-%d %H:%M:%S'
   ...: 
   ...: date_index = pd.date_range(start='1900', end='2000')
   ...: 
   ...: dates_df = date_index.strftime(timestamp_format).to_frame(name='ts_col')
   ...: data = dates_df.to_csv()

In [3]: %%timeit
   ...: df = pd.read_csv(io.StringIO(data), parse_dates=['ts_col'], date_parser=du_parse)
   ...: 
   ...: 
1.1 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %%timeit
   ...: df = pd.read_csv(io.StringIO(data))
   ...: df['ts_col'] = df['ts_col'].apply(du_parse)
   ...: 
   ...: 
1.06 s ± 9.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@@ -250,6 +250,16 @@
and pass that; and 3) call `date_parser` once for each row using one or
more strings (corresponding to the columns defined by `parse_dates`) as
arguments.

.. deprecated:: 2.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing in the whatsnew

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add read_excel to the whatsnew?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs tests

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have date_format accept a dictionary (not necessarily in this pr, but as a follow up). I think it is a common use case and also would be similar to how we treat other configuration options

In general, I think deprecating this makes sense. Could add a note in the user guide on how to use the dateutil parser

@MarcoGorelli
Copy link
Member Author

thanks for reviewing - sure, will update the whatsnew, and can address the "accept dict" part in a separate PR

I don't think we want to encourage .apply(du_parse), it was just an example to show that even if to_datetime got rid of dateutil, then users would still be able to use it without needing a date_parser argument

@MarcoGorelli MarcoGorelli requested a review from phofl February 1, 2023 15:07
@jbrockmendel jbrockmendel added the Deprecate Functionality to remove in pandas label Feb 1, 2023
2. If you have a really non-standard format, use a custom ``date_parser`` function.
For optimal performance, this should be vectorized, i.e., it should accept arrays
as arguments.
2. If you different formats for different columns, or want to pass any extra options (such
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to remember to update this when dict support is implemented

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment

@MarcoGorelli
Copy link
Member Author

Sure - got the deprecation showing up with read_excel and read_fwf now too

Got tests

Have opened #51240 as a follow-up to let date_format take a mapping

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a small change to the whatsnew, otherwise lgtm. cc @mroeschke for another look

@MarcoGorelli
Copy link
Member Author

Any objections to getting this in?

@mroeschke mroeschke added this to the 2.0 milestone Feb 15, 2023
@@ -825,7 +826,9 @@ Deprecations
- :meth:`Index.is_numeric` has been deprecated. Use :func:`pandas.api.types.is_any_real_numeric_dtype` instead (:issue:`50042`,:issue:`51152`)
- :meth:`Index.is_categorical` has been deprecated. Use :func:`pandas.api.types.is_categorical_dtype` instead (:issue:`50042`)
- :meth:`Index.is_object` has been deprecated. Use :func:`pandas.api.types.is_object_dtype` instead (:issue:`50042`)
- :meth:`Index.is_interval` has been deprecated. Use :func:`pandas.api.types.is_intterval_dtype` instead (:issue:`50042`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, apologies for having missed that 🤦 thanks for your review

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment otherwise LGTM

@MarcoGorelli
Copy link
Member Author

Thanks for your reviews! Merging then

@MarcoGorelli MarcoGorelli merged commit 047acde into pandas-dev:main Feb 15, 2023
@MarcoGorelli
Copy link
Member Author

I'd like to have date_format accept a dictionary (not necessarily in this pr, but as a follow up).

not a blocker for 2.0, right? don't know if I can get that in fast enough

@mroeschke
Copy link
Member

Yeah would be okay for 2.1 IMO but also fine to accept during the RC period

@MarcoGorelli
Copy link
Member Author

Cool thanks - I'll see how far I get tomorrow morning, maybe it won't be too bad

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API deprecate date_parser, add date_format
5 participants