API deprecate date_parser, add date_format #51019

MarcoGorelli · 2023-01-27T11:07:05Z

closes API deprecate date_parser, add date_format #50601
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

To summarise:

date_parser is only ever a performance hindrance, and never speeds anything up
date_format would be more useful to users, and could actually offer a speedup

So, this PR deprecates date_parser in favour of date_format
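Here is a rough illustration of the switch. It is a sketch that assumes pandas 2.0 or later, where `read_csv` accepts `date_format`. The CSV contents and column names are made up. A call that used to wrap `to_datetime` in a `date_parser` lambda can pass the format string directly instead:

```python
import io

import pandas as pd

# Made-up CSV for illustration
csv_text = "ts_col,value\n2023-01-27 11:07:05,1\n2023-02-15 21:48:11,2\n"
timestamp_format = "%Y-%m-%d %H:%M:%S"

# Before (deprecated by this PR): parsing delegated to a user-supplied callable, e.g.
#   pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_col"],
#               date_parser=lambda s: pd.to_datetime(s, format=timestamp_format))

# After: pass the format string and let pandas parse the column itself
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["ts_col"],
    date_format=timestamp_format,
)
print(df.dtypes)
```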

pandas/tests/io/parser/test_parse_dates.py

mroeschke · 2023-01-27T17:56:59Z

Thinking aloud: we may want to keep date_parser around, as it could let us stop depending on the dateutil parser internally in the future while still allowing users to get similar behavior by passing date_parser=dateutil.parser.parse

MarcoGorelli · 2023-01-27T18:03:27Z

It would still be faster to do that with apply:

In [2]: timestamp_format = '%Y-%m-%d %H:%M:%S'
   ...: 
   ...: date_index = pd.date_range(start='1900', end='2000')
   ...: 
   ...: dates_df = date_index.strftime(timestamp_format).to_frame(name='ts_col')
   ...: data = dates_df.to_csv()

In [3]: %%timeit
   ...: df = pd.read_csv(io.StringIO(data), parse_dates=['ts_col'], date_parser=du_parse)
   ...: 
   ...: 
1.1 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %%timeit
   ...: df = pd.read_csv(io.StringIO(data))
   ...: df['ts_col'] = df['ts_col'].apply(du_parse)
   ...: 
   ...: 
1.06 s ± 9.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
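Below is a self-contained version of the comparison above, for anyone who wants to rerun it. It is a sketch with a few assumptions. `du_parse` is taken to be `dateutil.parser.parse`, since the snippet does not show its import. The date range is shorter than in the original so the script runs quickly. The deprecated `date_parser` call is swapped for `date_format` (pandas 2.0+), because `date_parser` is later removed by #58624. Timings will vary by machine and pandas version.

```python
import io
import timeit

import pandas as pd
from dateutil.parser import parse as du_parse  # assumed meaning of `du_parse`

timestamp_format = "%Y-%m-%d %H:%M:%S"
date_index = pd.date_range(start="1990", end="2000")  # shorter range than the original
data = date_index.strftime(timestamp_format).to_frame(name="ts_col").to_csv()


def with_apply():
    # Read the column as strings, then parse element by element with dateutil
    df = pd.read_csv(io.StringIO(data))
    df["ts_col"] = pd.to_datetime(df["ts_col"].apply(du_parse))
    return df


def with_date_format():
    # Let pandas parse the whole column using the known format
    return pd.read_csv(
        io.StringIO(data), parse_dates=["ts_col"], date_format=timestamp_format
    )


for fn in (with_apply, with_date_format):
    print(fn.__name__, timeit.timeit(fn, number=3))
```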

phofl · 2023-01-27T22:33:31Z

pandas/io/excel/_base.py

@@ -250,6 +250,16 @@
    and pass that; and 3) call `date_parser` once for each row using one or
    more strings (corresponding to the columns defined by `parse_dates`) as
    arguments.
+
+  .. deprecated:: 2.0.0


This is missing in the whatsnew

Can you add read_excel to the whatsnew?

Also needs tests

phofl

I'd like to have date_format accept a dictionary (not necessarily in this PR, but as a follow-up). I think it is a common use case, and it would also be similar to how we treat other configuration options.

In general, I think deprecating this makes sense. Could add a note in the user guide on how to use the dateutil parser

MarcoGorelli · 2023-01-30T11:07:10Z

thanks for reviewing - sure, will update the whatsnew, and can address the "accept dict" part in a separate PR

I don't think we want to encourage .apply(du_parse); it was just an example to show that, even if to_datetime got rid of dateutil, users would still be able to use it without needing a date_parser argument

doc/source/whatsnew/v0.24.0.rst

phofl · 2023-02-01T21:45:42Z

doc/source/user_guide/io.rst

-2. If you have a really non-standard format, use a custom ``date_parser`` function.
-   For optimal performance, this should be vectorized, i.e., it should accept arrays
-   as arguments.
+2. If you have different formats for different columns, or want to pass any extra options (such


Have to remember to update this when dict support is implemented
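Until `date_format` can take a per-column mapping, one way to follow this user-guide advice for columns with different formats is to read them as strings and then convert each one with `pd.to_datetime(..., format=...)`. This is a minimal sketch. The `formats` dict, the column names, and the CSV contents are all hypothetical:

```python
import io

import pandas as pd

# Hypothetical CSV where each column uses a different date format
csv_text = "start,end\n27/01/2023,2023-02-15 21:48\n30/01/2023,2023-02-16 09:00\n"

# Hypothetical mapping of column name -> strftime-style format
formats = {"start": "%d/%m/%Y", "end": "%Y-%m-%d %H:%M"}

df = pd.read_csv(io.StringIO(csv_text))  # both columns are read as strings
for col, fmt in formats.items():
    # One vectorised to_datetime call per column, not one Python call per row
    df[col] = pd.to_datetime(df[col], format=fmt)

print(df.dtypes)
```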

phofl

small comment

MarcoGorelli · 2023-02-08T16:08:06Z

Sure - got the deprecation showing up with read_excel and read_fwf now too

Got tests

Have opened #51240 as a follow-up to let date_format take a mapping

phofl

small comments

doc/source/whatsnew/v2.0.0.rst

pandas/io/parsers/base_parser.py

phofl

Made a small change to the whatsnew, otherwise lgtm. cc @mroeschke for another look

MarcoGorelli · 2023-02-15T09:33:57Z

Any objections to getting this in?

mroeschke · 2023-02-15T18:33:41Z

doc/source/whatsnew/v2.0.0.rst

@@ -825,7 +826,9 @@ Deprecations
 - :meth:`Index.is_numeric` has been deprecated. Use :func:`pandas.api.types.is_any_real_numeric_dtype` instead (:issue:`50042`,:issue:`51152`)
 - :meth:`Index.is_categorical` has been deprecated. Use :func:`pandas.api.types.is_categorical_dtype` instead (:issue:`50042`)
 - :meth:`Index.is_object` has been deprecated. Use :func:`pandas.api.types.is_object_dtype` instead (:issue:`50042`)
+- :meth:`Index.is_interval` has been deprecated. Use :func:`pandas.api.types.is_interval_dtype` instead (:issue:`50042`)


I think this needs to be removed

yup, apologies for having missed that 🤦 thanks for your review

mroeschke

One comment otherwise LGTM

MarcoGorelli · 2023-02-15T21:48:11Z

Thanks for your reviews! Merging then

MarcoGorelli · 2023-02-15T21:49:31Z

> I'd like to have date_format accept a dictionary (not necessarily in this PR, but as a follow-up).

not a blocker for 2.0, right? don't know if I can get that in fast enough

mroeschke · 2023-02-15T22:07:12Z

Yeah would be okay for 2.1 IMO but also fine to accept during the RC period

MarcoGorelli · 2023-02-15T22:18:47Z

Cool thanks - I'll see how far I get tomorrow morning, maybe it won't be too bad

MarcoGorelli added 4 commits January 27, 2023 10:43

wip (f2d9eb9)
fixup (ab53530)
update user guide (c855c4b)
whatsnew (389dd71)

MarcoGorelli commented Jan 27, 2023

pandas/tests/io/parser/test_parse_dates.py (outdated)

gh number (2e6fbcb)

MarcoGorelli requested review from WillAyd, mroeschke and jorisvandenbossche January 27, 2023 11:50

MarcoGorelli added 3 commits January 27, 2023 14:04

update user guide; (fd5e7c6)
Merge branch 'deprecate-date-parser' of github.com:MarcoGorelli/pandas into deprecate-date-parser (1316559)
whatsnew (9594c04)

phofl reviewed Jan 27, 2023

MarcoGorelli and others added 3 commits January 30, 2023 13:32

Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (7e43e3d)
update whatsnew note with date_format enhancement (d9f35f9)
Merge branch 'main' into deprecate-date-parser (39cd663)

MarcoGorelli requested a review from phofl February 1, 2023 15:07

jbrockmendel added the Deprecate label Feb 1, 2023

mroeschke reviewed Feb 1, 2023

doc/source/whatsnew/v0.24.0.rst (outdated)

phofl reviewed Feb 1, 2023

MarcoGorelli and others added 5 commits February 2, 2023 08:52

Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (a23f101)
make example ipython code-block (5654f91)
Merge branch 'main' into deprecate-date-parser (532080c)
Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (4b5cf56)
add tests for date_format (f627ece)

MarcoGorelli mentioned this pull request Feb 6, 2023

ENH: Expose date parsing arguments in read_html function #49553 (open)

MarcoGorelli added 5 commits February 7, 2023 15:27

Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (66e8f2e)
Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (f997edd)
validate within _parser (a7d496b)
Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (5d18cd1)
minor fixup (4a233d7)

MarcoGorelli mentioned this pull request Feb 8, 2023

ENH: let date_format take a mapping of columns to formats #51240 (closed)

MarcoGorelli added 2 commits February 8, 2023 18:28

Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (63eff32)
mention other readers in whatsnew, None -> no_default (5737c70)

phofl requested changes Feb 8, 2023

doc/source/whatsnew/v2.0.0.rst (outdated)

pandas/io/parsers/base_parser.py (outdated)

Update v2.0.0.rst (ca0db5c)

phofl approved these changes Feb 10, 2023

phofl and others added 2 commits February 10, 2023 13:40

Merge branch 'main' into deprecate-date-parser (ab8b537)
Merge branch 'main' into deprecate-date-parser (f1b6940)

MarcoGorelli requested a review from mroeschke February 15, 2023 09:33

MarcoGorelli mentioned this pull request Feb 15, 2023

read_csv(parse_dates=True) fails to parse datetimes with timezone offset pola-rs/polars#6886 (closed)

MarcoGorelli added 2 commits February 15, 2023 17:53

fixup merge conflict resolution (e9ce938)
Merge remote-tracking branch 'upstream/main' into deprecate-date-parser (2a0cafd)

mroeschke added this to the 2.0 milestone Feb 15, 2023

mroeschke reviewed Feb 15, 2023

MarcoGorelli merged commit 047acde into pandas-dev:main Feb 15, 2023

This was referenced May 7, 2024

CLN: Remove deprecated read_*(date_parser=) #58624 (merged)
BUG: read_csv with date_parser lock file open on failure #15302 (closed)
