DEPR: concat ignoring empty objects #52532

jbrockmendel · 2023-04-07T23:58:23Z

closes API: concatting of Series/DataFrame - handling (not skipping) of empty objects #39122
closes Concat empty series with mixed TZ doesn't coerce to object #22186
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jorisvandenbossche · 2023-04-11T16:23:19Z

pandas/core/dtypes/concat.py

+        if len(non_empties) < len(to_concat) and not any(
+            obj.dtype == _dtype_obj for obj in non_empties
+        ):
+            # Check for object dtype is an imperfect proxy for checking if
+            #  the result dtype is going to change once the deprecation is


Can't we verify this exactly? (checking what the common dtype would be with or without the empties?)

Because I think the check above will easily give false positives (unless I am misreading it): for example, won't just a mixture of int types, one of which is empty, trigger the warning (since none of them is object dtype)? That will typically not change the resulting dtype (except if the input arrays with the largest bitwidth are all empty).

Worth a try! Will take a look
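
The exact check suggested in this thread (comparing the common dtype with and without the empties) might look roughly like the sketch below. It is only an illustration: it uses NumPy's `np.result_type` as a stand-in for pandas' own common-dtype logic, handles plain NumPy dtypes only, and the helper name `empties_change_result_dtype` is hypothetical rather than anything in the PR.

```python
import numpy as np


def empties_change_result_dtype(arrays):
    """Return True if counting the empty arrays changes the common dtype.

    Hypothetical helper: np.result_type stands in for pandas' dtype logic.
    """
    non_empties = [arr for arr in arrays if arr.size > 0]
    if not non_empties or len(non_empties) == len(arrays):
        # Either everything is empty or nothing is, so no entries would be
        # skipped and there is no with/without difference to report.
        return False
    dtype_all = np.result_type(*[arr.dtype for arr in arrays])
    dtype_non_empty = np.result_type(*[arr.dtype for arr in non_empties])
    return dtype_all != dtype_non_empty


# An empty int32 next to a non-empty int64: the common dtype is int64 either way.
narrow_empty = [np.array([1, 2], dtype="int64"), np.array([], dtype="int32")]
# An empty int64 next to a non-empty int8: counting the empty widens the result.
wide_empty = [np.array([1], dtype="int8"), np.array([], dtype="int64")]

print(empties_change_result_dtype(narrow_empty))
print(empties_change_result_dtype(wide_empty))
```

Under these assumptions the first case would not need a warning, while the second is the "largest bitwidth is empty" exception mentioned above.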

jorisvandenbossche · 2023-04-11T16:43:40Z

Rereading some of the previous discussions, would you be OK with starting this deprecation only for empty objects that are not object/float dtype? (since those are used as a generic "empty" dtype)

jbrockmendel · 2023-04-11T17:15:12Z

Rereading some of the previous discussions, would you be OK with starting this deprecation only for empty objects that are not object/float dtype? (since those are used as a generic "empty" dtype)

I'm not dead-set against that, but would like to push down this road a bit further first. IIRC the object/float (in particular float) cases were mostly a sticking point in the all-nan cases more than the empty cases.

MarcoGorelli

looks like something went wrong when merging in the whatsnew

also, this is kind of hard to review, it looks like you're both refactoring and introducing the deprecation warning - any chance to split out the refactoring into a precursor PR please? (or to write a few lines summarising the changes)

doc/source/whatsnew/v2.1.0.rst

jbrockmendel · 2023-05-16T15:31:05Z

also, this is kind of hard to review, it looks like you're both refactoring and introducing the deprecation warning - any chance to split out the refactoring into a precursor PR please? (or to write a few lines summarising the changes)

I think I can refactor out as a precursor, coming up shortly.

pandas/core/dtypes/concat.py

MarcoGorelli

looks good, but I don't understand the is_na_without_isna_all -> is_na_after_size_and_isna_all_deprecation change - could you explain please?

MarcoGorelli · 2023-05-26T15:16:43Z

pandas/core/internals/concat.py

-    def is_na_without_isna_all(self) -> bool:
+    def is_na_after_size_and_isna_all_deprecation(self) -> bool:
+        """
+        Will self.is_na be True after values.size == 0 deprecation and isna_all
+        deprecation are enforced?
+        """
        blk = self.block
        if blk.dtype.kind == "V":
            return True
-        if not blk._can_hold_na:
-            return False
-
-        values = blk.values
-        if values.size == 0:
-            return True


why does this change?

Because the future behavior won't depend on values.size == 0 (note the changed method name/docstring)

sure but the deprecation hasn't been enforced yet, why is this changing already?

this method is for checking on the future behavior to see if we need to issue a warning.
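
The pattern described in this thread (compute the current answer, compute what the answer would be once the deprecation is enforced, and warn only when the two differ) can be sketched with a toy model. Everything here is an assumption for illustration: `ToyUnit` is a hypothetical stand-in rather than the pandas join-unit class, the warning text is made up, and the future predicate is assumed to reduce to the `kind == "V"` check, which the truncated diff above does not confirm.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ToyUnit:
    """Hypothetical stand-in for a join unit; not the pandas class."""

    kind: str  # dtype kind, e.g. "V" for a void/placeholder block
    size: int  # number of values held by the block
    can_hold_na: bool = True

    def is_na(self) -> bool:
        # Current behavior (simplified): an empty block counts as all-NA.
        if self.kind == "V":
            return True
        if not self.can_hold_na:
            return False
        return self.size == 0

    def is_na_after_deprecation(self) -> bool:
        # Assumed future behavior: the block's size no longer matters.
        return self.kind == "V"


def is_na_with_warning(unit: ToyUnit) -> bool:
    """Return the current answer, warning if the future answer differs."""
    current = unit.is_na()
    if current != unit.is_na_after_deprecation():
        warnings.warn(
            "empty entries will no longer be ignored in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return current
```

The point of the design is that the "future" predicate is only ever used to decide whether to warn; the value returned to the caller is still the current behavior until the deprecation is enforced.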

MarcoGorelli

looks fine to me

@jorisvandenbossche any objections?

github-actions · 2023-06-27T00:06:28Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

jbrockmendel · 2023-07-09T20:19:45Z

@jorisvandenbossche @MarcoGorelli gentle ping

MarcoGorelli

Looks good to me

I'd suggest self-merging if it's something you're confident about and have had an approval

jorisvandenbossche · 2023-07-10T20:45:37Z

My apologies for the slow response here, but I would like to get back to this one point (#52532 (comment)):

Rereading some of the previous discussions, would you be OK with starting this deprecation only for empty objects that are not object/float dtype? (since those are used as a generic "empty" dtype)

I'm not dead-set against that, but would like to push down this road a bit further first. IIRC the object/float (in particular float) cases were mostly a sticking point in the all-nan cases more than the empty cases.

I am not fully sure how to interpret the "push down this road a bit further first", but does that mean you would not necessarily be against doing this deprecation only for non-object/float dtypes?

(I am also happy to do a follow-up PR for that)

mroeschke · 2023-07-10T22:07:53Z

@jbrockmendel when you have a moment, could you also address this test failure (looks like this PR could have used a rebase)

FAILED pandas/tests/groupby/test_groupby.py::test_indices_concatenation_order - AssertionError: Did not see expected warning of class 'FutureWarning'

jbrockmendel · 2023-07-10T23:49:24Z

when you have a moment, could you also address this test failure (looks like this PR could have used a rebase)

will take a look now.

I am not fully sure how to interpret the "push down this road a bit further first", but does that mean you would not necessarily be against doing this deprecation only for non-object/float dtypes?

IIRC at the time of that comment there were other review comments to address (#52532 (comment)) and I wanted to see how "clean" this PR would be once those were addressed.

Also as I said in that comment, my understanding of the motivation to limit to only non-object/float cases was about the all-NA case, not the empty case. What would the motivation be to special-case the deprecation here?

See pandas-dev/pandas#52532

This change rolls together several small changes to avoid warnings generated when running the tests. Here is a summary of the changes:

* Update use of `datetime.utcnow()` deprecated in Python 3.12
* Concatenate pandas dataframes more carefully to avoid `FutureWarning` when concatenating an empty dataframe (see pandas-dev/pandas#52532).
* Adjust plotter tests to avoid warnings
  + Use supported options rather than arbitrary options to avoid unexpected option warning
  + Catch expected warning when passing an untrained discriminator to `IQPlotter`
* Move the setting of the axis scale earlier in the logic of `MplDrawer.format_canvas`. `format_canvas` queries the limits of each `axes` object and uses those values. When no data set on a set of axes (an odd edge case), the default values depend on the scale (because the minimum for linear is 0 but for log is 0.1), so the scale needs to be set first.
* Remove transpile options from `FineDragCal` (`_transpiled_circuits` for `BaseCalibrationExperiment` warns about and ignores transpile options)
* Replace `isinstance` checks with `pulse_type` checks for calibration tests checking pulse shape
* Replace `qiskit.Aer` usage with `qiskit_aer`
* Set tolerance on cvxpy test to larger value, close to the tolerance achieved in practice. The routine was maxing out its iterations without achieving tolerance but still producing a close enough result for the test to pass. Increasing the tolerance avoids the max iterations warning and makes the test faster.
* Rename `TestLibrary` to `MutableTestLibrary`. The class name starting with `Test*` causes pytest to warn about the class looking like a class holding test cases. pytest is not the main way we run the tests but it still seems best to avoid confusion between test case classes and test helper classes.
* Catch warnings about insufficient trials in `QuantumVolume` tests using small numbers of trials.
* Catch user and deprecation warnings about calibration saving and loading to csv files.
* Convert existing calibration saving and loading tests from json to csv, leaving basic coverage of csv saving. A bug was found with json saving, resulting in one test being marked as an expected failure.
* Set data on the `ExperimentData` for `TestFramework.test_run_analysis_experiment_data_pickle_roundtrip` to avoid a warning about an empty experiment data object. Other cases of this warning were addressed in 729014b but this test case was added afterward in a6c9e53.

In order to avoid warnings creeping in in the future, this change also makes any warnings emitted by qiskit-experiments code be treated as exceptions by the tests (with an exception for `ScatterTable` methods for which it is planned to remove the deprecation). Any expected warnings in tests can be handled by using `assertWarns` or `warnings.catch_warnings`.

As all the changes relate to tests (except the `FineDragCal` one which is a non-functional change to avoid a warning), no release note was added.

---------

Co-authored-by: Helena Zhang <Helena.Zhang@ibm.com>
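
The commit message above does not show how the dataframes are concatenated "more carefully". One plausible approach, sketched here purely as an assumption, is to drop empty frames before calling `pd.concat`, which keeps the pre-deprecation result (empties do not influence the result dtype) without triggering the `FutureWarning`. The helper name `concat_skipping_empties` is hypothetical.

```python
import pandas as pd


def concat_skipping_empties(frames):
    """Concatenate frames, dropping empty ones before calling pd.concat.

    Hypothetical helper; not taken from pandas or qiskit-experiments.
    """
    frames = list(frames)
    non_empty = [frame for frame in frames if not frame.empty]
    if not non_empty:
        # Nothing to concatenate: return an empty copy that keeps the columns
        # of the first input, or a bare DataFrame if there were no inputs.
        return frames[0].copy() if frames else pd.DataFrame()
    return pd.concat(non_empty, ignore_index=True)


first = pd.DataFrame({"x": pd.Series([1, 2], dtype="int64")})
empty = pd.DataFrame({"x": pd.Series([], dtype="object")})
last = pd.DataFrame({"x": pd.Series([3], dtype="int64")})

combined = concat_skipping_empties([first, empty, last])
print(combined["x"].dtype)
```

Because the empty object-dtype frame never reaches `pd.concat`, the result dtype here is decided by the two int64 frames alone. Code that wants the empties' dtypes to count would pass them through instead.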

jbrockmendel added 3 commits April 7, 2023 16:57

DEPR: concat with empty objects

63292d4

xfail on 32bit

2ace79c

missing reason

6258adf

mroeschke added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Deprecate (Functionality to remove in pandas) labels Apr 10, 2023

jbrockmendel added 2 commits April 10, 2023 14:10

Merge branch 'main' into depr-concat-empty

bfd969f

Fix AM build

51e6d36

jbrockmendel changed the title ~~DEPR: concat with empty objects~~ DEPR: concat ignoring empty objects Apr 10, 2023

post-merge fixup

52ce0d7

jorisvandenbossche reviewed Apr 11, 2023

jbrockmendel added 12 commits April 11, 2023 10:17

Merge branch 'main' into depr-concat-empty

f8dc81e

catch more specifically

163bf8a

un-xfail

03a0641

Merge branch 'main' into depr-concat-empty

49a7146

mypy fixup

7e2e995

Merge branch 'main' into depr-concat-empty

7c0c715

Merge branch 'main' into depr-concat-empty

7f2977a

Merge branch 'main' into depr-concat-empty

0eaf359

Merge branch 'main' into depr-concat-empty

a878fea

update test

75d5041

Merge branch 'main' into depr-concat-empty

9e2de8f

Fix broken test

392b40a

MarcoGorelli self-requested a review May 15, 2023 18:24

MarcoGorelli requested changes May 16, 2023

doc/source/whatsnew/v2.1.0.rst (outdated, resolved)

jbrockmendel added 2 commits May 16, 2023 08:26

Merge branch 'main' into depr-concat-empty

465c141

remove duplicate whatsnew entries

3666bca

jbrockmendel mentioned this pull request May 16, 2023

REF: split out dtype-finding in concat_compat #53260

Merged

jbrockmendel added 3 commits May 24, 2023 08:43

Merge branch 'main' into depr-concat-empty

1277b26

Merge branch 'main' into depr-concat-empty

5cddae9

Merge branch 'main' into depr-concat-empty

8e58bff

MarcoGorelli reviewed May 25, 2023

pandas/core/dtypes/concat.py (outdated, resolved)

MarcoGorelli self-requested a review May 25, 2023 18:35

jbrockmendel added 2 commits May 25, 2023 13:04

Merge branch 'main' into depr-concat-empty

47a17b3

remove unused

e696c53

MarcoGorelli requested changes May 26, 2023

Merge branch 'main' into depr-concat-empty

7f07121

MarcoGorelli approved these changes May 27, 2023

github-actions bot added the Stale label Jun 27, 2023

MarcoGorelli approved these changes Jul 10, 2023

MarcoGorelli added this to the 2.1 milestone Jul 10, 2023

MarcoGorelli merged commit 76854ce into pandas-dev:main Jul 10, 2023

github-actions bot mentioned this pull request Jul 7, 2023

DEPR: List of deprecations to be removed in 3.0 #50578

Open

jbrockmendel deleted the depr-concat-empty branch July 10, 2023 23:48

bednar mentioned this pull request Oct 26, 2023

Pandas outputs warning when calling dataframe.append in flux_csv_parser._prepare_data_frame influxdata/influxdb-client-python#613

Open

wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Nov 29, 2023

Avoid pandas FutureWarning on concatenating empty data frame

80a537f

See pandas-dev/pandas#52532

wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Dec 20, 2023

Avoid pandas FutureWarning on concatenating empty data frame

d11bb17

See pandas-dev/pandas#52532

wshanks mentioned this pull request Dec 20, 2023

Address several warnings raised during tests qiskit-community/qiskit-experiments#1351

Merged

wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Dec 20, 2023

Avoid pandas FutureWarning on concatenating empty data frame

449634a

See pandas-dev/pandas#52532

crusaderky mentioned this pull request Feb 16, 2024

combine_first: conditional type-cast to rhs's dtype dask/dask#10931

Open

rhshadrach mentioned this pull request Dec 31, 2024

TST(string dtype): Remove xfail for combine_first #60634

Merged

jorisvandenbossche mentioned this pull request Jan 3, 2025

API: creating DataFrame with no columns: object vs string dtype columns? #60338

Open
