Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

DEPR: Remove bytes input for read_excel #53830

Merged
merged 6 commits into from
Jun 27, 2023
Merged

DEPR: Remove bytes input for read_excel #53830

merged 6 commits into from
Jun 27, 2023

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 commented Jun 24, 2023

@rmhowe425 rmhowe425 requested a review from rhshadrach as a code owner June 24, 2023 15:14
@rhshadrach rhshadrach added IO Excel read_excel, to_excel Deprecate Functionality to remove in pandas labels Jun 25, 2023
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - a small request

@@ -97,6 +99,9 @@
By file-like object, we refer to objects with a ``read()`` method,
such as a file handle (e.g. via builtin ``open`` function)
or ``StringIO``.

.. deprecated:: 2.1.0
Passing byte strings is deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the "wrap in a BytesIO object" here too

@rhshadrach rhshadrach added this to the 2.1 milestone Jun 25, 2023
Comment on lines 104 to 105
Passing byte strings is deprecated. To read from a "
"byte string, wrap it in a `BytesIO` object.
Copy link
Member

@rhshadrach rhshadrach Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the quotes ("), yea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh shoot thanks! Fixed now!

@rmhowe425 rmhowe425 requested a review from rhshadrach June 26, 2023 16:45
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, otherwise looks good!

Passing byte strings is deprecated. To read from a "
"byte string, wrap it in a `BytesIO` object.
Passing byte strings is deprecated. To read from a
byte string, wrap it in a `BytesIO` object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I missed this before. The API docs are autogenerated using sphinx. For sphinx, you need a double backtick here: ``BytesIO``.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; ping on green.

@rmhowe425
Copy link
Contributor Author

@rhshadrach Pinging on green!

@@ -1503,7 +1509,13 @@ def __init__(

# First argument can also be bytes, so create a buffer
if isinstance(path_or_buffer, bytes):
path_or_buffer = BytesIO(path_or_buffer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To maintain the current behavior during the deprecation, I think we need to continue wrapping this in BytesIO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep! I added this back. Thanks for catching this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that test_excel_read_binary still passes without this line on main. If we are wanting to remove this functionality so that we raise when bytes are passed, I think more needs to be done. But I think this can wait until the deprecation is enforced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach I'm a bit confused on your above message 🤔

The test is passing without the line on main because the input to read_excel is already a BytesIO object. If I left the input as raw binary the test would fail.

Copy link
Member

@rhshadrach rhshadrach Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test in question is

        expected = pd.read_excel("test1" + read_ext, engine=engine)

        with open("test1" + read_ext, "rb") as f:
            data = f.read()

        actual = pd.read_excel(data, engine=engine)
        tm.assert_frame_equal(expected, actual)

data here is a bytes, not a BytesIO, object. This is passed directly to read_excel.

Copy link
Contributor Author

@rmhowe425 rmhowe425 Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach

@mroeschke has been helping me with a similar problem on a few other PRs.

Are you okay with it if I go ahead and take care of this discrepancy before we submit this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure the fix for this problem is just adding more input validation by using a handful of methods from common.py

Copy link
Member

@rhshadrach rhshadrach Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think we're okay to deprecate with this PR as is, this can wait until the deprecation is enforced (i.e. the feature of reading bytes is removed in pandas 3.0).

@rmhowe425
Copy link
Contributor Author

@rhshadrach Pinging on green again after I pushed the last time!

@mroeschke mroeschke merged commit 12b114b into pandas-dev:main Jun 27, 2023
@mroeschke
Copy link
Member

Thanks again @rmhowe425

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* Adding deprecation logic and updating documentation.

* Adding PR number to new unit test

* Updating read_excel() documentation per reviewer recommendations.

* Updating read_excel() documentation per reviewer recommendations.

* Updating read_excel() documentation per reviewer recommendations.

* Updating implementation per reviewer recommendations.
@rmhowe425 rmhowe425 deleted the dev/depr/literal-bytes-excel branch February 17, 2024 17:20
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Deprecate Functionality to remove in pandas IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml
3 participants