
TST: Compression Inference Tests for read_* #17262


Closed
gfyoung opened this issue Aug 15, 2017 · 9 comments
Labels
IO Data IO issues that don't fit into a more specific label Testing pandas testing functions or related to the test suite

Comments

@gfyoung
Member

gfyoung commented Aug 15, 2017

xref: #17206 (comment)

cc @dhimmel

@gfyoung gfyoung added IO Data IO issues that don't fit into a more specific label Testing pandas testing functions or related to the test suite labels Aug 15, 2017
@gfyoung gfyoung added this to the 0.21.0 milestone Aug 15, 2017
@dhimmel
Contributor

dhimmel commented Aug 15, 2017

See #17206 (comment). In short, not all read_* methods use functionality from io.common. This would be nice, see #15008, but it's a huge task. If we tested compression inference for all read_* methods, many would likely fail... there's a lot of undesirable duplicated functionality across the read code.
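For context on the io.common path: the helper this thread revolves around, `_infer_compression`, picks a compression codec from the file extension. The sketch below is a hypothetical, simplified stand-in (the name `infer_compression` and the extension table are assumptions, not pandas' code). It shows why non-string paths such as `pathlib.Path` need handling first: `os.fspath` converts them to a string before the suffix check.

```python
import os
from pathlib import Path

# Illustrative extension -> codec table (an assumption modelled on the codecs
# pandas supported at the time, not a copy of pandas' own mapping).
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression(filepath_or_buffer, compression="infer"):
    """Hypothetical stand-in for io.common._infer_compression."""
    if compression != "infer":
        # An explicit choice, including None for "no compression", wins.
        return compression
    try:
        # os.fspath accepts str, bytes and os.PathLike (e.g. pathlib.Path)
        # and raises TypeError for anything else, such as open buffers.
        path = os.fspath(filepath_or_buffer)
    except TypeError:
        return None
    if isinstance(path, bytes):
        path = os.fsdecode(path)
    for extension, codec in _EXTENSION_TO_COMPRESSION.items():
        if path.endswith(extension):
            return codec
    return None


if __name__ == "__main__":
    print(infer_compression(Path("data.csv.gz")))  # gzip
```

A reader that only checked `isinstance(path, str)` before looking at the suffix would silently skip inference for `Path` objects, which is the gap #17206 closed for the functions that go through io.common.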

@gfyoung
Member Author

gfyoung commented Aug 15, 2017

Fair enough. Perhaps you can test whichever functions hit the io.common path and revise your whatsnew entry to reflect that.

@jreback
Contributor

jreback commented Aug 15, 2017

If we tested compression inference for all read_* methods, many would likely fail

@dhimmel but that's exactly the point. I am quite happy to have a comprehensive test that xfails things that are not converted/implemented.
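jreback's suggestion (one test covering every reader, with the not-yet-converted ones marked as expected failures) can be sketched without pandas. Both readers below are hypothetical stand-ins: `read_with_inference` plays a reader routed through a shared inference helper, `read_without_inference` one that still has its own open logic. pandas' suite uses pytest's `xfail` marker; to keep the sketch dependency-free it uses `unittest.expectedFailure`, the standard-library analogue.

```python
import bz2
import gzip
import lzma
import tempfile
import unittest
from pathlib import Path

# Hypothetical extension -> opener table standing in for a shared helper.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}
PAYLOAD = b"a,b\n1,2\n"


def read_with_inference(path):
    """Hypothetical reader that picks the opener from the file suffix."""
    opener = OPENERS.get(Path(path).suffix, open)
    with opener(path, "rb") as fh:
        return fh.read()


def read_without_inference(path):
    """Hypothetical reader with its own, non-inferring open logic."""
    with open(path, "rb") as fh:
        return fh.read()


class CompressionInferenceTests(unittest.TestCase):
    def _roundtrip(self, reader):
        with tempfile.TemporaryDirectory() as tmp:
            for suffix, opener in OPENERS.items():
                # Pass a pathlib.Path, the non-string case from #17206.
                path = Path(tmp) / f"data.csv{suffix}"
                with opener(path, "wb") as fh:
                    fh.write(PAYLOAD)
                self.assertEqual(reader(path), PAYLOAD)

    def test_converted_reader(self):
        self._roundtrip(read_with_inference)

    @unittest.expectedFailure  # not converted yet: returns compressed bytes
    def test_unconverted_reader(self):
        self._roundtrip(read_without_inference)
```

Run it with `python -m unittest`. Once a reader is converted, its expected failure turns into an unexpected success, which makes the run fail and signals that the marker should be removed.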

@dhimmel
Contributor

dhimmel commented Aug 16, 2017

Looking through the source code, I believe io.common._infer_compression is only called in the following three places:

io.pickle.to_pickle()

inferred_compression = _infer_compression(path, compression)

io.pickle.read_pickle()

inferred_compression = _infer_compression(path, compression)

io.parsers._read()

compression = _infer_compression(filepath_or_buffer, compression)

The first two are for pickle IO. Tracking down where/how io.parsers._read gets used has been a bit more challenging. Will keep looking into it.
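Tracing call sites by hand is tedious; a static scan with the standard-library `ast` module can list every call to a given name together with the function that makes it. This is a sketch, not a tool used in the thread: it matches bare calls and attribute calls by name only, so calls made through aliases or `getattr` are missed.

```python
import ast
from pathlib import Path


class _CallFinder(ast.NodeVisitor):
    """Record (line, enclosing function) for each call to func_name."""

    def __init__(self, func_name):
        self.func_name = func_name
        self.stack = []  # names of the functions we are currently inside
        self.hits = []

    def _visit_function(self, node):
        self.stack.append(node.name)
        self.generic_visit(node)
        self.stack.pop()

    visit_FunctionDef = _visit_function
    visit_AsyncFunctionDef = _visit_function

    def visit_Call(self, node):
        target = node.func
        # Match both `_read(...)` (Name) and `common._read(...)` (Attribute).
        if isinstance(target, ast.Name):
            name = target.id
        else:
            name = getattr(target, "attr", None)
        if name == self.func_name:
            where = self.stack[-1] if self.stack else "<module>"
            self.hits.append((node.lineno, where))
        self.generic_visit(node)


def find_calls_in_source(source, func_name):
    finder = _CallFinder(func_name)
    finder.visit(ast.parse(source))
    return finder.hits


def find_calls(root, func_name):
    """Scan every .py file under root; return (file, line, function) hits."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for line, where in find_calls_in_source(source, func_name):
            hits.append((str(path), line, where))
    return hits
```

For example, `find_calls("pandas/io", "_read")` run from a pandas checkout would list each caller of `_read`; the result depends on which pandas version is checked out.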

@dhimmel
Contributor

dhimmel commented Aug 16, 2017

io.parsers._read is called by io.parsers._make_parser_function:

return _read(filepath_or_buffer, kwds)

_make_parser_function is then used in io.parsers to create read_csv and read_table:

pandas/pandas/io/parsers.py

Lines 667 to 671 in a46e5be

read_csv = _make_parser_function('read_csv', sep=',')
read_csv = Appender(_read_csv_doc)(read_csv)
read_table = _make_parser_function('read_table', sep='\t')
read_table = Appender(_read_table_doc)(read_table)
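The factory pattern quoted above is why `read_csv` and `read_table` share compression inference: both are generated from one function that forwards to the same `_read`. Below is a heavily simplified sketch; the real `_make_parser_function` takes dozens of parameters, and `_read` here is a hypothetical stub that just echoes its arguments.

```python
def _read(filepath_or_buffer, kwds):
    """Hypothetical stub for the shared reader; echoes what it received."""
    return {"path": filepath_or_buffer, **kwds}


def _make_parser_function(name, sep=","):
    # The generated readers differ only in their default separator, so any
    # logic living in _read (such as compression inference) is shared.
    def parser_f(filepath_or_buffer, sep=sep, compression="infer", **kwargs):
        kwds = dict(kwargs, sep=sep, compression=compression)
        return _read(filepath_or_buffer, kwds)

    parser_f.__name__ = name
    return parser_f


read_csv = _make_parser_function("read_csv", sep=",")
read_table = _make_parser_function("read_table", sep="\t")
```

Because inference happens once inside `_read`, fixing it there covers both public readers, while readers built outside this factory need their own fix.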

Therefore, I believe that, at the current time, _infer_compression is only used by the user-facing IO functions read_pickle, to_pickle, read_csv, and read_table?

@gfyoung @jreback: should I revise the What's New Entry for #17206 to list these four functions?

- `read_*` methods can now infer compression from non-string paths, such as ``pathlib.Path`` objects (:issue:`17206`).
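With a recent pandas release (an assumption; defaults have shifted between versions), the behaviour described in that entry can be checked with a round trip through `pathlib.Path` objects. `read_csv` and `read_pickle` both default to `compression="infer"`, so neither read call below names the codec.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "frame.csv.gz"
    # Write gzip explicitly; only the read side relies on inference here.
    df.to_csv(csv_path, index=False, compression="gzip")
    from_csv = pd.read_csv(csv_path)  # codec inferred from ".gz"

    pkl_path = Path(tmp) / "frame.pkl.bz2"
    df.to_pickle(pkl_path)  # codec inferred from ".bz2" on write
    from_pickle = pd.read_pickle(pkl_path)  # and again on read
```

Both frames should compare equal to `df`; a reader that ignored the suffix would fail to parse the compressed bytes instead.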

@gfyoung
Member Author

gfyoung commented Aug 16, 2017

@dhimmel : Agreed, part of your PR should be to revise the whatsnew and add tests for any affected functions (and maybe even those that aren't affected but should eventually gain support).

@jreback
Contributor

jreback commented Sep 23, 2017

@dhimmel, any interest in working on this one? That would be great.

@jreback
Contributor

jreback commented Jul 8, 2018

@gfyoung can you evaluate this issue (e.g. tick boxes, close it, etc.)?

@gfyoung
Member Author

gfyoung commented Jul 8, 2018

This looks good to go! #17900 actually took care of this.

@gfyoung gfyoung closed this as completed Jul 8, 2018
@gfyoung gfyoung modified the milestones: Contributions Welcome, 0.24.0 Jul 8, 2018
3 participants