[parquet] can't load parquet directory anymore: IsADirectoryError #2159

Closed
mr-majkel opened this issue Dec 7, 2023 · 1 comment · Fixed by #2160

Comments

Contributor

mr-majkel commented Dec 7, 2023

Small description

Hi @saulpw @anjakefala @takacsd - it seems that forcing the path to be opened as a file with .open() (introduced in #2133) breaks the use case where multiple parquet files are stored in a directory and that directory is then read by visidata. This layout is common with Hive partitioning or when working with Spark. A simple fix would be to check whether the path is a directory with os.path.isdir() and, if so, keep the old behavior of passing it as a string to read_table(). If it is not an existing directory, we use the new approach of opening it as a binary buffer.

I have already added this workaround to my clone of visidata, and it fixes my issue, but maybe you have a better idea for handling it than an if-else statement in ParquetSheet.
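
For illustration, the directory check described above could look roughly like the sketch below. This is not the actual patch: `load_parquet` is a hypothetical helper, and the `read_table` argument is assumed to behave like `pyarrow.parquet.read_table`, which accepts either a path string or a file-like object.

```python
from pathlib import Path
from typing import Any, Callable


def load_parquet(path: str, read_table: Callable[[Any], Any]) -> Any:
    """Load a single parquet file or a directory of parquet files.

    `load_parquet` is a hypothetical helper for this sketch. `read_table`
    is assumed to accept a path string or a binary file-like object,
    as pyarrow.parquet.read_table does.
    """
    p = Path(path)
    if p.is_dir():
        # A directory cannot be opened as a file, so pass the path string
        # and let the reader discover the files inside it.
        return read_table(str(p))
    # Single file: open it as a binary buffer and pass the handle.
    with p.open("rb") as fp:
        return read_table(fp)
```

With pyarrow installed, this would be called as `load_parquet("parquet_dir", pyarrow.parquet.read_table)`. Passing the reader in as an argument is only a convenience for the sketch; inside ParquetSheet the check would sit directly next to the existing read_table() call.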

Expected result

vd -f parquet parquet_dir

should load the parquet directory into visidata

Actual result with screenshot
[screenshot: IsADirectoryError]

Additional context

# freshest develop
visidata@9fd728b72c115e50e99c24b455caaf020381b48e

pyarrow==12.0.0
python 3.10.2
Owner

saulpw commented Dec 7, 2023

This is perfect, thanks for the report and the fix @mr-majkel !
