-
-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[parquet] stringify source to handle both URLs and local paths #1913
Conversation
Stringify the source path that we feed into `pq.read_table`. This produces consistent results for local paths and URLs, where using a `visidata.Path` directly tries to load the underlying `pathlib.Path` object at `_path`.
@@ -16,7 +16,7 @@ def iterload(self): | |||
pq = vd.importExternal('pyarrow.parquet', 'pyarrow') | |||
from visidata.loaders.arrow import arrow_to_vdtype | |||
|
|||
self.tbl = pq.read_table(self.source) | |||
self.tbl = pq.read_table(str(self.source)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may remove the ability to use e.g. parquet files inside .zip files. (Not sure if this works at present either though--depends on if read_table can take a file object).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good call. Just tested locally and that seems to work properly with or without this change 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh! How does that work? :) But if it does, I believe you, and we'll merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh! How does that work? :) But if it does, I believe you, and we'll merge it.
I didn't like how the answer to this was "I have noooo idea" so I started following it and realized I was wrong. It's broken in both cases!
...but my test setup made it look like it worked. Because I just added a parquet file to a zip in the same directory:
├── test.zip
│ ├── benchmark.parquet
benchmark.parquet
So when I ran vd
and opened the zip file, then opened the parquet file inside it, it worked! But under the covers it was transforming the source path into the local path outside the zip. If I moved the uncompressed benchmark.parquet
file elsewhere, things stopped working (with or without this change). Doh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, okay. Thanks for re-checking. In any event, this change doesn't break existing behavior, so when I want to revisit my .parquetz
file idea (in order to have multiple parquet tables in a single file), I'll see if I can fix it without breaking this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, opening this as a separate issue #1916
I'll try to take a look at it since I'm halfway in the hole already, but if anyone else sorts it out first that works too :)
Stringify the source path that we feed into
pq.read_table
. This produces consistent results for local paths and URLs, where using avisidata.Path
directly tries to load the underlyingpathlib.Path
object at_path
.Looks like the issue is that trying to tuck a URL inside a
pathlib
path drops a slash, turninghttp://www...
intohttp:/www...
. So it sort of looks like a URL but sort of like a local path...To reproduce this issue, use the following .vdj:
The URL is invalid but that doesn't really matter, the replay dies with this stacktrace:
Hat tip to https://github.com/spren9er who reported this issue for S3 specifically in ajkerrigan/visidata-plugins#27.