[PATHFINDING] Parse json as variant #7403
Attn @alamb
See also the related PR for variant here:
Thank you for this PR @scovich
In my mind this functionality feels like a "computation kernel" (i.e. similar to the functions in https://docs.rs/arrow/latest/arrow/compute/index.html). I'd expect the signature to be roughly:

```rust
use arrow::array::ArrayRef;

/// Convert text stored as JSON in an input `StringArray`, `LargeStringArray` or `StringViewArray` into
/// a single "Variant" array (`StructArray` with an extension type)
fn json_to_variant(input: &ArrayRef) -> ArrayRef {
    todo!("parse each JSON string and build the variant StructArray")
}
```

Since the arrow-json crate is currently for converting
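The contract of such a kernel (one output row per input row, row-level nulls carried through, malformed input reported as an error) can be sketched with the standard library alone. Everything here is hypothetical: `Scalar`, `parse_scalar` and `json_to_variant_sketch` are illustrative names, `&[Option<&str>]` stands in for a string array, and the toy parser only understands JSON scalars, whereas a real kernel would presumably reuse arrow-json's tape decoder. Note the difference between `None` (the row itself is null) and `Some(Scalar::Null)` (the row holds the JSON literal `null`); the variant encoding has its own null primitive, so a kernel likely needs to keep the two apart.

```rust
/// Hypothetical stand-in for one parsed JSON scalar. A real kernel would
/// append variant-encoded bytes to a `StructArray` instead.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
}

/// Parse one JSON scalar. Deliberately incomplete: no objects, arrays or
/// string escapes, and the number check is looser than the JSON grammar.
fn parse_scalar(text: &str) -> Result<Scalar, String> {
    let t = text.trim();
    match t {
        "null" => Ok(Scalar::Null),
        "true" => Ok(Scalar::Bool(true)),
        "false" => Ok(Scalar::Bool(false)),
        _ if t.len() >= 2 && t.starts_with('"') && t.ends_with('"') => {
            let inner = &t[1..t.len() - 1];
            if inner.contains('"') || inner.contains('\\') {
                Err(format!("unsupported string literal: {t}"))
            } else {
                Ok(Scalar::Str(inner.to_string()))
            }
        }
        _ => {
            // Reject spellings Rust accepts but JSON does not, e.g. "inf", "NaN", "+1".
            let starts_ok = t.starts_with('-') || t.starts_with(|c: char| c.is_ascii_digit());
            match t.parse::<f64>() {
                Ok(n) if starts_ok && n.is_finite() => Ok(Scalar::Number(n)),
                _ => Err(format!("invalid JSON scalar: {t}")),
            }
        }
    }
}

/// Kernel shape: one output row per input row, row-level nulls (`None`)
/// preserved, and the first malformed row aborts the whole batch.
pub fn json_to_variant_sketch(input: &[Option<&str>]) -> Result<Vec<Option<Scalar>>, String> {
    input
        .iter()
        .copied()
        .map(|row| row.map(parse_scalar).transpose())
        .collect()
}
```

Failing the whole batch on the first bad row mirrors how many arrow compute kernels return a single `Result`; a null-on-error mode would be an equally plausible design choice.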
I think we will sort this out as part of implementing variant in #6736. TLDR is via a
I agree something like arrow-compute makes a lot of sense. Unfortunately, the tape decoder machinery is private to the arrow-json crate, so I had to do the initial pathfinding here. Is there a better way forward?
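Wherever the tokenizer ends up living, each JSON scalar it produces eventually has to be written out in the Variant binary encoding. The sketch below encodes scalar values only, following my reading of the Parquet Variant encoding spec (the low 2 bits of the first byte hold the basic type, the upper 6 bits a type-specific header, and multi-byte values are little-endian). `JsonScalar` and `encode_value` are hypothetical names; objects and arrays are omitted, the separate metadata (field-name dictionary) buffer is not produced, and picking the narrowest integer type that fits is an assumption, not something this PR prescribes.

```rust
/// Hypothetical JSON scalar, with integers and floating-point numbers kept apart.
#[derive(Debug, Clone, PartialEq)]
pub enum JsonScalar {
    Null,
    Bool(bool),
    Int(i64),
    Double(f64),
    Str(String),
}

// Basic types (low 2 bits of the first value byte).
const BASIC_PRIMITIVE: u8 = 0;
const BASIC_SHORT_STRING: u8 = 1;

// Primitive type ids used here (upper 6 bits when basic type is primitive).
const NULL: u8 = 0;
const TRUE: u8 = 1;
const FALSE: u8 = 2;
const INT8: u8 = 3;
const INT16: u8 = 4;
const INT32: u8 = 5;
const INT64: u8 = 6;
const DOUBLE: u8 = 7;
const STRING: u8 = 16;

/// Pack a basic type and a 6-bit type-specific header into one byte.
fn header(basic_type: u8, value_header: u8) -> u8 {
    debug_assert!(value_header < 64);
    (value_header << 2) | basic_type
}

/// Append the variant value bytes for one scalar (metadata not handled).
pub fn encode_value(v: &JsonScalar, out: &mut Vec<u8>) {
    match v {
        JsonScalar::Null => out.push(header(BASIC_PRIMITIVE, NULL)),
        JsonScalar::Bool(true) => out.push(header(BASIC_PRIMITIVE, TRUE)),
        JsonScalar::Bool(false) => out.push(header(BASIC_PRIMITIVE, FALSE)),
        JsonScalar::Int(i) => {
            let i = *i;
            // Assumed design choice: use the narrowest integer type that fits.
            if let Ok(x) = i8::try_from(i) {
                out.push(header(BASIC_PRIMITIVE, INT8));
                out.extend_from_slice(&x.to_le_bytes());
            } else if let Ok(x) = i16::try_from(i) {
                out.push(header(BASIC_PRIMITIVE, INT16));
                out.extend_from_slice(&x.to_le_bytes());
            } else if let Ok(x) = i32::try_from(i) {
                out.push(header(BASIC_PRIMITIVE, INT32));
                out.extend_from_slice(&x.to_le_bytes());
            } else {
                out.push(header(BASIC_PRIMITIVE, INT64));
                out.extend_from_slice(&i.to_le_bytes());
            }
        }
        JsonScalar::Double(d) => {
            out.push(header(BASIC_PRIMITIVE, DOUBLE));
            out.extend_from_slice(&d.to_le_bytes());
        }
        JsonScalar::Str(s) => {
            let bytes = s.as_bytes();
            if bytes.len() < 64 {
                // Short string: the byte length lives in the header's upper 6 bits.
                out.push(header(BASIC_SHORT_STRING, bytes.len() as u8));
            } else {
                // Long string primitive: 4-byte little-endian length prefix
                // (lengths above u32::MAX are not handled in this sketch).
                out.push(header(BASIC_PRIMITIVE, STRING));
                out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
            }
            out.extend_from_slice(bytes);
        }
    }
}
```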
Some other options might be (not sure which one we should go with):
I have been thinking a lot about how we should introduce variant. What do you think about a crate structure like this:
I think depending on how arrow-variant is implemented, maybe it depends directly on
I filed #7423 to track this item.
This is a pathfinding exercise to see how easy (or hard) it might be to parse JSON text into Parquet's new variant type using the tape decoder. It is not intended to be merged; it is more of a conversation starter.
In particular: