Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

StructTypes in a VirtualTableScan #254

Closed
Blizzara opened this issue May 2, 2024 · 1 comment · Fixed by #255
Closed

StructTypes in a VirtualTableScan #254

Blizzara opened this issue May 2, 2024 · 1 comment · Fixed by #255

Comments

@Blizzara
Copy link
Contributor

Blizzara commented May 2, 2024

Hey! I've been working on a Spark to Substrait (and back) converter, forked from https://github.com/apache/incubator-gluten/tree/v1.1.0/substrait/substrait-spark / https://github.com/substrait-io/substrait-java/pull/90/files (currently the fork is private but I hope to open source it too)

While aiming to support struct types in LocalRelations/VirtualTables, I ran into the check here:

&& rows.stream().noneMatch(r -> r == null || r.fields().size() != names.size());

If I've understood right, the "names" here should be a flattened list of field names, including column names but also recursively all names from struct types. This also aligns with the code in Isthmus. Overall, that means the check r.fields().size() != names.size() will trigger since there will be more names than top-level fields.

I'm pretty new to Substrait still so I may also be mistaken, but if my understanding is right, would it make sense to either:
a) remove the check,
b) change the check to confirm that names.size() >= r.fields().size(), or
c) iterate through fields to count the sub-fields as well before comparing the sizes?

@vbarua
Copy link
Member

vbarua commented May 2, 2024

If I've understood right, the "names" here should be a flattened list of field names, including column names but also recursively all names from struct types.

I think that's correct, and

Overall, that means the check r.fields().size() != names.size() will trigger since there will be more names than top-level fields.

this is probably a bug.

Weakening the check like you suggest in b makes sense or making it more comprehensive like in c both sound reasonable to me. It would be good to add a test for this case as well to avoid regressing to this behaviour.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants