StructTypes in a VirtualTableScan #254

Blizzara · 2024-05-02T11:18:54Z

Hey! I've been working on a Spark to Substrait (and back) converter, forked from https://github.com/apache/incubator-gluten/tree/v1.1.0/substrait/substrait-spark / https://github.com/substrait-io/substrait-java/pull/90/files (the fork is currently private, but I hope to open-source it too).

While aiming to support struct types in LocalRelations/VirtualTables, I ran into the check here:

substrait-java/core/src/main/java/io/substrait/relation/VirtualTableScan.java

Line 33 in 72bab63

    
           && rows.stream().noneMatch(r -> r == null || r.fields().size() != names.size());

If I've understood right, the "names" here should be a flattened list of field names, including column names but also recursively all names from struct types. This also aligns with the code in Isthmus. Overall, that means the check r.fields().size() != names.size() will trigger since there will be more names than top-level fields.
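To make the mismatch concrete, here is a minimal, self-contained sketch of flattening names depth-first. The `Field` class and the `flattenNames` helper are hypothetical stand-ins written for this illustration, not substrait-java types, and the depth-first ordering is my assumption of how the flattened list is laid out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenedNames {
  /** Hypothetical stand-in for a named field: a struct has children, a scalar has none. */
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, Field... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  /** Collects names depth-first: each field's own name, then the names of its nested fields. */
  static List<String> flattenNames(List<Field> fields) {
    List<String> names = new ArrayList<>();
    for (Field field : fields) {
      names.add(field.name);
      names.addAll(flattenNames(field.children));
    }
    return names;
  }

  public static void main(String[] args) {
    // Schema: id INT, point STRUCT<x: INT, y: INT>
    List<Field> columns = Arrays.asList(
        new Field("id"),
        new Field("point", new Field("x"), new Field("y")));

    List<String> names = flattenNames(columns);
    System.out.println(names); // [id, point, x, y]
    System.out.println(columns.size() + " top-level fields, " + names.size() + " names");
  }
}
```

With one struct column, a row has 2 top-level fields but the flattened list has 4 names, so a strict equality check between the two sizes rejects the table.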

I'm pretty new to Substrait still so I may also be mistaken, but if my understanding is right, would it make sense to either:
a) remove the check,
b) change the check to confirm that names.size() >= r.fields().size(), or
c) iterate through fields to count the sub-fields as well before comparing the sizes?
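Options (b) and (c) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual substrait-java code: `Type`, `nameCount`, `weakCheck`, and `strictCheck` are hypothetical names, and the sketch only considers structs nested directly inside structs (structs inside lists or maps would presumably need similar handling).

```java
import java.util.Arrays;
import java.util.List;

public class NameCountCheck {
  /** Hypothetical stand-in for a field's type: a struct has child types, a scalar has none. */
  static final class Type {
    final List<Type> children;

    Type(Type... children) {
      this.children = Arrays.asList(children);
    }
  }

  /** Names needed by one field: one for the field itself plus those of its nested fields. */
  static int nameCount(Type type) {
    int count = 1;
    for (Type child : type.children) {
      count += nameCount(child);
    }
    return count;
  }

  /** Names needed by a whole row, counted recursively over its top-level fields. */
  static int totalNameCount(List<Type> rowTypes) {
    int count = 0;
    for (Type type : rowTypes) {
      count += nameCount(type);
    }
    return count;
  }

  /** Option (b): only require at least one name per top-level field. */
  static boolean weakCheck(int namesSize, List<Type> rowTypes) {
    return namesSize >= rowTypes.size();
  }

  /** Option (c): require exactly as many names as fields and sub-fields combined. */
  static boolean strictCheck(int namesSize, List<Type> rowTypes) {
    return namesSize == totalNameCount(rowTypes);
  }

  public static void main(String[] args) {
    // Row shape: (scalar, struct(scalar, scalar)), which needs 4 names in total.
    List<Type> row = Arrays.asList(new Type(), new Type(new Type(), new Type()));

    System.out.println(weakCheck(4, row));   // true
    System.out.println(strictCheck(4, row)); // true
    System.out.println(weakCheck(2, row));   // true, although two names are missing
    System.out.println(strictCheck(2, row)); // false
  }
}
```

The trade-off is that (b) is a one-line change but still accepts name lists that are too short for the nested fields, while (c) catches those at the cost of walking each row's types.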

vbarua · 2024-05-02T22:06:41Z

> If I've understood right, the "names" here should be a flattened list of field names, including column names but also recursively all names from struct types.

I think that's correct, and

> Overall, that means the check r.fields().size() != names.size() will trigger since there will be more names than top-level fields.

this is probably a bug.

Either weakening the check as you suggest in (b) or making it more comprehensive as in (c) sounds reasonable to me. It would also be good to add a test for this case to avoid regressing to this behaviour.

Blizzara mentioned this issue May 3, 2024

fix: account for struct fields in VirtualTableScan check #255

Merged

vbarua closed this as completed in #255 May 30, 2024
