Skip to content

Fix bug in field level metadata matching code #8286

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 3 commits into from
Nov 21, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 20, 2023

Which issue does this PR close?

Closes #8285

Rationale for this change

Bug fix

What changes are included in this PR?

What was happening is that the input to a join has metadata on one side, but not the other. When the inputs were reordered then metadata was not properly reordered as well

Specifically:

OutputRequirementExec
  ProjectionExec: expr=[id@0 as id]
    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(id@0, id@0)]
      AggregateExec: mode=FinalPartitioned, gby=[id@0 as id], aggr=[]  <--******** this does not have metadata
        AggregateExec: mode=Partial, gby=[id@0 as id], aggr=[]
          UnionExec
            MemoryExec: partitions=1, partition_sizes=[1]
            MemoryExec: partitions=1, partition_sizes=[1]
      MemoryExec: partitions=1, partition_sizes=[1]                    <--******** this has metadata

So the output schema looks like:

  id (metadata)    <-- first input
  id (no metadata) <-- second input

When the join reorders the inputs, the result had the metadata incorrectly set:

  id (no metadata) <-- first input *** Metadata is switched ***
  id (metadata)    <-- second input

There is code in the join reordering that tries to remap the output so the schema is the same, however
it had a bug (was looking up field by name, not column index), so it picked the wrong column.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 20, 2023
.ok()
.map(|f| f.metadata().clone())
// Look up field by index in schema (not NAME)
e.as_any()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix -- look up metadata by column index rather than name

@@ -299,3 +299,31 @@ fn table_with_many_types() -> Arc<dyn TableProvider> {
let provider = MemTable::try_new(Arc::new(schema), vec![vec![batch]]).unwrap();
Arc::new(provider)
}

/// Registers a table_with_metadata that contains both field level and Table level metadata
pub async fn register_metadata_tables(ctx: &SessionContext) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to port several existing metadata tests to sqllogictest as well

@alamb alamb marked this pull request as ready for review November 20, 2023 18:59
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @alamb

e.as_any()
.downcast_ref::<Column>()
.map(|column| column.index())
.map(|idx| input_schema.field(idx).metadata())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can likely be done within a single map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea -- done in f5c557e

@alamb
Copy link
Contributor Author

alamb commented Nov 21, 2023

Thank you for the review @comphead

@alamb alamb merged commit 58483fb into apache:main Nov 21, 2023
@alamb alamb deleted the alamb/join_repro branch November 21, 2023 11:01
alamb added a commit to alamb/datafusion that referenced this pull request Nov 21, 2023
* Fix bug in field level metadata matching code

* improve comment

* single map
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
2 participants