Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Spark] Support RESTORE for clustered table #3194

Merged
merged 4 commits into from
Jun 7, 2024

Conversation

zedtang
Copy link
Collaborator

@zedtang zedtang commented Jun 3, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Support RESTORE for clustered tables by adding a new domain metadata to overwrite the existing one so that clustering columns are correctly restored.

How was this patch tested?

New unit tests.

Does this PR introduce any user-facing changes?

No

@zedtang zedtang self-assigned this Jun 3, 2024
@zedtang zedtang force-pushed the restore-clustered-table branch from f2c8b3f to f22acc0 Compare June 3, 2024 16:17
Copy link
Contributor

@dabao521 dabao521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Let's add more tests

verifyClusteringColumns(tableIdentifier, "a")
}

// Scenario 3: restore unclustered table to clustered version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't look right? All versions below are all clustered table. Are you meaning restore from table with clustering columns to non-empty clustering columns?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is more accurate, updated

@@ -825,6 +826,45 @@ trait ClusteredTableDDLSuiteBase
}
}

test("validate RESTORE on clustered table") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a column mapping enabled variant for these tests? If not let's add one since column names are maninulated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added them to column mapping's selected tests.

verifyClusteringColumns(tableIdentifier, "a")
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the following test :

  1. Restore to latest snapshot: basically it should work for fromSnapshot = toSnapshot

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do a shortcut when fromSnapshot = toSnapshot (src)
So I modified the test to restore to the start version.

@zedtang zedtang requested a review from dabao521 June 4, 2024 04:52
@zedtang zedtang force-pushed the restore-clustered-table branch from bb18a1c to 4dd0c34 Compare June 6, 2024 00:34
@vkorukanti vkorukanti merged commit ef1def9 into delta-io:master Jun 7, 2024
10 checks passed
@zedtang zedtang deleted the restore-clustered-table branch June 7, 2024 22:53
zedtang added a commit to zedtang/delta that referenced this pull request Jun 7, 2024
Support RESTORE for clustered tables by adding a new domain metadata to
overwrite the existing one so that clustering columns are correctly
restored.

New unit tests.
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants