-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[HUDI-5008] Avoid unset HoodieROTablePathFilter in IncrementalRelation #6921
Conversation
@@ -167,12 +167,15 @@ class TestCOWDataSourceStorage extends SparkClientFunctionalTestHarness { | |||
// Read Incremental Query | |||
// we have 2 commits, try pulling the first commit (which is not the latest) | |||
val firstCommit = HoodieDataSourceHelpers.listCommitsSince(fs, basePath, "000").get(0) | |||
// Setting HoodieROTablePathFilter here to test whether pathFilter can filter out correctly for IncrementalRelation | |||
spark.sparkContext.hadoopConfiguration.set("mapreduce.input.pathFilter.class", "org.apache.hudi.hadoop.HoodieROTablePathFilter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here fix the test added: #458
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting a filter from the test? Isn't it supposed to be set by the Relation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now. Thanks for clarifying!
Shouldn't we write the test that would set HoodieROTablePathFilter
(by using globbing for ex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, there is a test in org.apache.hudi.functional.TestCOWDataSource#testReadPathsOnCopyOnWriteTable
Line 341 in 7e7b3a8
.option(DataSourceReadOptions.READ_PATHS.key, record1FilePaths) |
toHadoopRelation
will add the HoodieROTablePathFilter
, and this test also contains old version files.
Hi @alexeykudinkin could you plz help to review this pr? |
Gentle ping @alexeykudinkin |
@@ -167,12 +167,15 @@ class TestCOWDataSourceStorage extends SparkClientFunctionalTestHarness { | |||
// Read Incremental Query | |||
// we have 2 commits, try pulling the first commit (which is not the latest) | |||
val firstCommit = HoodieDataSourceHelpers.listCommitsSince(fs, basePath, "000").get(0) | |||
// Setting HoodieROTablePathFilter here to test whether pathFilter can filter out correctly for IncrementalRelation | |||
spark.sparkContext.hadoopConfiguration.set("mapreduce.input.pathFilter.class", "org.apache.hudi.hadoop.HoodieROTablePathFilter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now. Thanks for clarifying!
Shouldn't we write the test that would set HoodieROTablePathFilter
(by using globbing for ex)
Change Logs
If users create an incrementalRelation while join another existing hive hudi table, as pathFilter is unset inside incrementalRelation, all files under hive hudi table will be selected.
Now HoodieROTablePathFilter can accept as.of.instant to do the time travel, so instead we pass as.of.instant to the dataframe(not change spark hadoop conf globally) to avoid this issue.
Impact
Risk level: low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist