Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

DeduplicateRecords based on recordKey if global index is used #345

Merged
merged 1 commit into from
Mar 22, 2018

Conversation

kaushikd49
Copy link
Contributor

Fixes issue #344

.mapToPair(record -> {
HoodieKey hoodieKey = record.getKey();
// If index used is global, then records are expected to differ in their partitionPath
boolean isGlobal = getIndex() != null && getIndex().isGlobal();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the index be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected this. The check had to be outside of this code block.

@@ -151,7 +152,7 @@ public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig,
config.getUpsertShuffleParallelism());

// perform index loop up to get existing location of records
JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, table);
JavaRDD<HoodieRecord<T>> taggedRecords = getIndex().tagLocation(dedupedRecords, table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets remove index => getIndex() refactors. its still a private variable that can be accessed within the same class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, made a package private constructor for testing purpose to inject a mock index.

@kaushikd49
Copy link
Contributor Author

@vinothchandar

  • Fixed the null check on index variable. Had to be moved outside of the code block.
  • Removed getIndex(), instead added a package private constructor for testing, to inject a mocked index.

@@ -122,6 +123,20 @@ public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig,
}
}

@VisibleForTesting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing HoodieWriteClient constructor may use the new constructor to save duplicate code.

public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig,
       boolean rollbackInFlight) {	       boolean rollbackInFlight) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.. @kaushikd49 can we fix that as well..

Defining a new package private constructor is much cleaner.. Nicely done!

@@ -122,6 +123,20 @@ public HoodieWriteClient(JavaSparkContext jsc, HoodieWriteConfig clientConfig,
}
}

@VisibleForTesting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.. @kaushikd49 can we fix that as well..

Defining a new package private constructor is much cleaner.. Nicely done!

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash your commits into 1 as well..

.mapToPair(record -> {
HoodieKey hoodieKey = record.getKey();
// If index used is global, then records are expected to differ in their partitionPath
Object key = isIndexingGlobal ? hoodieKey.getRecordKey() : hoodieKey;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace isIndexingGlobal with index.isGlobal()

@kaushikd49
Copy link
Contributor Author

Thanks, Squashed commits @vinothchandar

@vinothchandar vinothchandar merged commit 291a88b into apache:master Mar 22, 2018
vinishjail97 added a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants