Fixing behavior of buffering in Create/Merge handles for invalid/wrong schema records #558

Merged
merged 1 commit into from
Jan 29, 2019

Conversation

n3nash
Contributor

@n3nash n3nash commented Jan 22, 2019

In earlier versions of hoodie, the contract for an error record (a record with a bad schema) was that getInsertValue() throws an exception and the record is then added to the failed records list. This prevents a single bad record from failing the whole job.

In the latest release, this contract has changed. With the introduction of parallelized read/write operations for Create/Merge handles, we offload getInsertValue() to the reader thread to save time on this heavy operation and speed up the overall run. Since getInsertValue() is now called on the reader side, the exception is thrown there and fails the job even if only a single row has a bad schema.

This PR fixes the code back to the original contract.
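The contract described above can be illustrated with a minimal sketch. It is not the code from this PR: `InsertValueResult`, `toInsertValue` and `process` are hypothetical names, `java.util.Optional` is assumed, and plain strings stand in for HoodieRecord payloads. The point it shows is that the conversion step captures the exception next to the record instead of throwing, so the writer side can route the record to a failed list and carry on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class BufferedErrorRecordSketch {

    // Hypothetical holder: a record plus either its converted value or the failure.
    static final class InsertValueResult<R, V> {
        final R record;
        final Optional<V> insertValue;
        final Optional<Exception> exception;

        // In the real code this work would run on the reader (buffering) thread.
        // It never throws: a conversion failure is stored, not propagated.
        InsertValueResult(R record, Function<R, V> converter) {
            this.record = record;
            Optional<V> value = Optional.empty();
            Optional<Exception> failure = Optional.empty();
            try {
                value = Optional.ofNullable(converter.apply(record));
            } catch (Exception e) {
                failure = Optional.of(e);
            }
            this.insertValue = value;
            this.exception = failure;
        }
    }

    // Stand-in (assumption) for the schema-sensitive getInsertValue() call:
    // it throws NumberFormatException for non-numeric input.
    static Integer toInsertValue(String raw) {
        return Integer.valueOf(raw);
    }

    // Returns the number of records written; bad records are added to failedRecords.
    static int process(List<String> rawRecords, List<String> failedRecords) {
        // Reader side: convert every record up front, capturing failures.
        List<InsertValueResult<String, Integer>> buffered = new ArrayList<>();
        for (String raw : rawRecords) {
            buffered.add(new InsertValueResult<String, Integer>(
                    raw, BufferedErrorRecordSketch::toInsertValue));
        }
        // Writer side: a captured exception marks one record as failed, not the job.
        int written = 0;
        for (InsertValueResult<String, Integer> result : buffered) {
            if (result.exception.isPresent()) {
                failedRecords.add(result.record);
            } else {
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> failed = new ArrayList<>();
        int written = process(Arrays.asList("1", "oops", "3"), failed);
        System.out.println(written + " written, failed: " + failed);
    }
}
```

The sketch runs both sides sequentially for brevity; the same holder would work if the reader side filled a queue consumed by a separate writer thread.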

@n3nash n3nash force-pushed the fix_errorrecord_handling branch 2 times, most recently from 747df20 to 66895c4 Compare January 23, 2019 05:48
@n3nash
Contributor Author

n3nash commented Jan 23, 2019

@bvaradar Can you please take a pass ?

Optional recordMetadata = record.getData().getMetadata();
try {
  if (exception.isPresent()) {
    throw exception.get();
Contributor

Throwing and catching exceptions has some overhead. Instead, can you copy the catch block here?

Contributor Author

@n3nash n3nash Jan 24, 2019

The idea is that the number of such exceptions should be really low. Ideally, this is the same piece of code that used to run before (throw and catch exceptions in error scenarios), so I would like to keep it the same?

Member

+1. I think we should just do an exception.get() instanceof Throwable check instead.

Contributor Author

Alright, done.
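For readers following this thread, here is a rough sketch of the two shapes under discussion. It is an illustration under assumptions, not the merged code: `FailureLog`, `writeWithRethrow` and `writeWithBranch` are hypothetical names, and `java.util.Optional` stands in for whichever Optional type the handle uses. Both variants record the same failure; the second simply skips the throw and catch round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ExceptionBranchSketch {

    // Hypothetical stand-in for a write status that collects failed records.
    static final class FailureLog {
        final List<String> failures = new ArrayList<>();

        void markFailure(String key, Throwable cause) {
            failures.add(key + ": " + cause.getMessage());
        }
    }

    // Shape 1: rethrow the buffered exception and catch it immediately.
    // Returns true if the record would be written, false if it was marked failed.
    static boolean writeWithRethrow(String key, Optional<Exception> exception, FailureLog log) {
        try {
            if (exception.isPresent()) {
                throw exception.get();
            }
            return true; // the actual write would happen here
        } catch (Exception e) {
            log.markFailure(key, e);
            return false;
        }
    }

    // Shape 2: branch on the buffered exception directly, with no throw/catch.
    static boolean writeWithBranch(String key, Optional<Exception> exception, FailureLog log) {
        if (exception.isPresent()) {
            log.markFailure(key, exception.get());
            return false;
        }
        return true; // the actual write would happen here
    }

    public static void main(String[] args) {
        FailureLog log = new FailureLog();
        Optional<Exception> bad = Optional.<Exception>of(new IllegalArgumentException("bad schema"));
        writeWithBranch("record-1", bad, log);
        writeWithBranch("record-2", Optional.<Exception>empty(), log);
        System.out.println(log.failures);
    }
}
```

Since bad records are expected to be rare, the cost difference between the two shapes is likely small either way; the direct branch mainly reads more plainly.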

@@ -61,20 +59,30 @@ public CopyOnWriteLazyInsertIterable(Iterator<HoodieRecord<T>> sortedRecordItr,
this.hoodieTable = hoodieTable;
}

// Used for caching HoodieRecord along with insertValue. We need this to offload computation work to buffering thread.
static class BufferedPayload<T extends HoodieRecord> {
Contributor

Can you pick a different name for this BufferedPayload class? The name should capture that this class encapsulates the result of converting a HoodieRecord to Avro (e.g. HoodieAvroRecordGenResult).

Contributor

Also, this can be a standalone class.

Contributor Author

Done, changed the name, but I'd like to keep the class here since it's only used for InsertHandlers?

Contributor

@bvaradar bvaradar left a comment

Rest looks good

@n3nash
Contributor Author

n3nash commented Jan 24, 2019

@bvaradar If it looks good, can you or @vinothchandar please merge it ?

@n3nash n3nash force-pushed the fix_errorrecord_handling branch from 66895c4 to d5a7c20 Compare January 24, 2019 05:49
@@ -61,20 +59,30 @@ public CopyOnWriteLazyInsertIterable(Iterator<HoodieRecord<T>> sortedRecordItr,
this.hoodieTable = hoodieTable;
}

// Used for caching HoodieRecord along with insertValue. We need this to offload computation work to buffering thread.
static class HoodieAvroRecordGenResult<T extends HoodieRecord> {
Member

Rename to HoodieInsertValueGenResult? Meta question: is the same thing not needed for MergeHandle, since the buffering does not happen on the update path?

Contributor Author

@n3nash n3nash Jan 28, 2019

Done. This happens for the merge handle as well; the name was misleading, so I renamed the commit.

@n3nash n3nash changed the title Fixing behavior of CreateHandle for invalid/wrong schema records Fixing behavior of buffering in Create/Merge handles for invalid/wrong schema records Jan 28, 2019
@n3nash n3nash force-pushed the fix_errorrecord_handling branch from d5a7c20 to 13c0d9c Compare January 28, 2019 15:06
@n3nash
Contributor Author

n3nash commented Jan 28, 2019

@vinothchandar addressed your comments

@vinothchandar
Member

build failing?

@n3nash n3nash force-pushed the fix_errorrecord_handling branch from 13c0d9c to ee1dc44 Compare January 28, 2019 18:51
@vinothchandar vinothchandar merged commit 7985eb7 into apache:master Jan 29, 2019
3 participants