Fixing bug introduced in rollback for MOR table type with inserts into log files #417


@n3nash n3nash commented Jul 9, 2018

The bug was introduced here: 3da063f#diff-4c9f153416cce4b19fd73b4f1dcbb1d1R219

Essentially, this is what happens for a MOR (merge-on-read) table without inserts into log files:

  1. Write out the ids of all files being updated to the inflight file (the saveWorkloadProfile information).

If this was a failed commit:

  1. Read the affected file ids from the workload stored in the inflight file (only updated file ids are stored, since file ids for inserts are unknown during workload profiling).
  2. Delete all the newly created parquet files, identified by the commit time.
  3. Iterate over all the file ids from the workload, keep only the files that received updates, and append rollback blocks to them to revert the updates.

If a successful commit is being rolled back:

  1. Read the affected file ids from the workload stored in the commit file (note that the file ids for both the newly created parquet files and the updated files are now present here).
  2. Delete all the newly created parquet files, identified by the commit time.
  3. Iterate over all the file ids from the workload, keep only the files that received updates, and append rollback blocks to them to revert the updates.

The issue was that there was no test case covering inserts + updates together. The logic that filters the workload's file ids down to updates only was broken; the filter is now corrected.
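For readers following along, the rollback steps described above can be sketched roughly as follows. This is a simplified model and not Hudi's actual code: the `RollbackSketch` class, the `WriteStat` type, its `isUpdate` flag, and the map standing in for the parquet files on storage are all hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of the rollback steps; not Hudi's real classes. */
final class RollbackSketch {

  /** One workload entry: a file id and whether the commit updated an existing file. */
  static final class WriteStat {
    final String fileId;
    final boolean isUpdate; // false means the file was newly created by inserts

    WriteStat(String fileId, boolean isUpdate) {
      this.fileId = fileId;
      this.isUpdate = isUpdate;
    }
  }

  /**
   * Rolls back one commit in the model.
   *
   * @param commitTime the commit being rolled back
   * @param parquetFileToCommitTime mutable map standing in for parquet files on storage
   * @param workload entries read from the inflight or commit file (step 1)
   * @return the file ids that would get a rollback block appended
   */
  static List<String> rollback(
      String commitTime,
      Map<String, String> parquetFileToCommitTime,
      List<WriteStat> workload) {
    // Step 2: delete the parquet files that were created by this commit.
    parquetFileToCommitTime.values().removeIf(commitTime::equals);

    // Step 3: keep only the updated files; these need rollback blocks.
    return workload.stream()
        .filter(stat -> stat.isUpdate)
        .map(stat -> stat.fileId)
        .collect(Collectors.toList());
  }
}
```

In this model, a commit that updates one existing file and creates another through inserts should delete only the new parquet file and return only the updated file id. That mixed inserts + updates case is the one the description says had no test coverage.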

@n3nash n3nash changed the title from "(WIP) Fixing bug introduced in rollback for MOR table type with inserts into log files" to "Fixing bug introduced in rollback for MOR table type with inserts into log files" Jul 11, 2018

n3nash commented Jul 11, 2018

@vinothchandar @bvaradar please review.


@bvaradar bvaradar left a comment


Otherwise, the fix looks good.

super.deleteCleanedFiles(filesToDeletedStatus, partitionPath, filter);
final Set<String> deletedFiles = filesToDeletedStatus.entrySet().stream()
    .map(entry -> {
      Path filePath = entry.getKey().getPath();

Contributor

Minor: finding the fileId from a file path (either log file or parquet) looks like reusable functionality. Can you introduce an FSUtils method and move the logic there?
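A helper along the lines suggested here might look like the sketch below. It is an illustration only, not the method that was actually added: the class and method names are hypothetical, and it assumes parquet file names of the form `<fileId>_<n>_<commitTime>.parquet` and log file names of the form `.<fileId>_<baseCommitTime>.log.<version>`, with no underscore inside the file id. The real naming scheme may differ.

```java
/** Hypothetical helper for extracting a file id; not the actual FSUtils API. */
final class FileIdSketch {

  /**
   * Returns the file id from a bare file name.
   * Assumes the id is everything before the first underscore,
   * after an optional leading dot (used here to mark log files).
   */
  static String fileIdFromFileName(String fileName) {
    // Log files are assumed to be hidden files with a leading dot.
    String name = fileName.startsWith(".") ? fileName.substring(1) : fileName;
    int idx = name.indexOf('_');
    if (idx <= 0) {
      throw new IllegalArgumentException("Unrecognized file name: " + fileName);
    }
    return name.substring(0, idx);
  }

  /** Returns the file id from a full path, for either a log file or a parquet file. */
  static String fileIdFromPath(String path) {
    // lastIndexOf returns -1 when there is no '/', so the whole string is used.
    String fileName = path.substring(path.lastIndexOf('/') + 1);
    return fileIdFromFileName(fileName);
  }
}
```

Keeping the parsing in one place means the rollback code and any other caller agree on how a file id is derived, which seems to be the point of the review comment.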

Contributor Author


done

@@ -428,14 +429,16 @@ public void testRollbackWithDeltaAndCompactionCommit() throws Exception {
dataFilesToRead.findAny().isPresent());

/**
- * Write 2 (updates)
+ * Write 2 (inserts + updates)

Contributor


I tried applying the test-code patch alone to understand it. I expected the test case to fail, since I did not apply the actual fix, but the test passed. Can you cross-check whether some assertion needs to be added?

Contributor Author


okay, let me check.

@n3nash n3nash force-pushed the bug_fix_for_rollback_inserts_into_log_files branch from b29a807 to 51da8d4 Compare July 17, 2018 19:16

n3nash commented Jul 17, 2018

@bvaradar Addressed your concerns, please let @vinothchandar know if it's ready to merge.

@n3nash n3nash force-pushed the bug_fix_for_rollback_inserts_into_log_files branch from 51da8d4 to ffb7077 Compare July 17, 2018 21:33
@vinothchandar vinothchandar merged commit 34ab54a into apache:master Jul 18, 2018
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
Co-authored-by: StreamingFlames <18889897088@163.com>