[HUDI-6099] Improved the performance of checking for valid commits when tagging record location. #8494

Merged (1 commit) on Apr 20, 2023

prashantwason
Member

[HUDI-6099] Improved the performance of checking for valid commits when tagging record location.

Change Logs

  1. Moved the checkIfValidCommit function to HoodieIndexUtils so that it can be shared across the various index implementations.
  2. checkIfValidCommit now accepts a HoodieTimeline instead of a HoodieTableMetaClient, so the timeline no longer has to be recomputed for every record that is checked.
  3. Fixed HoodieInMemoryHashIndex, which was not checking for a valid commit when tagging record locations.
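
To illustrate the idea behind change 2, here is a minimal, self-contained sketch. It does not use the Hudi API: `Timeline`, `MetaClient`, `loadCompletedTimeline`, `containsInstant`, and `tagValid` are hypothetical stand-ins, and the validity rule shown (plain membership in the set of completed instants) is an assumption that may be simpler than what HoodieIndexUtils actually does. The point is only that the timeline is built once per batch and then reused for every record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ValidCommitSketch {

    /** Hypothetical stand-in for a commit timeline: a fixed set of completed instant times. */
    static final class Timeline {
        private final Set<String> completed;

        Timeline(List<String> instants) {
            this.completed = new TreeSet<>(instants);
        }

        boolean containsInstant(String commitTs) {
            return completed.contains(commitTs);
        }
    }

    /** Hypothetical stand-in for a meta client; counts how often the timeline is built. */
    static final class MetaClient {
        int timelineLoads = 0;
        private final List<String> instants;

        MetaClient(List<String> instants) {
            this.instants = instants;
        }

        Timeline loadCompletedTimeline() {
            timelineLoads++; // in a real table this would be the expensive step
            return new Timeline(instants);
        }
    }

    /** Takes the precomputed timeline, so the caller decides when it is built. */
    static boolean checkIfValidCommit(Timeline timeline, String commitTs) {
        // Assumption: a commit is valid if it is a completed instant on the timeline.
        return timeline.containsInstant(commitTs);
    }

    /** Keeps only the commit times that are valid, building the timeline once per batch. */
    static List<String> tagValid(MetaClient metaClient, List<String> recordCommitTimes) {
        Timeline timeline = metaClient.loadCompletedTimeline();
        List<String> valid = new ArrayList<>();
        for (String commitTs : recordCommitTimes) {
            if (checkIfValidCommit(timeline, commitTs)) {
                valid.add(commitTs);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        MetaClient metaClient = new MetaClient(List.of("20230418", "20230419"));
        List<String> valid = tagValid(metaClient, List.of("20230418", "20230420", "20230419"));
        // Prints: [20230418, 20230419] loads=1
        System.out.println(valid + " loads=" + metaClient.timelineLoads);
    }
}
```

Had the check taken the meta client and built the timeline itself, `timelineLoads` in this sketch would grow with the number of records instead of staying at 1.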

Impact

Improved performance when tagging a large number of records.

Risk level (write none, low medium or high below)

None.

Basically a code reorg.

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Contributor

@danny0405 danny0405 left a comment

+1, nice catch~

@danny0405 self-assigned this Apr 19, 2023
@danny0405 added the code-refactor, priority:minor, spark, and index labels Apr 19, 2023
@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 8afe549 into apache:master Apr 20, 2023
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 20, 2023
neverdizzy pushed a commit to neverdizzy/hudi that referenced this pull request Jun 15, 2023
…en tagging record location (apache#8494)

(cherry picked from commit 8afe549)
3 participants