-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[Managed Iceberg] Support writing to partitioned tables #32102
[Managed Iceberg] Support writing to partitioned tables #32102
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
assign set of reviewers |
Assigning reviewers. If you would like to opt out of this review, comment R: @m-trieu for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairly quick pass
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Show resolved
Hide resolved
* <p>After closing, the resulting {@link ManifestFile}s can be retrieved using {@link | ||
* #getManifestFiles()}. | ||
*/ | ||
class RecordWriterManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the documentation, this might be accurately named PartitionedRecordWriter
or something. That communicates better than "Manager" which could mean almost anything. And it seems it could be Autocloseable
which would enable using it in try-with-resources blocks, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally had it as PartitionedRecordWriter
but thought it may be misleading since the class is also used for unpartitioned writes
Added Autocloseable implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. A reason I don't like the "Manager" name is that I don't know what it is doing in the managing, and it really doesn't communicate that this is a thing that does the writing. We have so many things called "manager" and none of them have anything in common.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/WriteGroupedRowsToFiles.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriter.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to get it merged for the release.
* support writing partitioned data * trigger integration tests * partitioned record writer to manage writers for different partitions * partitioned record writer * reject rows when we are saturated with record writers * refactor record writer manager * add tests * add more tests * make record writer manager transient * clean up test path * cleanup * cleanup * address comments * revert readability change * add to changes md
Fixes #31943
Adds support for writing to partitioned Iceberg tables.
A record writer manager is introduced to open and close writers as necessary. An Iceberg data writer instance is configured to write to only one partition, so multiple writers are needed to write to multiple partitions.
The behavior remains unchanged when writing to unpartitioned tables.
Also some small but key changes:
<warehouse>/<table>/<data-file>
<table>/metadata/