Support Adding Parquet Files to an Existing Table #225

Open
jacksonrnewhouse opened this issue Mar 1, 2024 · 3 comments

@jacksonrnewhouse

Arroyo is a Rust-based stream processing engine that performs reliable computation on data across many supported sources and writes to a similar number of sinks. It has support for writing vanilla parquet to S3, as well as a Delta Lake integration. We'd like to also be able to write to Iceberg tables. Because of the consistency mechanisms of Arroyo, writes will be done separately from adding the files to the table, so we only need something like an "insert_table()" method on an existing table. It'd also be helpful to have some sort of "create table if not exist", but if that's more work we can tell users they have to make the table themselves.

@Xuanwo
Member

Xuanwo commented Mar 1, 2024

Thank you for bringing this to our attention. This feature is indeed included in our process of writing data into Iceberg. We simply need to make the API accessible.

@liurenjie1024
Contributor

Hi @jacksonrnewhouse, what you've mentioned is two features:

  1. Create table.
  2. Append files.

Both features are transaction APIs. 1 is relatively easy to finish, while 2 is a little more complicated. For 2, do you also need to insert deletes, or just append data?

@jacksonrnewhouse
Author

Just appending data would be sufficient.
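
To make the scope discussed above concrete, here is a minimal in-memory sketch of the two operations: "create table if not exists" and an append-only commit of files that were already written elsewhere. Every name in it (`Catalog`, `Table`, `DataFile`, `create_table_if_not_exists`, `append_files`) is hypothetical and chosen for illustration; none of it is the iceberg-rust API, and a real implementation would also have to write manifests and commit through a catalog.

```rust
use std::collections::HashMap;

/// Hypothetical description of a Parquet file that has already been written
/// (for example, by the stream processor's own sink).
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub record_count: u64,
}

/// Hypothetical table: each append commit adds one snapshot, recorded here
/// as the list of files that commit registered.
#[derive(Debug, Default)]
pub struct Table {
    snapshots: Vec<Vec<DataFile>>,
}

impl Table {
    /// Append-only commit: registers existing files with the table.
    /// No deletes are involved. Returns the number of snapshots so far,
    /// used here as a stand-in for a snapshot id.
    pub fn append_files(&mut self, files: Vec<DataFile>) -> usize {
        self.snapshots.push(files);
        self.snapshots.len()
    }

    /// All data files currently registered, across every snapshot.
    pub fn files(&self) -> impl Iterator<Item = &DataFile> {
        self.snapshots.iter().flatten()
    }
}

/// Hypothetical catalog mapping table names to tables.
#[derive(Debug, Default)]
pub struct Catalog {
    tables: HashMap<String, Table>,
}

impl Catalog {
    /// Returns the existing table, or creates an empty one if it is missing.
    pub fn create_table_if_not_exists(&mut self, name: &str) -> &mut Table {
        self.tables.entry(name.to_string()).or_default()
    }
}
```

In this sketch, writing the data and adding it to the table are separate steps: the caller produces the Parquet files first and only later calls `append_files` with their paths, which mirrors the write-then-commit split described in the original request.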


3 participants