feat: init iceberg writer #275

Merged
merged 3 commits into from
Apr 22, 2024

Contributor

@ZENOTME ZENOTME commented Mar 16, 2024

This PR initializes the Iceberg writer and implements the data file writer. It's the final part of #135.

}

/// The iceberg writer used to write data to iceberg table.
#[async_trait::async_trait]
Contributor Author

Sad news: we can't use async_fn_in_trait here, because a trait with async fn methods is not object safe, so we couldn't construct something like Box<dyn IcebergWriter>. Being able to erase the type is important for complex writers.
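
To illustrate the constraint above, here is a minimal sketch using only the standard library. `SketchWriter`, `CountingWriter`, `erased_writer`, and `block_on` are hypothetical names made up for this example, not part of the PR. The boxed-future return type is written by hand here; it is only assumed to be similar in spirit to what #[async_trait::async_trait] produces.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an async writer trait. Returning a boxed future
/// (instead of declaring `async fn`) keeps the trait usable as `dyn SketchWriter`.
pub trait SketchWriter {
    fn write<'a>(&'a mut self, rows: u64) -> Pin<Box<dyn Future<Output = u64> + 'a>>;
}

/// Toy writer that only counts the rows it has seen.
pub struct CountingWriter {
    pub total: u64,
}

impl SketchWriter for CountingWriter {
    fn write<'a>(&'a mut self, rows: u64) -> Pin<Box<dyn Future<Output = u64> + 'a>> {
        Box::pin(async move {
            self.total += rows;
            self.total
        })
    }
}

/// Waker that does nothing; enough for futures that never return `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor for the sketch: polls the future until it completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Type erasure: the caller only sees `Box<dyn SketchWriter>`.
pub fn erased_writer() -> Box<dyn SketchWriter> {
    Box::new(CountingWriter { total: 0 })
}
```

A caller can hold `erased_writer()` without knowing the concrete type and drive it with `block_on(writer.write(3))`.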

Collaborator

@liurenjie1024 liurenjie1024 left a comment

Thanks @ZENOTME for this PR, we are close to having write support. I left some questions about the implementation.

type DefaultOutput = Vec<DataFileBuilder>;

/// The builder for iceberg writer.
#[allow(async_fn_in_trait)]
Collaborator

Why do we need this?

Contributor Author

Oh, I think the builder can also use #[async_trait::async_trait] 🤔

/// The associated writer type.
type R: IcebergWriter<I, O>;
/// Build the iceberg writer.
async fn build(self) -> Result<Self::R>;
Collaborator

Should we follow the pattern in FileWriterBuilder and return impl Future?

self.inner_writer.write(&batch).await
}

async fn flush(&mut self) -> Result<Vec<DataFileBuilder>> {
Collaborator

Instead of returning a builder, could we return a DataFile by storing the necessary fields in the writer?

Contributor Author

Indeed, it's reasonable to return DataFile here. When the IcebergWriter returns a DataFile, it signals that this DataFile is complete. That guarantees it can be used to commit directly, but it also means it has to attach more optional information from other writers.

Contributor Author

@ZENOTME ZENOTME Mar 21, 2024

I find that this adds complexity to the interface. The fields the writer needs can't be stored in the builder, because one builder can be used to build multiple writers, so the fields have to be passed in when we create each writer. We would need an interface like the following so that we can pass a different config each time:

pub trait IcebergWriterBuilder<I = DefaultInput, O = DefaultOutput>:
    Send + Clone + 'static
{
    /// The associated writer type.
    type R: IcebergWriter<I, O>;
    /// The per-writer config type.
    type C;
    /// Build the iceberg writer.
    async fn build(self, config: Self::C) -> Result<Self::R>;
}

This interface lets us create writers more flexibly but adds complexity, so I'm not sure whether it's worth it. 🤔

Collaborator

Sorry, I don't get your point. We don't need to expose configs; that's builder-specific. For DataFileWriterBuilder, we can create each DataFileWriter with the same partition value, cloning that partition value for every DataFileWriter.

Contributor Author

The partition value may only be known at runtime. E.g. with PartitionWriterBuilder<DataFileWriterBuilder>, the partition value is only known once the partition writer receives the data, so we can't create the DataFileWriter with a partition value in advance.
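
The situation described above can be sketched roughly as follows, in synchronous code for brevity. All names here (`SketchBuilder`, `SketchDataFileWriter`, `PartitionedWriter`) are hypothetical and only illustrate the idea of passing the partition value at build time; they are not the types from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical builder: cheap to clone, holds only settings shared by all writers.
#[derive(Clone)]
pub struct SketchBuilder {
    pub file_prefix: String,
}

/// Hypothetical writer bound to one partition value, supplied at build time.
pub struct SketchDataFileWriter {
    pub file_name: String,
    pub rows: Vec<i64>,
}

impl SketchBuilder {
    /// The `partition` argument plays the role of the `C` config in the thread:
    /// it is passed per writer instead of being stored in the builder.
    pub fn build(self, partition: &str) -> SketchDataFileWriter {
        SketchDataFileWriter {
            file_name: format!("{}-{}", self.file_prefix, partition),
            rows: Vec::new(),
        }
    }
}

/// Hypothetical partition writer: creates an inner writer the first time a
/// partition value shows up in the input.
pub struct PartitionedWriter {
    builder: SketchBuilder,
    writers: HashMap<String, SketchDataFileWriter>,
}

impl PartitionedWriter {
    pub fn new(builder: SketchBuilder) -> Self {
        Self { builder, writers: HashMap::new() }
    }

    pub fn write(&mut self, partition: &str, row: i64) {
        let builder = &self.builder;
        self.writers
            .entry(partition.to_string())
            // The partition value is only known here, while data is flowing.
            .or_insert_with(|| builder.clone().build(partition))
            .rows
            .push(row);
    }

    /// File names of all writers created so far, sorted for stable output.
    pub fn file_names(&self) -> Vec<String> {
        let mut names: Vec<String> =
            self.writers.values().map(|w| w.file_name.clone()).collect();
        names.sort();
        names
    }
}
```

The builder is cloned once per new partition, which is why the thread requires the builder trait to be Clone.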

Collaborator

That makes sense to me.

arrow_schema::Field::new("item", arrow_schema::DataType::Int64, true)
.with_metadata(HashMap::from([(
PARQUET_FIELD_ID_META_KEY.to_string(),
"-1".to_string(),
Collaborator

Why is the field id "-1"?

Contributor Author

For the inner nested field, the field id is set to -1. This comes from #176 (comment).

Member

I'm also wondering why there are field ids of -1. I took a look at the comment, but it only says to assign the field ids. Is -1 a valid field id?

Collaborator

The Iceberg spec doesn't forbid -1 as a field id, but it requires all field ids to be unique, so assigning the same field id to different fields would be illegal.

Contributor Author

Per the spec, we should assign a unique id to each field in a struct, so this was wrong. I have fixed the field ids.
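
The uniqueness requirement discussed above could be checked with something like the following sketch. `SketchField`, `first_duplicate_id`, and `leaf` are hypothetical helpers over a simplified schema shape; they are not the arrow or iceberg types used in the test.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified schema node: a field id plus nested child fields.
pub struct SketchField {
    pub id: i32,
    pub children: Vec<SketchField>,
}

/// Returns the first field id that appears more than once, if any.
pub fn first_duplicate_id(fields: &[SketchField]) -> Option<i32> {
    fn visit(fields: &[SketchField], seen: &mut HashSet<i32>) -> Option<i32> {
        for field in fields {
            // `insert` returns false when the id was already present.
            if !seen.insert(field.id) {
                return Some(field.id);
            }
            if let Some(dup) = visit(&field.children, seen) {
                return Some(dup);
            }
        }
        None
    }
    visit(fields, &mut HashSet::new())
}

/// Convenience constructor for a leaf field.
pub fn leaf(id: i32) -> SketchField {
    SketchField { id, children: Vec::new() }
}
```

Two list columns whose inner "item" fields both carry -1 would be reported as a duplicate, which is the problem raised in this thread.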

@liurenjie1024
Collaborator

cc @Xuanwo @Fokko PTAL

@marvinlanhenke
Contributor

@liurenjie1024 @ZENOTME
What's the current status of this PR? It looks very promising, as does the framework outlined in #34.

Since we have already completed some issues (or they are in progress) for read support, I think it would be beneficial to outline the next steps for implementing write support. Perhaps in another tracking issue (I don't think we have one yet?).

I think we have most of the writers in place (once this PR is ready), but have yet to 'orchestrate' them?

@ZENOTME
Contributor Author

ZENOTME commented Apr 7, 2024

> @liurenjie1024 @ZENOTME What's the current status of this PR? It looks very promising, as does the framework outlined in #34.

I think this PR is ready to go.

> Since we have already completed some issues (or they are in progress) for read support, I think it would be beneficial to outline the next steps for implementing write support. Perhaps in another tracking issue (I don't think we have one yet?).
>
> I think we have most of the writers in place (once this PR is ready), but have yet to 'orchestrate' them?

Yes! Looks good to me. Once this basic framework is in, we can create a tracking issue for the more specific writers. cc @liurenjie1024 @Xuanwo @Fokko

}

async fn flush(&mut self) -> Result<Vec<DataFile>> {
let writer = std::mem::replace(&mut self.inner_writer, self.builder.clone().build().await?);
Member

Hi, I'm a bit confused about this. Why do we need it? What will happen if the user calls flush on the same file twice?

Contributor Author

Because the FileWriter only provides close(self), which keeps it simple. To provide flush(&mut self), we create a new FileWriter on each flush.

A FileWriter may write the data into one or multiple files; there is no restriction on that.

> What will happen if the user calls flush on the same file twice?

It's safe. The semantics of flush are to flush the data written so far into storage and generate the files.

Member

@Xuanwo Xuanwo Apr 9, 2024

I'm thinking about the error handling cases. If the user hits an error during flush, they can't retry it and will lose all the data written so far. How about putting the writer back if the flush fails? The user can then decide whether to abort the whole writing process or retry the flush.

Member

> I'm thinking about the error handling cases. If the user hits an error during flush, they can't retry it and will lose all the data written so far. How about putting the writer back if the flush fails? The user can then decide whether to abort the whole writing process or retry the flush.

It's fine if you think it's out of the current scope. We can create an issue for this and keep moving forward.

Contributor Author

@ZENOTME ZENOTME Apr 10, 2024

> I'm thinking about the error handling cases. If the user hits an error during flush, they can't retry it and will lose all the data written so far. How about putting the writer back if the flush fails? The user can then decide whether to abort the whole writing process or retry the flush.

Thanks! You've reminded me that this approach is not good for error handling.
A flush has two phases:

  1. close the inner writer.
  2. create a new writer.

Because of fn close(self), we have to swap in a new writer before the old one has closed successfully. So two error cases can occur:

  1. creating the new writer fails, so we never close the original writer
  2. closing fails, but we have already created a new writer

I think there are two solutions:

  1. Use fn close(&mut self)
  2. Use Option<Writer> so that we can take the writer temporarily.
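
The second option above can be sketched roughly like this, in synchronous code for brevity. `InnerWriter`, `FlushableWriter`, and `fail_next_close` are hypothetical names for illustration only; this is not the implementation that was merged.

```rust
/// Hypothetical inner writer whose `close` consumes it, like the FileWriter in the thread.
pub struct InnerWriter {
    pub rows: Vec<i64>,
    pub fail_on_close: bool,
}

impl InnerWriter {
    pub fn close(self) -> Result<usize, String> {
        if self.fail_on_close {
            Err("close failed".to_string())
        } else {
            Ok(self.rows.len())
        }
    }
}

/// Hypothetical outer writer that holds the inner writer in an `Option`.
pub struct FlushableWriter {
    inner: Option<InnerWriter>,
}

impl FlushableWriter {
    pub fn new() -> Self {
        Self { inner: None }
    }

    pub fn write(&mut self, row: i64) {
        // Lazily create the inner writer, so flush never has to build a
        // replacement before the old writer has been closed.
        let inner = self.inner.get_or_insert_with(|| InnerWriter {
            rows: Vec::new(),
            fail_on_close: false,
        });
        inner.rows.push(row);
    }

    /// Hook for the sketch only: make the next close fail.
    pub fn fail_next_close(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            inner.fail_on_close = true;
        }
    }

    /// Takes the inner writer out and closes it. If close fails, the buffered
    /// rows are gone, but the outer writer is still usable afterwards.
    pub fn flush(&mut self) -> Result<usize, String> {
        match self.inner.take() {
            Some(writer) => writer.close(),
            None => Ok(0),
        }
    }
}
```

With this shape, neither of the two error cases listed above can leave the outer writer holding a half-replaced inner writer.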

Member

@Xuanwo Xuanwo Apr 10, 2024

I prefer fn close(&mut self), since close might also return an error.

Contributor Author

> I'm thinking about the error handling cases. If the user hits an error during flush, they can't retry it and will lose all the data written so far. How about putting the writer back if the flush fails? The user can then decide whether to abort the whole writing process or retry the flush.

I find that we can't guarantee that the user can retry the flush. E.g. the close interface of the parquet writer consumes itself. I think a simpler semantic that we can guarantee is: if flush fails, all the data written so far is lost, and the writer can be used again. It's not friendly for users, but the semantics are easier to maintain.
For these semantics, I think Option<Writer> and fn close(self) may be more appropriate, because they guarantee that a writer will not be used again if the close fails.

Member

> I think a simpler semantic that we can guarantee is: if flush fails, all the data written so far is lost, and the writer can be used again. It's not friendly for users, but the semantics are easier to maintain.

Makes sense. By the way, opendal can retry internally, so this should not be a big issue. Let's keep moving.

location_gen,
file_name_gen,
);
let mut data_file_writer = DataFileWriterBuilder::new(pb)
Contributor

It feels a little bit odd to pass a builder to another builder. Could we do something like this?

DataFileWriter
    .builder()
    // this would call ParquetWriter::builder() to get a `ParquetWriterBuilder`,
    // and then pass ownership of the `DataFileWriterBuilder` to the `ParquetWriterBuilder`,
    // returning the `ParquetWriterBuilder`
    .with_writer(ParquetWriter)
    // these calls happen in the `ParquetWriterBuilder`,
    // allowing customization of the wrapped concrete writer
    .with_foo()
    .with_bar()
    // this finalizes the `ParquetWriterBuilder`, building a
    // `ParquetWriter`, and returns the `DataFileWriterBuilder`
    // that was passed earlier, after first passing in the `ParquetWriter`
    .build_writer()
    // these calls now happen on the `DataFileWriterBuilder`,
    // allowing further setup of the `DataFileWriter`
    .with_baz()
    .with_quux()
    // finally returns a `DataFileWriter`, or perhaps a
    // `Result<DataFileWriter>` or Future of one of those
    .build() 

Contributor Author

This top-down style looks good to me as a way to avoid passing builders to each other. However, I am not sure whether it will add complexity in the future.

Actually, the current API already allows this, because it doesn't restrict how builders are created and used, as in the following example. So I think this is more about the implementation style of our builders than about the API design.

#[derive(Clone)]
struct B;

impl B {
    pub fn new() -> Self {
        Self
    }

    pub fn with_config(&mut self, _: ()) -> &mut Self {
        self
    }
}

#[async_trait::async_trait]
impl IcebergWriterBuilder for B {
    type R = BW;
    type C = ();
    async fn build(self, _: Self::C) -> Result<Self::R> {
        Ok(BW)
    }
}

struct BW;

impl BW {
    pub fn builder() -> B {
        B::new()
    }
}

#[async_trait::async_trait]
impl IcebergWriter for BW {
    async fn write(&mut self, _input: DefaultInput) -> Result<()> {
        Ok(())
    }

    async fn flush(&mut self) -> Result<DefaultOutput> {
        Ok(vec![])
    }
}

#[derive(Clone)]
struct A<I> {
    inner: Option<I>,
}

impl<I: IcebergWriterBuilder> A<I> {
    pub fn new() -> Self {
        Self { inner: None }
    }

    pub fn with_builder(&mut self, builder: I) -> &mut I {
        self.inner = Some(builder);
        self.inner.as_mut().unwrap()
    }

    pub fn with_config(&mut self, _: ()) -> &mut Self {
        self
    }
}

struct AW;

impl AW {
    pub fn builder<I: IcebergWriterBuilder>() -> A<I> {
        A::<I>::new()
    }
}

#[async_trait::async_trait]
impl<I: IcebergWriterBuilder> IcebergWriterBuilder for A<I> {
    type R = AW;
    type C = ();
    async fn build(self, _: Self::C) -> Result<Self::R> {
        Ok(AW)
    }
}

#[async_trait::async_trait]
impl IcebergWriter for AW {
    async fn write(&mut self, _input: DefaultInput) -> Result<()> {
        Ok(())
    }

    async fn flush(&mut self) -> Result<DefaultOutput> {
        Ok(vec![])
    }
}

async fn test() {
    let mut a = AW::builder();
    a.
        // config first A
        with_config(()).
        with_builder(AW::builder()).
        // config second A
        with_config(()).
        with_builder(BW::builder()).
        // config BW
        with_config(());
    let writer = a.build(()).await;
}

Contributor Author

Both ways look good to me. cc @Fokko @Xuanwo @liurenjie1024

Member

I remember that we have discussed this before...

Contributor Author

Do you mean #135 (comment)? Indeed, they look similar. 🥵

Member

Maybe we can move on first, if both ways look good?

Contributor Author

LGTM.

/// If close fails, the data written before may be lost. The user may need to recreate the writer and rewrite the data.
/// # NOTE
/// After close, whether it succeeded or failed, the writer must never be used again; otherwise the writer will panic.
async fn close(&mut self) -> Result<O>;
Contributor Author

@ZENOTME ZENOTME Apr 11, 2024

I find that the error case is hard to handle with flush. To make the semantics easier to maintain, I use close(&mut self) for IcebergWriter instead. The user should not use the writer again after closing it, whether the close succeeded or failed. This is something that should be guaranteed at compile time, so I use a panic for it rather than returning an error.

Why not use close(self), so that the compiler guarantees this?
Because we need the writer to be usable as a trait object, like Box<dyn IcebergWriter>, so we can't have an interface like close(self).
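
A rough, synchronous sketch of the close(&mut self) semantics described above follows. `ClosableWriter`, `BufferWriter`, and `boxed_writer` are hypothetical names for illustration, and the panic message is made up; the real trait in the PR is async and generic.

```rust
/// Hypothetical, synchronous stand-in for the writer trait. `close` takes
/// `&mut self`, so it can be called through `Box<dyn ClosableWriter>`.
pub trait ClosableWriter {
    fn write(&mut self, row: i64);
    fn close(&mut self) -> Result<Vec<i64>, String>;
}

/// Toy writer that buffers rows in memory.
pub struct BufferWriter {
    // `None` means the writer has already been closed.
    buffer: Option<Vec<i64>>,
}

impl BufferWriter {
    pub fn new() -> Self {
        Self { buffer: Some(Vec::new()) }
    }
}

impl ClosableWriter for BufferWriter {
    fn write(&mut self, row: i64) {
        // Using the writer after close is treated as a bug, hence the panic.
        self.buffer
            .as_mut()
            .expect("writer used after close")
            .push(row);
    }

    fn close(&mut self) -> Result<Vec<i64>, String> {
        // `take` leaves `None` behind, so any later call hits the panic above.
        Ok(self.buffer.take().expect("writer used after close"))
    }
}

/// The writer stays usable as a trait object.
pub fn boxed_writer() -> Box<dyn ClosableWriter> {
    Box::new(BufferWriter::new())
}
```

The trade-off is the one named in the comment: misuse after close is caught at runtime by a panic instead of at compile time by a consuming close(self).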

@ZENOTME
Contributor Author

ZENOTME commented Apr 22, 2024

I think this PR is ready to move forward. PTAL. I modified the interface a bit to make error handling easier. Feel free to let me know if there is still anything to address. @Xuanwo @liurenjie1024 @Fokko @sdd

Member

@Xuanwo Xuanwo left a comment

Thanks! I believe this PR is in good shape.

Collaborator

@liurenjie1024 liurenjie1024 left a comment

LGTM, thanks @ZENOTME for this PR!

@liurenjie1024 liurenjie1024 merged commit aba6209 into apache:main Apr 22, 2024
8 checks passed
c-thiel pushed a commit to c-thiel/iceberg-rust that referenced this pull request May 13, 2024
* init iceberg writer

* refine

* refine the interface

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>
github-merge-queue bot pushed a commit to risingwavelabs/iceberg-rust that referenced this pull request Sep 9, 2024

* feat: implement prune column for schema (#261)

* feat: implement PruneColumn for Schema

* fix: fix bugs for PruneColumn implementation

* test: add test cases for PruneColumn

* fix: fix minor to make more rusty

* fix: fix cargo clippy

* fix: construct expected_type from SchemaBuilder

* fix: more readability

* change return type of prune_column

* chore(deps): Update reqwest requirement from ^0.11 to ^0.12 (#296)

* Glue Catalog: Basic Setup + Test Infra (1/3) (#294)

* extend dependency DIRS

* create dependencies for glue

* basic setup

* rename test

* add utils/get_sdk_config

* add tests

* add list_namespace

* fix: clippy

* fix: unused

* fix: workspace

* fix: name

* use creds in test-setup

* fix: empty dependencies.rust.tsv

* fix: rename endpoint_url

* remove deps.tsv

* add hms deps.tsv

* fix deps.tsv

* fix: deps.tsv

* feat: rest client respect prefix prop (#297)

* feat: rest client respect prefix prop

Signed-off-by: TennyZhuang <zty0826@gmail.com>

* add test

Signed-off-by: TennyZhuang <zty0826@gmail.com>

* fix tests without prefix

Signed-off-by: TennyZhuang <zty0826@gmail.com>

* fix clippy

Signed-off-by: TennyZhuang <zty0826@gmail.com>

---------

Signed-off-by: TennyZhuang <zty0826@gmail.com>

* fix: missing properties (#303)

* fix: renaming FileScanTask.data_file to data_manifest_entry (#300)

* renaming FileScanTask.data_file to data_manifest_entry

* renaming data_file.content() to content_type()

* changing pub method to data()

* feat: Make OAuth token server configurable (#305)

* feat: Glue Catalog - namespace operations (2/3) (#304)

* add from_build_error

* impl create_namespace

* impl get_namespace

* add macro with_catalog_id

* impl namespace_exists

* impl update_namespace

* impl list_tables

* impl drop_namespace

* fix: clippy

* update docs

* update docs

* fix: naming and visibility of error conversions

* feat: add transform_literal (#287)

* add transform_literal

* refine

* fix unwrap

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* feat: Complete predicate builders for all operators. (#276)

* feat: Complete predicate builders for all operators.

* ci: fix fmt error

* fix nan and notnan

* feat: Support customized header in Rest catalog client (#306)

Note that: the default headers will not be overwritten.

* fix: chrono dep (#274)

* feat: Read Parquet data file with projection (#245)

* feat: Read Parquet data file with projection

* fix

* Update

* More

* For review

* Use FeatureUnsupported error.

* Fix day timestamp micro (#312)

* basic fix

* change to Result<i32>

* use try_unary

* feat: support uri redirect in rest client (#310)

Signed-off-by: TennyZhuang <zty0826@gmail.com>

* refine: separate parquet reader and arrow convert (#313)

* Upgrade to rust-version 1.77.1 (#316)

* Support identifier warehouses (#308)

* Support identifier warehouses

This is a bit confusing if you come from a Hive background
where the warehouse is always a path to hdfs/s3/etc.

With the REST catalog, the warehouse can also be a logical
identifier:
https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L72-L78

This means that we have to make sure that we only parse paths
that are an actual path, and not an identifier.

I'm open to suggestions. The check is now very simple, but can
be extended for example using a regex. But I'm not sure what
the implications are of importing additional packages (in Python
you want to keep it as lightweight as possible).

* Use `if Url::parse().is_ok()`
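
The path-versus-identifier distinction described in #308 can be sketched with the standard library alone. The helper `looks_like_location` below is hypothetical and is not the code from that change, which relies on `Url::parse` from the `url` crate. It assumes a storage location always carries a `scheme://` prefix and treats anything else as a logical identifier.

```rust
/// Hypothetical helper: returns true when `warehouse` looks like a
/// storage location (for example `s3://bucket/path`) rather than a
/// logical identifier (for example `my_warehouse`).
///
/// Assumption: a location always starts with `scheme://`. This is a
/// std-only stand-in for the `Url::parse().is_ok()` check.
fn looks_like_location(warehouse: &str) -> bool {
    match warehouse.split_once("://") {
        Some((scheme, _rest)) => {
            let mut chars = scheme.chars();
            // A scheme starts with a letter, followed by letters,
            // digits, `+`, `-` or `.`.
            matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
                && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        // No `://` separator at all: treat the value as an identifier.
        None => false,
    }
}
```

With this sketch, `looks_like_location("s3://bucket/warehouse")` is true and `looks_like_location("my_warehouse")` is false, so only the former would be parsed as a path.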

* feat: Project transform (#309)

* add project bucket_unary

* add project bucket_binary

* add project bucket_set

* add project identity

* add project truncate

* fixed array boundary

* add project void

* add project unknown

* add docs + none projections

* docs

* docs

* remove trait + impl boundary on Datum

* fix: clippy

* fix: test Transform::Unknown

* add: transform_literal_result

* add: transform_literal_result

* remove: whitespace

* move `boundary` to transform.rs

* add check if transform can be applied to data_type

* add check

* add: java-testsuite Transform::Bucket

* fix: clippy

* add: timestamps to boundary

* change: return bool from can_transform

* fix: clippy

* refactor: fn project match structure

* add: java-testsuite Transform::Truncate

* add: java-testsuite Transform::Dates + refactor

* fix: doc

* add: timestamp test + refactor

* refactor: simplify projected_boundary

* add: java-testsuite Transform::Timestamp

* refactor tests

* fix: timestamp conversion

* fix: temporal test_result

* basic fix

* change to Result<i32>

* use try_unary

* add: java-testsuite Transform::Timestamp Hours

* refactor: split and move tests

* refactor: move transform tests

* remove self

* refactor: structure fn project + helpers

* fix: clippy

* fix: typo

* fix: naming + generics

* feat: add Struct Accessors to BoundReferences (#317)

* feat: use str args rather than String in transform (#325)

* chore(deps): Update pilota requirement from 0.10.0 to 0.11.0 (#327)

Updates the requirements on [pilota](https://github.com/cloudwego/pilota) to permit the latest version.
- [Release notes](https://github.com/cloudwego/pilota/releases)
- [Commits](https://github.com/cloudwego/pilota/compare/pilota-0.10.0...pilota-0.10.0)

---
updated-dependencies:
- dependency-name: pilota
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump peaceiris/actions-mdbook from 1 to 2 (#332)

Bumps [peaceiris/actions-mdbook](https://github.com/peaceiris/actions-mdbook) from 1 to 2.
- [Release notes](https://github.com/peaceiris/actions-mdbook/releases)
- [Changelog](https://github.com/peaceiris/actions-mdbook/blob/main/CHANGELOG.md)
- [Commits](https://github.com/peaceiris/actions-mdbook/compare/v1...v2)

---
updated-dependencies:
- dependency-name: peaceiris/actions-mdbook
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump peaceiris/actions-gh-pages from 3.9.3 to 4.0.0 (#333)

Bumps [peaceiris/actions-gh-pages](https://github.com/peaceiris/actions-gh-pages) from 3.9.3 to 4.0.0.
- [Release notes](https://github.com/peaceiris/actions-gh-pages/releases)
- [Changelog](https://github.com/peaceiris/actions-gh-pages/blob/main/CHANGELOG.md)
- [Commits](https://github.com/peaceiris/actions-gh-pages/compare/v3.9.3...v4.0.0)

---
updated-dependencies:
- dependency-name: peaceiris/actions-gh-pages
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): Bump apache/skywalking-eyes from 0.5.0 to 0.6.0 (#328)

Bumps [apache/skywalking-eyes](https://github.com/apache/skywalking-eyes) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/apache/skywalking-eyes/releases)
- [Changelog](https://github.com/apache/skywalking-eyes/blob/main/CHANGES.md)
- [Commits](https://github.com/apache/skywalking-eyes/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: apache/skywalking-eyes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: add BoundPredicateVisitor. Add AlwaysTrue and AlwaysFalse to Predicate (#334)

* feat: add InclusiveProjection (#335)

* feat: Implement the conversion from Iceberg Schema to Arrow Schema (#277)

* support iceberg schema to arrow schema

* avoid realloc hashmap

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* Simplify expression when doing `{and,or}` operations (#339)

This will make sure that we nicely reduce the expression in
the inclusive projection visitor:

https://github.com/apache/iceberg-rust/blob/de80a2436bb2fbbd5b4ec6bcafd0bd041b263595/crates/iceberg/src/expr/visitors/inclusive_projection.rs#L73
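
The kind of reduction described in #339 can be illustrated with a toy predicate type. The `Predicate` enum and its `and`/`or` methods below are a simplified, hypothetical model and not the crate's actual definitions; the sketch only shows how constant operands can be folded away when two expressions are combined.

```rust
/// Toy predicate type (hypothetical; not the iceberg-rust definition).
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    AlwaysTrue,
    AlwaysFalse,
    /// An opaque leaf expression, kept as text for illustration.
    Leaf(String),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

impl Predicate {
    /// Combines two predicates with AND, folding constant operands.
    fn and(self, other: Predicate) -> Predicate {
        match (self, other) {
            // false AND x == false
            (Predicate::AlwaysFalse, _) | (_, Predicate::AlwaysFalse) => Predicate::AlwaysFalse,
            // true AND x == x
            (Predicate::AlwaysTrue, p) | (p, Predicate::AlwaysTrue) => p,
            (l, r) => Predicate::And(Box::new(l), Box::new(r)),
        }
    }

    /// Combines two predicates with OR, folding constant operands.
    fn or(self, other: Predicate) -> Predicate {
        match (self, other) {
            // true OR x == true
            (Predicate::AlwaysTrue, _) | (_, Predicate::AlwaysTrue) => Predicate::AlwaysTrue,
            // false OR x == x
            (Predicate::AlwaysFalse, p) | (p, Predicate::AlwaysFalse) => p,
            (l, r) => Predicate::Or(Box::new(l), Box::new(r)),
        }
    }
}
```

Under these assumptions, combining a leaf with `AlwaysTrue` via `and` yields the leaf itself rather than a nested `And` node, which keeps a projected expression small.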

* feat: Glue Catalog - table operations (3/3) (#314)

* add GlueSchemaBuilder

* add warehouse

* add serde_json, tokio, uuid

* add minio

* add create_table

* add tests utils

* add load_table

* add drop_table + table_exists

* add rename_table

* add docs

* fix: docs + err_msg

* fix: remove unused const

* fix: default_table_location

* fix: remove single quotes error message

* chore: add test-condition `test_rename_table`

* chore: add test-condition `test_table_exists`

* chore: update roadmap (#336)

* chore: update roadmap

* chore: update reader section

* fix: read into arrow record batch

* feat: add ManifestEvaluator (#322)

* feat: init iceberg writer (#275)

* init iceberg writer

* refine

* refine the interface

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* feat: implement manifest filtering in TableScan (#323)

* Refactor: Extract `partition_filters` from `ManifestEvaluator` (#360)

* refactor: extract inclusive_projection from manifest_evaluator

* refactor: add FileScanStreamContext

* refactor: create partition_spec and partition_schema

* refactor: add cache structs

* refactor: use entry in partition_file_cache

* refactor: use result

* chore: update docs + fmt

* refactor: add bound_filter to FileScanStreamContext

* refactor: return ref BoundPredicate

* fix: return type PartitionSpecRef

* refactor: remove spec_id runtime check

* feat: add check for content_type data

* Basic Integration with Datafusion (#324)

* chore: basic structure

* feat: add IcebergCatalogProvider

* feat: add IcebergSchemaProvider

* feat: add IcebergTableProvider

* chore: add integration test infr

* fix: remove old test

* chore: update crate structure

* fix: remove workspace dep

* refactor: use try_join_all

* chore: remove feature flag

* chore: rename package

* chore: update readme

* feat: add TableType

* fix: import + async_trait

* fix: imports + async_trait

* chore: remove feature flag

* fix: cargo sort

* refactor: CatalogProvider `fn try_new`

* refactor: SchemaProvider `fn try_new`

* chore: update docs

* chore: update docs

* chore: update doc

* feat: impl `fn schema` on TableProvider

* chore: rename ArrowSchema

* refactor: remove DashMap

* feat: add basic IcebergTableScan

* chore: fix docs

* chore: add comments

* fix: clippy

* fix: typo

* fix: license

* chore: update docs

* chore: move derive stmt

* fix: collect into hashmap

* chore: use DFResult

* Update crates/integrations/datafusion/README.md

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* refactor: cache partition_schema in `fn plan_files()` (#362)

* refactor: add partition_schema_cache

* refactor: use context as param object

* fix: test setup

* refactor: clone only when cache miss

* chore: move derive stmts

* refactor: remove unused case_sensitive parameter

* refactor: remove partition_schema_cache

* refactor: move partition_filter into wider scope

* fix (manifest-list): added serde aliases to support both forms conventions (#365)

* added serde aliases to support both forms conventions

* reading manifests without avro schema

* adding avro files of both versions and add a test to deser both

* fixed typo

* feat: Extract FileRead and FileWrite trait (#364)

* feat: Extract FileRead and FileWrite trait

Signed-off-by: Xuanwo <github@xuanwo.io>

* Enable s3 services for tests

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix sort

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add comment for io trait

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix test for rest

Signed-off-by: Xuanwo <github@xuanwo.io>

* Use try join

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat: Convert predicate to arrow filter and push down to parquet reader (#295)

* feat: Convert predicate to arrow filter and push down to parquet reader

* For review

* Fix clippy

* Change from vector of BoundPredicate to BoundPredicate

* Add test for CollectFieldIdVisitor

* Return projection_mask for leaf column

* Update

* For review

* For review

* For review

* For review

* More

* fix

* Fix clippy

* More

* Fix clippy

* fix clippy

* chore(deps): Update datafusion requirement from 37.0.0 to 38.0.0 (#369)

* chore(deps): Update itertools requirement from 0.12 to 0.13 (#376)

* Add `InclusiveMetricsEvaluator` (#347)

* feat: add InclusiveMetricsEvaluator

* test: add more tests for InclusiveMetricsEvaluator

* Rename V2 spec names. (#380)

* make file scan task serializable (#377)

Co-authored-by: ZENOTME <st810918843@gmail.com>

* Feature: Schema into_builder method (#381)

* replaced `i32` in `TableUpdate::SetDefaultSortOrder` to `i64` (#387)

* fix: make PrimitiveLiteral and Literal not be Ord (#386)

* make PrimitiveLiteral and Literal not be Ord

* refine Map

* fix name

* fix map test

* refine

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* docs(writer/docker): fix small typos and wording (#389)

* docs: fixup docker compose test_utils

* docs: iceberg writer close fn

* feat: `StructAccessor.get` returns `Result<Option<Datum>>` instead of `Result<Datum>` (#390)

This is so that the accessor's result can represent null field values.

Fixes: #379
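
The reasoning behind the `Result<Option<Datum>>` return type in #390 can be shown with a small model. The `Datum` enum and the free function `get` below are hypothetical stand-ins, not the crate's `StructAccessor` API. The point of the sketch is that `Ok(None)` represents a null field, while `Err` stays reserved for a real failure such as an invalid position.

```rust
/// Hypothetical stand-in for a typed value.
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Int(i32),
    Text(String),
}

/// Reads the field at `position` from a row whose fields may be null.
///
/// - `Ok(Some(d))`: the field exists and holds a value.
/// - `Ok(None)`: the field exists but is null.
/// - `Err(_)`: the position does not exist in the row.
fn get(row: &[Option<Datum>], position: usize) -> Result<Option<Datum>, String> {
    match row.get(position) {
        Some(value) => Ok(value.clone()),
        None => Err(format!(
            "position {position} is out of bounds for a row of {} fields",
            row.len()
        )),
    }
}
```

With a plain `Result<Datum>` the null case would have to be reported either as an error or as a sentinel value; the nested `Option` keeps the two outcomes distinct.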

* feat: add `ExpressionEvaluator` (#363)

* refactor: add partition_schema_cache

* refactor: use context as param object

* fix: test setup

* refactor: clone only when cache miss

* chore: move derive stmts

* feat: add basic setup expression evaluator

* refactor: remove unused case_sensitive parameter

* chore: add doc

* refactor: remove partition_schema_cache

* refactor: move partition_filter into wider scope

* feat: add expression_evaluator_cache and apply in scan.rs

* chore: remove comment

* refactor: remove unused test setup fn

* feat: add basic test infr + simple predicate evaluation

* fix: clippy

* feat: impl `is_null` + `not_null`

* feat: impl `is_nan` + `not_nan`

* chore: change result type

* feat: impl `less_than` + `greater_than`

* chore: fix return type

* feat: impl `eq` + `not_eq`

* feat: impl `starts_with` + `not_starts_with`

* feat: impl  +

* chore: add tests for and and or expr

* chore: move test

* chore: remove unused_vars

* chore: update docs

* chore: update docs

* fix: typo

* refactor: compare datum instead of primitive literal

* refactor: use Result<Option> from accessor

* chore: remove unused fn

* fix: sdd sleep pattern matching

* Derive Clone for TableUpdate (#402)

* Add accessor for Schema identifier_field_ids (#388)

* Add accessor for Schema identifier_field_ids

* dont expose HashSet

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* fix

* Fix accessor

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* deps: Bump arrow related crates to 52 (#403)

* SnapshotRetention::Tag max_ref_age_ms should be optional (#391)

* feat: Add storage features for iceberg (#400)

* feat: Add storage features for iceberg

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add fs and s3 into default features

Signed-off-by: Xuanwo <github@xuanwo.io>

* Make toml happy

Signed-off-by: Xuanwo <github@xuanwo.io>

* Remove not needed feature flag

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* Implement `BoundPredicateVisitor` for `ManifestFilterVisitor` (#367)

* Implement all functions of BoundPredicateVisitor for ManifestFilterVisitor

* Fix code comments

* Refactor code and fix predicate for is_some_and

* Refactor code

* Handle review comments

* Handle review comments

* Handle review comments

* Refactor code

* Add missing arrow predicate pushdown implementations for `StartsWith`, `NotStartsWith`, `In`, and `NotIn` (#404)

* feat: add [not_]starts_with and [not_]in arrow predicate pushdown

* fixes from issues highlighted in review

* feat: make BoundPredicate,Datum serializable (#406)

* make BoundPredicate,Datum serializable

* refine error

* fix float check

* use value instead of string to avoid precision loss

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* refactor: Upgrade hive_metastore to 0.1 (#409)

* refactor: Upgrade hive_metastore to 0.1

Signed-off-by: Xuanwo <github@xuanwo.io>

* format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix typo

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* Remove duplicate filter (#414)

* Enhancement: refine the reader interface (#401)

* refactor(catalog/rest): Split http client logic to separate mod (#423)

Signed-off-by: Xuanwo <github@xuanwo.io>

* Remove #[allow(dead_code)] from the codebase (#421)

* Remove #[allow(dead_code)] from the codebase

* Remove: dead_code, move: avroschema fn to test

* Fix checks and code style, remove unused code

* Change function name

* ci: use official typos github action (#426)

* feat: support lower_bound&&upper_bound for parquet writer (#383)

* refactor: Implement ArrowAsyncFileWriter directly to remove tokio (#427)

* refactor: Implement ArrowAsyncFileWriter directly to remove tokio

Signed-off-by: Xuanwo <github@xuanwo.io>

* Make build pass

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore: Don't enable reqwest default features (#432)

* refactor(catalogs/rest): Split user config and runtime config (#431)

* refactor(catalogs/rest): Split user config and runtime config

Signed-off-by: Xuanwo <github@xuanwo.io>

* Sort cargo

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix unit tests

Signed-off-by: Xuanwo <github@xuanwo.io>

* Remove default feature of tokio

Signed-off-by: Xuanwo <github@xuanwo.io>

* return error here

Signed-off-by: Xuanwo <github@xuanwo.io>

* Return error if cred doesn't exist

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat: runtime module (#233)

* temp runtime

* POC

* fix chrono

* fix dep

* refine module

* refactor to use a deadly simple way

* allow dead_code

* add license

* fix clippy and tests

* clean code

* undo

* add async-std ci test

* rm tokio dev-dep

* make tokio dev dep

* fix sort

* rm tokio dev

* fix: Fix namespace identifier in url (#435)

* fix: Fix namespace identifier in url

* Remove table encoding

* refactor(io): Split io into smaller mods (#438)

* refactor(io): Split io into smaller mods

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix test

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix cap

Signed-off-by: Xuanwo <github@xuanwo.io>

* Remove not used deps

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore: Use once_cell to replace lazy_static (#443)

* chore: Use once_cell to replace lazy_static

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* fix: Fix build while no-default-features enabled (#442)

* fix: Fix build while no-default-features enabled

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix clippy

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add ci for no default features

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore(deps): Bump crate-ci/typos from 1.22.9 to 1.23.1 (#447)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.22.9 to 1.23.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.22.9...v1.23.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Refactor the README to be more user-oriented (#444)

* docs: Refactor the README to be more user-oriented

Signed-off-by: Xuanwo <github@xuanwo.io>

* Apply suggestions from code review

Co-authored-by: Fokko Driesprong <fokko@apache.org>

* Polish

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Fokko Driesprong <fokko@apache.org>

* feat: Add cargo machete (#448)

* adding cargo machete to check unused dependencies

* remove default values

* adding a tag version instead of main

* running cargo machete natively

* removing unused dependency urlencoding

* bug fixes

* chore: Use nightly toolchain for check (#445)

* chore: Use nightly toolchain for check

* Fix check

* Fix clippy finds

* Make rustfmt happy

* Make rustfmt happy

* Update github actions

* Use action builder since apache doesn't allow external actions

* Fix comments

* Fix README.md

* reuse docker container to save compute resources (#428)

* reuse docker container to save compute resources

* add lazy reuse docker compose

* refactor test fixture: the docker compose init is reused

* use ctor and dtor to start docker compose and destroy docker compose

* fix cargo fmt check

* fix cargo clippy

* fix cargo fmt

* fix cargo sort

* add namespace for datafusion test

* add empty check for list glue catalog namespace

---------

Co-authored-by: thexiay <xiayu1187@gmail.com>

* feat: Add macos runner for ci (#441)

* feat: Add macos runner for ci

* feat: Add publish for macOS

* reset the publish.yml

* feat: add macOS for check ci

* remove the macOS for unit ci

* chore: remove compose obsolete version (#452) (#454)

reference: https://docs.docker.com/compose/compose-file/04-version-and-name/#version-top-level-element-obsolete

* Refactor file_io_s3_test.rs (#455)

* chore(deps): Bump crate-ci/typos from 1.23.1 to 1.23.2 (#457)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.23.1 to 1.23.2.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.23.1...v1.23.2)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* remove binary serialize in literal (#456)

Co-authored-by: ZENOTME <st810918843@gmail.com>

* fix: Hms test on macos should use correct arch (#461)

* Fix ManifestFile length calculation (#466)

* chore(deps): Update typed-builder requirement from ^0.18 to ^0.19 (#473)

---
updated-dependencies:
- dependency-name: typed-builder
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: use avro fixed to represent decimal (#472)

fix #144

Signed-off-by: xxchan <xxchan22f@gmail.com>

* chore(catalog): Deprecate rest.authorization-url in favor of oauth2-server-uri (#480)

* fix: Transform::Day maps to Date rather than Int for consistency with reference implementation (#479)

Issue: https://github.com/apache/iceberg-rust/issues/478

* feat(iceberg): Add memory file IO support (#481)

* feat(iceberg): Add memory file IO support

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix typo

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add comments for memory file io

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add in-memory catalog implementation (#475)

* feat: Add in-memory catalog

* Make clippy happy

* Make cargo sort happy

* Fix README links

* Configurable file_io

* Avoid nightly features

* Remove TempFile

* Use futures::lock::Mutex instead

* Minor clean up

* Make root configurable in FS FileIO and remove default_table_root_location from Catalog

* Revert "Make root configurable in FS FileIO and remove default_table_root_location from Catalog"

This reverts commit 807dd4cf649b5c367f25afc59f99341d6995c337.

* Remove default_table_root_location from Catalog and explicitly configure a location for tables in unit tests

* lowercase catalog

* Use default instead of new

* Change references to memory

* chore: Enable new rust code format settings (#483)

* chore: Enable new format settings

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* docs: Generate rust API docs (#486)

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore: Fix format of recent PRs (#487)

Signed-off-by: Xuanwo <github@xuanwo.io>

* Rename folder to memory (#490)

* chore(deps): Bump crate-ci/typos from 1.23.2 to 1.23.5 (#493)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.23.2 to 1.23.5.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.23.2...v1.23.5)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* View Spec implementation (#331)

* Add support for ViewSpec

* Fix typos

* Fix typos

* clippy is always right

* Add tests

* Remove new_view_version test function

* Remove append_version

* View Representations Struct

* ViewRepresentation case insensitive

* Add fallible methods for ViewRepresentationsBuilder

* Add tests for fallible ViewRepresentationsBuilder methods

* Introduce ViewVersionId as i32

* Iterator for &'a ViewRepresentations

* Improve comments

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* Add test_view_metadata_v1_file_valid

* Fix view_version iter

* Remove ViewRepresentationsBuilder

* Fix comment

* Timestamp error handling

* Fallible Timestamp Conversion from Millis

* Fix Initial view Version = 1

* Cleanup

* Hide ViewMetadata iter() type

* timestamp_ms_to_utc -> error.rs

* TableMetadata timestamp conversion -> utility function

* Improve error context

* timestamp_ms_to_utc: LocalResult::None -> DataInvalid

* Fix obsolete comment

* ViewRepresentation::SqlViewRepresentation -> ::Sql

* Fix broken clippy from rebase

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* fix: Return error on reader task (#498)

* chore: Bump OpenDAL to 0.48 (#500)

* chore: Bump OpenDAL to 0.48

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* add check compatible func for primitive type (#492)

Co-authored-by: ZENOTME <st810918843@gmail.com>

* refactor(iceberg): Remove an extra config parse logic (#499)

* refactor(iceberg): Remove an extra config parse logic

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

* reduce some allocs

Signed-off-by: Xuanwo <github@xuanwo.io>

* Cleanup more

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat: permit Datum Date<->Int type conversion (#496)

Required for correct handling of partitions with Transform::Date

* Add additional S3 FileIO Attributes (#505)

* Add additional S3 FileIO Attributes

* Remove custom S3SSEType

* docs: Add links to dev docs (#508)

* docs: Add links to dev docs

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add links

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore: Remove typo in README (#509)

* chore: Remove typo in README

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix link

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat: podman support (#489)

* feat: improve docker/podman get OsArch compatibility

* fix: replace deprecated 'links' keyword in docker compose

* refactor: use IpAddr and SocketAddr when able

* docs: add podman documentation

* address PR documentation comments

* address pr comments on tests

* Address pr comments, properly handle result with match

* address pr comments, refactor get_cmd_output

* move podman instr to docs/contributing

* address pr comment, show detailed err msg

---------

Co-authored-by: Alex Yin <alexyin@ibm.com>

* feat(table): Add debug and clone trait to static table struct (#510)

* Use namespace location or warehouse location if table location is missing (#511)

* chore(deps): Bump crate-ci/typos from 1.23.5 to 1.23.6 (#521)

* Concurrent table scans (#373)

* feat: concurrent table scans

* refactor: remove TableScanConfig.

* refactor: replace num_cpus with thread::available_parallelism (#526)

* Fix: MappedLocalTime should not be exposed (#529)

* feat: Establish subproject pyiceberg_core (#518)

Signed-off-by: Xuanwo <github@xuanwo.io>

* fix: complete miss attribute for map && list in avro schema (#411)

* add miss attr in list/map avro schema

* refine error handle

* fix unused warn

* fix typos

* update avro and unittest

* refine check_schema_conversion

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* arrow/schema.rs: refactor tests (#531)

* arrow/schema.rs: refactor tests

Signed-off-by: Shirly <AndreMouche@126.com>

* *:address comments

Signed-off-by: Shirly <AndreMouche@126.com>

---------

Signed-off-by: Shirly <AndreMouche@126.com>

* feat: initialise SQL Catalog (#524)

* feat: initialise SQL Catalog

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: remove rls-rustls

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: change to SqlBindStyle and rename consts

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

---------

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* chore(deps): Bump actions/setup-python from 4 to 5 (#536)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* support session token (#530)

* Simplify PrimitiveLiteral (#502)

* simplify PrimitiveLiteral

* fix test

---------

Co-authored-by: ZENOTME <st810918843@gmail.com>

* chore: bump opendal to 0.49 (#540)

* feat: support timestamp columns in row filters (#533)

Fixes: https://github.com/apache/iceberg-rust/issues/532

* fix: don't silently drop errors encountered in table scan file planning (#535)

* chore(deps): Update sqlx requirement from 0.7.4 to 0.8.0 (#537)

Updates the requirements on [sqlx](https://github.com/launchbadge/sqlx) to permit the latest version.
- [Changelog](https://github.com/launchbadge/sqlx/blob/main/CHANGELOG.md)
- [Commits](https://github.com/launchbadge/sqlx/compare/v0.7.4...v0.8.0)

---
updated-dependencies:
- dependency-name: sqlx
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix main branch building break (#541)

* feat: support for gcs storage (#520)

* chore: include opendal/services-gcs

* feat: basic gcs scaffolding

* feat: populate config parse with basic details

* feat: include docker-compose integration tests

* feat: add extra iceberg properties

* feat: add tests for gcs read/write

These are currently conditional tests with a todo comment using the
test_with proc macro. More work needs to be done on
investigating/potentially expanding OpenDAL to allow unauthenticated
requests to fake-gcs-server. At the moment this always ends up reaching
the final VM metadata check.

* chore: minor cleanup for compose todo

* fix: do not introduce new properties

* feat: infer bucket from path

* chore: add user-project const

* feat: add allow_anonymous for test

* chore: remove test-with dep

* feat: update with allow_anonymous functionality

This requires the opendal allow_anonymous functionality with the GCS
service to work.

* ci: use cargo sort

* chore: undo storage-gcs default feature

* feat: include disable_ params for GCS_NO_AUTH

* ci: use storage-all for async-std tests

* revert: use opendal from workspace

Now that v0.49 has been released, this work does not need to pin to a
particular version!

* feat: Allow FileIO to reuse http client (#544)

Signed-off-by: Xuanwo <github@xuanwo.io>

* docs: Add an example to scan an iceberg table (#545)

* docs: Add an example to scan an iceberg table

Signed-off-by: Xuanwo <github@xuanwo.io>

* Format toml

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* Concurrent data file fetching and parallel RecordBatch processing (#515)

* feat: concurrent data file fetches, parallel RecordBatch processing

* refactor: centralize infallible `available_parallelism` fn. Use better channel size limit in arrow read

* doc: Add statement for contributors to avoid force push as much as possible (#546)

* chore: Bump datafusion to 41 (#548)

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat: Partition Binding and safe PartitionSpecBuilder (#491)

* Initial commit

* Fixes

* Replace UnboundPartitionSpec Builder

* Fix tests, allow year, month day partition

* Comments

* typos

* Fix UnboundBuild setting partition_id

* Add test for unbound spec without partition ids

* Fix into_unbound fn name

* Split bound & unbound Partition builder, change add_partition_fields

* Improve comment

* Fix fmt

* Review fixes

* Remove partition_names() HashSet creation

* Bump to version 0.3.0 (#549)

* Bump to version 0.3.0

Signed-off-by: Xuanwo <github@xuanwo.io>

* regen

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix typo

Signed-off-by: Xuanwo <github@xuanwo.io>

* Update CHANGELOG.md

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* io: add support for role arn and external id s3 props (#553)

Add support for client.assume-role.arn and
client.assume-role.external-id s3 config properties.

Partial fix for #527

* fix: ensure S3 and GCS integ tests are conditionally compiled only when the storage-s3 and storage-gcs features are enabled (#552)

* docs: fix main iceberg example (#554)

* io: add support to set assume role session name (#555)

Partial fix for #527

* test: refactor datafusion test with memory catalog (#557)

* add memory catalog

* fix style

* fix style

* add clean job in Makefile (#561)

* docs: Fix build website permission changed (#564)

* Object Cache: caches parsed Manifests and ManifestLists for performance (#512)

* feat: adds ObjectCache, to cache Manifests and ManifestLists

* refactor: change obj cache method names and use more readable default usize value

* chore: improve error message

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* fix: change object cache retrieval method visibility

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* feat: improved error message in object cache get_manifest

* test(object-cache): add unit tests for object cache manifest and manifest list retrieval

* fix: ensure that object cache insertions are weighted by size

* test: fix test typo

* fix: ensure object cache weight is that of the wrapped item, not the Arc

---------

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

* Update the paths (#569)

This is in line with the paths above, and also with the previous version:

https://dist.apache.org/repos/dist/release/iceberg/

* docs: Add links for released crates (#570)

Signed-off-by: Xuanwo <github@xuanwo.io>

* Python: Use hatch for dependency management (#572)

* Ensure that RestCatalog passes user config to FileIO (#476)

* fix: ensure that RestCatalog passes user config to FileIO

* docs: added some doc comments to clarify override order for config

* Move `zlib` and `unicode` licenses to `allow` (#566)

Both licenses can be moved to the `allowed` section:

- **adler32** [ships](https://github.com/remram44/adler32-rs/blob/master/LICENSE) with a **zlib** license and is a category A-license
- **unicode-ident** ships with a **UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE** which is also a category A-license

The **ring** license is a bit [more involved](https://github.com/briansmith/ring/blob/main/LICENSE) and carries a lot of history. I think it is best to keep it as an exception for now, since the OpenSSL license is also not explicitly listed on the ASF page. I don't see anything alarming in the `LICENSE` file.

ASF page on the subject: https://www.apache.org/legal/resolved.html#category-a

* website: Update links for 0.3.0 (#573)

Signed-off-by: Xuanwo <github@xuanwo.io>

* feat(timestamp_ns): Implement timestamps with nanosecond precision (#542)

* feat(timestamp_ns): first commit

* feat(timestamp_ns): Add mappings for timestamp_ns/timestamptz_ns

* feat(timestamp_ns): Remove unused dep

* feat(timestamp_ns): Fix unit test

* feat(timestamp_ns): Fix test_all_type_for_write()

* feat(timestamp_ns): fix test_transform_days_literal

* feat(timestamp_ns): fix math for timestamptz_nanos

* chore: formatting

* chore: formatting

* chore: Appease clippy

---------

Co-authored-by: Timothy Maloney <tmaloney@influxdata.com>

* fix: correct partition-id to field-id in UnboundPartitionField (#576)

* correct partition-id to field id in PartitionSpec

* correct partition-id to field id in PartitionSpec

* correct partition-id to field id in PartitionSpec

* xx

* fix: Update sqlx from 0.8.0 to 0.8.1 (#584)

* chore(deps): Update typed-builder requirement from 0.19 to 0.20 (#582)

---
updated-dependencies:
- dependency-name: typed-builder
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Expose Transforms to Python Binding (#556)

* bucket transform rust binding

* format

* poetry x maturin

* ignore poetry.lock in license check

* update bindings_python_ci to use makefile

* newline

* https://github.com/python-poetry/poetry/pull/9135

* use hatch instead of poetry

* refactor

* revert licenserc change

* adopt review feedback

* comments

* unused dependency

* adopt review comment

* newline

* I like this approach a lot better

* more tests

* chore(deps): Bump crate-ci/typos from 1.23.6 to 1.24.1 (#583)

* Table Scan: Add Row Group Skipping (#558)

* feat(scan): add row group and page index row selection filtering

* fix(row selection): off-by-one error

* feat: remove row selection to defer to a second PR

* feat: better min/max val conversion in RowGroupMetricsEvaluator

* test(row_group_filtering): first three tests

* test(row_group_filtering): next few tests

* test: add more tests for RowGroupMetricsEvaluator

* chore: refactor test assertions to silence clippy lints

* refactor: consolidate parquet stat min/max parsing in one place

* chore: bump crate-ci/typos to 1.24.3 (#598)

* feat: SQL Catalog - namespaces (#534)

* feat: SQL Catalog - namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: use transaction for updates and creates

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: pull out query param builder to fn

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: add drop and tests

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: String to str, remove pub and optimise query builder

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: nested match, remove ok()

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: remove pub, add set, add comments

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: refactor list_namespaces slightly

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: add default properties to all new namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: remove check for nested namespace

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* chore: add more comments to the CatalogConfig to explain bind styles

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* fix: edit test for nested namespaces

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

---------

Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>

* feat: Add more fields in FileScanTask (#609)

Signed-off-by: Xuanwo <github@xuanwo.io>

* chore(deps): Bump crate-ci/typos from 1.24.3 to 1.24.5 (#616)

Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.24.3 to 1.24.5.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crate-ci/typos/compare/v1.24.3...v1.24.5)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: Less Panics for Snapshot timestamps (#614)

* feat: partition compatibility (#612)

* Partition compatibility

* Partition compatibility

* Rename compatible_with -> is_compatible_with

* feat: SortOrder methods should take schema ref if possible (#613)

* SortOrder methods should take schema ref if possible

* Fix test type

* with_order_id should not take reference

* feat: add `client.region` (#623)

* fix: Correctly calculate highest_field_id in schema (#590)

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: TennyZhuang <zty0826@gmail.com>
Signed-off-by: xxchan <xxchan22f@gmail.com>
Signed-off-by: Shirly <AndreMouche@126.com>
Signed-off-by: callum-ryan <19956159+callum-ryan@users.noreply.github.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: hiirrxnn <142747922+hiirrxnn@users.noreply.github.com>
Co-authored-by: Tyler Schauer <tylerschauer@gmail.com>
Co-authored-by: ZENOTME <43447882+ZENOTME@users.noreply.github.com>
Co-authored-by: ZENOTME <st810918843@gmail.com>
Co-authored-by: Chengxu Bian <piensengxv@gmail.com>
Co-authored-by: JanKaul <jankaul@mailbox.org>
Co-authored-by: Manu Zhang <OwenZhang1990@gmail.com>
Co-authored-by: Yue Deng <59086724+Dysprosium0626@users.noreply.github.com>
Co-authored-by: Dinesh Phuyel <86118075+dp-0@users.noreply.github.com>
Co-authored-by: ZHENGLIN LI <63448884+ZhengLin-Li@users.noreply.github.com>
Co-authored-by: Mark Grey <mgthesecond@spotify.com>
Co-authored-by: Farooq Qaiser <fqaiser94@gmail.com>
Co-authored-by: Scott Donnelly <scott@donnel.ly>
Co-authored-by: Shabana Baig <43451943+s-akhtar-baig@users.noreply.github.com>
Co-authored-by: stream2000 <18889897088@163.com>
Co-authored-by: fuqijun <qijun.fqj@alibaba-inc.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Marvin Lanhenke <62298609+marvinlanhenke@users.noreply.github.com>
Co-authored-by: mlanhenke <Marvin.Lanhenke@Berief-Food.de>
Co-authored-by: 张林伟 <lewiszlw520@gmail.com>
Co-authored-by: Alon Agmon <54080741+a-agmon@users.noreply.github.com>
Co-authored-by: TennyZhuang <zty0826@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Himadri Pal <mehimu@gmail.com>
Co-authored-by: hpal <hpal@apple.com>
Co-authored-by: Howie Wang <hongyiw@wepay.com>
Co-authored-by: QuakeWang <45645138+QuakeWang@users.noreply.github.com>
Co-authored-by: Ajay Gupte <139496877+gupteaj@users.noreply.github.com>
Co-authored-by: Christian <Christian.Thiel@outlook.com>
Co-authored-by: Matiukhin Vlad <87382371+rwwwx@users.noreply.github.com>
Co-authored-by: Jack <56563911+jdockerty@users.noreply.github.com>
Co-authored-by: Christian <christian@hansetag.com>
Co-authored-by: Vivek Khatri <vvk3785@gmail.com>
Co-authored-by: ZhangJian He <shoothzj@gmail.com>
Co-authored-by: Vipul Vaibhaw <vaibhaw.vipul@gmail.com>
Co-authored-by: thexia <37214832+thexiay@users.noreply.github.com>
Co-authored-by: thexiay <xiayu1187@gmail.com>
Co-authored-by: yinheli <me@yinheli.com>
Co-authored-by: tom <nooberfsh@gmail.com>
Co-authored-by: xxchan <xxchan22f@gmail.com>
Co-authored-by: Andre Luis Anastacio <ndrluis@proton.me>
Co-authored-by: Andre Luis Anastacio <andreluisanastacio@gmail.com>
Co-authored-by: Alex Yin <alexzeyin@gmail.com>
Co-authored-by: Alex Yin <alexyin@ibm.com>
Co-authored-by: SteveLauC <stevelauc@outlook.com>
Co-authored-by: Shirly <AndreMouche@126.com>
Co-authored-by: Callum Ryan <19956159+callum-ryan@users.noreply.github.com>
Co-authored-by: Tobias Pütz <puetztobias@gmail.com>
Co-authored-by: Matheus Alcantara <msalcantara.dev@pm.me>
Co-authored-by: Matheus Alcantara <mths.dev@pm.me>
Co-authored-by: FANNG <xiaojing@datastrato.com>
Co-authored-by: ChinoUkaegbu <77782533+ChinoUkaegbu@users.noreply.github.com>
Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>
Co-authored-by: Timothy Maloney <sl1mb0@protonmail.com>
Co-authored-by: Timothy Maloney <tmaloney@influxdata.com>
Co-authored-by: Søren Dalby Larsen <sdlarsen@gmail.com>