feat: Add Catalog API #54

Merged 15 commits on Sep 21, 2023
1 change: 1 addition & 0 deletions crates/iceberg/Cargo.toml
@@ -29,6 +29,7 @@ keywords = ["iceberg"]
[dependencies]
anyhow = "1.0.72"
apache-avro = "0.15"
async-trait = "0.1"
bimap = "0.6"
bitvec = "1.0.1"
chrono = "0.4"
149 changes: 149 additions & 0 deletions crates/iceberg/src/catalog.rs
@@ -0,0 +1,149 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Catalog API for Apache Iceberg

use crate::spec::{PartitionSpec, Schema, SortOrder};
use crate::table::Table;
use crate::Result;
use async_trait::async_trait;
use std::collections::HashMap;

/// The catalog API for Iceberg Rust.
#[async_trait]
pub trait Catalog {
/// List namespaces in the catalog, optionally under a parent namespace.
async fn list_namespaces(&self, parent: Option<&NamespaceIdent>)
-> Result<Vec<NamespaceIdent>>;

/// Create a new namespace inside the catalog.
async fn create_namespace(
&self,
namespace: &NamespaceIdent,
properties: HashMap<String, String>,
) -> Result<Namespace>;

/// Get namespace information from the catalog.
async fn get_namespace(&self, namespace: &NamespaceIdent) -> Result<Namespace>;

/// Update a namespace inside the catalog.
///
/// # Behavior
///
/// The properties must be the full set of the namespace's properties.
async fn update_namespace(
&self,
namespace: &NamespaceIdent,
properties: HashMap<String, String>,
) -> Result<()>;

/// Drop a namespace from the catalog.
async fn drop_namespace(&self, namespace: &NamespaceIdent) -> Result<()>;

/// List tables from namespace.
async fn list_tables(&self, namespace: &NamespaceIdent) -> Result<Vec<TableIdent>>;

/// Create a new table inside the namespace.
async fn create_table(
&self,
namespace: &NamespaceIdent,
creation: TableCreation,
) -> Result<Table>;

/// Load table from the catalog.
async fn load_table(&self, table: &TableIdent) -> Result<Table>;

/// Drop a table from the catalog.
async fn drop_table(&self, table: &TableIdent) -> Result<()>;

/// Check if a table exists in the catalog.
async fn stat_table(&self, table: &TableIdent) -> Result<bool>;

/// Rename a table in the catalog.
async fn rename_table(&self, src: &TableIdent, dest: &TableIdent) -> Result<()>;

/// Update a table in the catalog.
async fn update_table(&self, table: &TableIdent, commit: TableCommit) -> Result<Table>;

/// Update multiple tables in the catalog as an atomic operation.
async fn update_tables(&self, tables: &[(TableIdent, TableCommit)]) -> Result<()>;
}

/// NamespaceIdent represents the identifier of a namespace in the catalog.
pub struct NamespaceIdent(Vec<String>);

/// Namespace represents a namespace in the catalog.
pub struct Namespace {
name: NamespaceIdent,
properties: HashMap<String, String>,
}

/// TableIdent represents the identifier of a table in the catalog.
pub struct TableIdent {
namespace: NamespaceIdent,
name: String,
}

/// TableCreation represents the creation of a table in the catalog.
pub struct TableCreation {
Contributor

I'm not a rustee 🦀 , so forgive me if this is a silly question. Why would you create a struct for this, and not just have an argument to create_table for each of the fields?

Member Author

It's more of a convention in Rust to pass values via structs.

Firstly, Rust structs are a zero-cost abstraction, so passing values via `op.do(Abc { a, b })` costs the same as passing them via `op.do(a, b)`. Unpacking a struct is also zero cost, so implementors can destructure the value when needed.

Considering all these reasons, I prefer to pass arguments in a struct to make it more readable and maintainable. This implementation will align with our design ideas:

  • Clean
  • Easy to understand (both for using and implementing)
  • Optimized for Rust developers

Member Author

By the way, adding a new argument to a trait function is a breaking change. However, adding a new field to a struct can be compatible if it is given a default value.
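
To illustrate the compatibility point above, here is a minimal sketch. It is not code from this PR: the struct is a simplified, hypothetical stand-in for `TableCreation`, its fields are public only to keep the example short, `EchoCatalog` and `describe_example` are invented for illustration, and the trait is synchronous to avoid pulling in an async runtime. It assumes callers construct the struct with `..Default::default()`; callers that write an exhaustive struct literal would still break when a field is added.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the PR's `TableCreation`.
#[derive(Debug, Default)]
pub struct TableCreation {
    pub name: String,
    pub location: String,
    pub properties: HashMap<String, String>,
    /// Imagine this field was added in a later release. Callers that fill the
    /// remaining fields with `..Default::default()` keep compiling unchanged.
    pub comment: Option<String>,
}

/// Simplified, synchronous stand-in for the `Catalog` trait.
pub trait Catalog {
    /// The signature stays the same no matter how many fields the struct gains.
    fn create_table(&self, creation: TableCreation) -> String;
}

/// Toy implementation that only echoes what it was asked to create.
pub struct EchoCatalog;

impl Catalog for EchoCatalog {
    fn create_table(&self, creation: TableCreation) -> String {
        // Destructuring moves the needed fields out; `..` ignores the rest,
        // so this implementation also survives newly added fields.
        let TableCreation { name, location, .. } = creation;
        format!("{}@{}", name, location)
    }
}

/// Caller side: only the fields of interest are spelled out.
pub fn describe_example() -> String {
    let creation = TableCreation {
        name: "events".to_string(),
        location: "s3://bucket/events".to_string(),
        ..Default::default()
    };
    EchoCatalog.create_table(creation)
}
```

With positional arguments, the same change would alter the trait method's signature and force every implementor and caller to be updated.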

Collaborator

I also agree that we should use a struct for method arguments rather than several separate fields, to avoid breaking changes when we need to add more arguments.

Collaborator

But it's better to provide a builder for argument?

Member Author

@Xuanwo Xuanwo Sep 20, 2023

> But it's better to provide a builder for argument?

Yep, I plan to leave them for follow-up PRs. I believe they do not conflict with this PR.
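
For readers unfamiliar with the builder idea raised in this thread, the following is one possible shape, sketched under assumptions. None of it is part of this PR: `TableCreationBuilder`, its methods, the accessors, and the `String` error type are hypothetical, and the struct is reduced to three fields (schema, partition spec and sort order are omitted).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the PR's `TableCreation`.
#[derive(Debug)]
pub struct TableCreation {
    name: String,
    location: String,
    properties: HashMap<String, String>,
}

impl TableCreation {
    /// Entry point for the hypothetical builder.
    pub fn builder() -> TableCreationBuilder {
        TableCreationBuilder::default()
    }
    pub fn name(&self) -> &str {
        &self.name
    }
    pub fn location(&self) -> &str {
        &self.location
    }
    pub fn properties(&self) -> &HashMap<String, String> {
        &self.properties
    }
}

/// Collects values step by step; required fields are checked in `build`.
#[derive(Debug, Default)]
pub struct TableCreationBuilder {
    name: Option<String>,
    location: Option<String>,
    properties: HashMap<String, String>,
}

impl TableCreationBuilder {
    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = Some(name.into());
        self
    }
    pub fn location(mut self, location: impl Into<String>) -> Self {
        self.location = Some(location.into());
        self
    }
    pub fn property(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.properties.insert(key.into(), value.into());
        self
    }
    /// Returns an error message if a required field was never set.
    pub fn build(self) -> Result<TableCreation, String> {
        Ok(TableCreation {
            name: self.name.ok_or_else(|| "name is required".to_string())?,
            location: self.location.ok_or_else(|| "location is required".to_string())?,
            properties: self.properties,
        })
    }
}
```

A caller would write something like `TableCreation::builder().name("events").location("s3://bucket/events").build()`. Because the fields stay private, new optional fields can be added later without touching existing call sites.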

name: String,
location: String,
schema: Schema,
partition_spec: Option<PartitionSpec>,
sort_order: SortOrder,
properties: HashMap<String, String>,
}

/// TableCommit represents the commit of a table in the catalog.
pub struct TableCommit {
ident: TableIdent,
requirements: Vec<TableRequirement>,
updates: Vec<TableUpdate>,
}

/// TableRequirement represents a requirement for a table in the catalog.
pub enum TableRequirement {
/// The table must not already exist; used for create transactions
NotExist,
/// The table UUID must match the requirement.
UuidMatch(String),
/// The table branch or tag identified by the requirement's `reference` must
/// reference the requirement's `snapshot-id`.
RefSnapshotIdMatch {
/// The reference of the table to assert.
reference: String,
/// The snapshot id of the table to assert.
/// If the id is `None`, the ref must not already exist.
snapshot_id: Option<i64>,
},
/// The table's last assigned column id must match the requirement.
LastAssignedFieldIdMatch(i64),
/// The table's current schema id must match the requirement.
CurrentSchemaIdMatch(i64),
/// The table's last assigned partition id must match the
/// requirement.
LastAssignedPartitionIdMatch(i64),
/// The table's default spec id must match the requirement.
DefaultSpecIdMatch(i64),
/// The table's default sort order id must match the requirement.
DefaultSortOrderIdMatch(i64),
}

/// TableUpdate represents an update to a table in the catalog.
///
/// TODO: we should fill with UpgradeFormatVersionUpdate, AddSchemaUpdate and so on.
pub enum TableUpdate {}
7 changes: 7 additions & 0 deletions crates/iceberg/src/lib.rs
@@ -27,6 +27,13 @@ pub use error::Error;
pub use error::ErrorKind;
pub use error::Result;

/// There is no implementation of this trait yet, so allow dead code for now.
/// This should be removed once we have one.
#[allow(dead_code)]
pub mod catalog;
#[allow(dead_code)]
pub mod table;

mod avro;
pub mod io;
pub mod spec;
2 changes: 1 addition & 1 deletion crates/iceberg/src/spec/schema.rs
@@ -60,7 +60,7 @@ pub struct SchemaBuilder {
}

impl SchemaBuilder {
- /// Add fields to schem builder.
+ /// Add fields to schema builder.
pub fn with_fields(mut self, fields: impl IntoIterator<Item = NestedFieldRef>) -> Self {
self.fields.extend(fields);
self
26 changes: 26 additions & 0 deletions crates/iceberg/src/table.rs
@@ -0,0 +1,26 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Table API for Apache Iceberg

use crate::spec::TableMetadata;

/// Table represents a table in the catalog.
pub struct Table {
metadata_location: String,
metadata: TableMetadata,
}