feat: Add Catalog API #54

Xuanwo · 2023-09-01T04:39:59Z

This is a draft of the catalog API to show my general ideas:

I left the complex TableUpdate for further PRs.
All structs only have definitions; no functions are implemented.
The detailed behavior is not added yet.
This design is mainly modeled on the Iceberg REST API, with some of my own understanding.

Once this draft has been approved, we can start filling it in with real implementations.

Design Idea

Clean
Easy to understand (both for using and implementing)
Optimized for Rust developers
Async First

Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo · 2023-09-01T04:49:08Z

cc @liurenjie1024 @JanKaul @ZENOTME @Fokko, would you like to take a look?

crates/iceberg/src/catalog.rs

liurenjie1024

Generally LGTM, left small suggestions.

liurenjie1024

LGTM

JanKaul · 2023-09-01T08:44:03Z

We haven't talked about Iceberg Views yet and I'm not sure if you would like to support them in the future. I'm actually really interested in using views.

I don't want to include view support in this PR but I would like to future proof the catalog design to easily support views in the future.

The REST catalog is in the process of adding view support. However, I have the feeling that the initial REST catalog API wasn't designed with views in mind. There is one issue that comes up with the current design that we might be able to avoid for our catalog design. I will try to explain the issue.

Imagine a query engine wants to perform a query like the following:

SELECT first_name, last_name, age FROM users WHERE age > 18;

Without the catalog information the query engine doesn't know whether users refers to a view or a table. So if the catalog only exposes a load_table and load_view operation, the query engine doesn't know which one to call. I would prefer a load_tablelike (the naming is not important) operation that returns an

enum TableLike {
    Table(Table),
    View(View)
}

All iceberg catalogs except for the REST catalog return a "table metadata location" and it would be easy to distinguish between Tables and Views based on the metadata.

Let me know what you think.

Xuanwo · 2023-09-01T08:55:58Z

Thank you for sharing! The relationship between view and table is quite interesting. At Databend, we consider a view as a special type of table, and our engine can distinguish the differences and determine how to execute the SQL accordingly.

I'm wondering if it's possible to avoid exposing the view details at the Catalog level and instead leave them for the catalog implementer to decide.

JanKaul · 2023-09-01T08:57:04Z

Converting from Table to TableLike is straightforward:

impl From<Table> for TableLike {
    fn from(value: Table) -> Self {
        TableLike::Table(value)
    }
}

So you would only need to call table.into().

For the table scan operation you would need to check whether your input is a table or a view and then execute the corresponding scan operator. As long as we don't have view support it could look like:

let table = if let TableLike::Table(table) = tablelike {
    Ok(table)
} else {
    Err(Error::new(... , "Views are not supported yet"))
}?;
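A self-contained sketch of the idea above, for readers who want to try it. The `Table` and `View` structs are placeholders with a single `name` field, the error type is a plain `String`, and `expect_table` is a hypothetical helper name; none of these come from the PR itself.

```rust
// Placeholder types: the real `Table` and `View` would carry metadata.
#[derive(Debug, PartialEq)]
pub struct Table {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct View {
    pub name: String,
}

/// Either kind of entity a name can resolve to.
#[derive(Debug, PartialEq)]
pub enum TableLike {
    Table(Table),
    View(View),
}

impl From<Table> for TableLike {
    fn from(value: Table) -> Self {
        TableLike::Table(value)
    }
}

impl From<View> for TableLike {
    fn from(value: View) -> Self {
        TableLike::View(value)
    }
}

/// Hypothetical helper: unwrap a table and reject views until they are supported.
pub fn expect_table(tablelike: TableLike) -> Result<Table, String> {
    match tablelike {
        TableLike::Table(table) => Ok(table),
        TableLike::View(view) => Err(format!("Views are not supported yet: {}", view.name)),
    }
}
```

Using `match` instead of `if let ... else` means the compiler flags this function if another variant (for example a materialized view) is added to the enum later.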

Xuanwo · 2023-09-01T09:00:07Z

Hi, @JanKaul, I started an issue about this topic: #55

liurenjie1024 · 2023-09-01T09:21:07Z

Hi @JanKaul, good point about views. Besides views, I think there will be other entities such as materialized views. So the question is about name resolution:

Should we keep separate methods for resolving tables, views, and even materialized views?

fn load_table(&self, table_name: TableId) -> Result<Table>;
fn load_view(&self, view_name: ViewId) -> Result<View>;

I think it's still useful to keep separate methods for each kind of entity. For example when I want to execute dml like:

insert into t values(1), (2)

The query engine knows that it should load a table to check it.

Should we have a method for resolving entities without knowing their entity type?

As @JanKaul said, it would be quite useful in queries:

select * from t;

In this case the catalog API should provide something like:

fn load_entity(&self, table: &TableId) -> Result<TableLike>

Should we keep both methods or just load_entity?

Keeping only the load_entity method would cover the use case in the first question. But I still want to keep the separate methods from the first question for several reasons:

The API is more user-friendly.
This gives concrete implementations more chances to optimize, especially in cases where many tables and views are defined.
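One way to keep both kinds of method is sketched below. It is only an illustration under several assumptions: names are plain `&str` rather than `TableId`/`ViewId`, errors are `String`, the trait is synchronous to stay free of external crates (the PR itself aims to be async first), and `MemoryCatalog` is a hypothetical in-memory implementation.

```rust
use std::collections::HashMap;

// Placeholder entity types for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct View {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TableLike {
    Table(Table),
    View(View),
}

pub trait Catalog {
    /// Resolve a name without knowing its kind (useful for `select * from t`).
    fn load_entity(&self, name: &str) -> Result<TableLike, String>;

    /// Typed lookup. The default delegates to `load_entity`;
    /// an implementation may override it with a cheaper path.
    fn load_table(&self, name: &str) -> Result<Table, String> {
        match self.load_entity(name)? {
            TableLike::Table(table) => Ok(table),
            TableLike::View(_) => Err(format!("{} is a view, not a table", name)),
        }
    }

    /// Typed lookup for views, with the same default strategy.
    fn load_view(&self, name: &str) -> Result<View, String> {
        match self.load_entity(name)? {
            TableLike::View(view) => Ok(view),
            TableLike::Table(_) => Err(format!("{} is a table, not a view", name)),
        }
    }
}

/// Hypothetical in-memory catalog used only to exercise the trait.
pub struct MemoryCatalog {
    entities: HashMap<String, TableLike>,
}

impl MemoryCatalog {
    pub fn new() -> Self {
        Self { entities: HashMap::new() }
    }

    pub fn register(&mut self, name: &str, entity: TableLike) {
        self.entities.insert(name.to_string(), entity);
    }
}

impl Catalog for MemoryCatalog {
    fn load_entity(&self, name: &str) -> Result<TableLike, String> {
        self.entities
            .get(name)
            .cloned()
            .ok_or_else(|| format!("{} not found", name))
    }
}
```

With default methods, an implementer only has to write `load_entity`, while a catalog that stores tables and views separately can still override `load_table` and `load_view` to skip the kind check.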

Xuanwo · 2023-09-01T09:25:08Z

Hi @liurenjie1024, I apologize for the confusion. I have redirected this discussion to #55.

Fokko · 2023-09-13T08:45:07Z

crates/iceberg/src/catalog.rs

+}
+
+/// TableCreation represents the creation of a table in the catalog.
+pub struct TableCreation {


I'm not a rustee 🦀 , so forgive me if this is a silly question. Why would you create a struct for this, and not just have an argument to create_table for each of the fields?

It's more of a convention in Rust to pass values via structs.

Firstly, Rust structs have zero cost. Therefore, it is exactly the same for Rust to pass values via op.do(struct Abc {a, b}) or op.do(a, b). Additionally, unpacking a struct is also zero cost. Implementors can unpack the value from the struct when needed.

Considering all these reasons, I prefer to pass arguments in a struct to make it more readable and maintainable. This implementation will align with our design ideas:

Clean

Easy to understand (both for using and implementing)

Optimized for Rust developers

By the way, adding a new argument to a trait function is a breaking change. However, adding a new field to a struct can be compatible if it is given a default value.
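A small sketch of the compatibility argument, assuming a simplified `TableCreation` with only three fields (the struct in this PR has more) and a hypothetical `NoopCatalog`. It relies on callers using `..Default::default()` and implementers destructuring with `..`; code that lists every field explicitly would still break when a field is added.

```rust
/// Simplified stand-in for the struct in this PR.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: String,
    // Imagine this field was added in a later release.
    pub properties: Vec<(String, String)>,
}

pub trait Catalog {
    // The signature stays the same no matter how many fields `TableCreation` gains.
    fn create_table(&self, creation: TableCreation) -> Result<String, String>;
}

/// Hypothetical implementation written before `properties` existed.
pub struct NoopCatalog;

impl Catalog for NoopCatalog {
    fn create_table(&self, creation: TableCreation) -> Result<String, String> {
        // Unpack only the fields this implementation knows about; `..` ignores the rest.
        let TableCreation { name, location, .. } = creation;
        Ok(format!("{} at {}", name, location))
    }
}

/// Caller code that keeps compiling when new defaulted fields appear.
pub fn example_creation() -> TableCreation {
    TableCreation {
        name: "users".to_string(),
        location: "s3://bucket/users".to_string(),
        ..Default::default()
    }
}
```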

I also agree that we should use a struct for method arguments rather than several separate fields, to avoid breaking changes when we need to add more arguments.

But would it be better to provide a builder for the arguments?

But would it be better to provide a builder for the arguments?

Yep, I plan to leave builders for follow-up PRs. I believe they do not conflict with this PR.
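For reference, a hand-written builder could look roughly like the sketch below. The field set is simplified, `TableCreationBuilder` and its method names are hypothetical, and errors are plain `String`; a follow-up PR may well choose a derive macro or a different shape.

```rust
/// Simplified stand-in for the struct in this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: String,
    pub properties: Vec<(String, String)>,
}

/// Hypothetical builder: required fields are checked in `build`.
#[derive(Debug, Default)]
pub struct TableCreationBuilder {
    name: Option<String>,
    location: Option<String>,
    properties: Vec<(String, String)>,
}

impl TableCreation {
    pub fn builder() -> TableCreationBuilder {
        TableCreationBuilder::default()
    }
}

impl TableCreationBuilder {
    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = Some(name.into());
        self
    }

    pub fn location(mut self, location: impl Into<String>) -> Self {
        self.location = Some(location.into());
        self
    }

    /// Optional, repeatable setter.
    pub fn property(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.properties.push((key.into(), value.into()));
        self
    }

    /// Fails if a required field was never set.
    pub fn build(self) -> Result<TableCreation, String> {
        let name = self.name.ok_or_else(|| "name is required".to_string())?;
        let location = self.location.ok_or_else(|| "location is required".to_string())?;
        Ok(TableCreation {
            name,
            location,
            properties: self.properties,
        })
    }
}
```

New optional fields then only need a new setter on the builder, so existing call sites stay untouched.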

Fokko · 2023-09-21T08:29:57Z

Thanks for working on this @Xuanwo and @JanKaul, @liurenjie1024 and @ZENOTME for the review 🙌

zeodtr · 2023-09-22T07:19:47Z

@Xuanwo Hi,
Maybe it's too early, but I am trying to implement the Catalog API provided with this merge for my Iceberg-related Rust project.
But visibility issues prevent me from doing it.

For example, NamespaceIdent's field is private, so I cannot create a value of that type. In fact, almost all structs in catalog.rs have this visibility issue.

Since this merge is a draft, I can assume that catalog.rs is incomplete yet.
What I want to know is, how will you resolve this visibility issue so that I can pre-modify the source code for the (test-) implementation.

I guess there are two ways (BTW, I'm not fluent with Rust):

Add public new functions and accessor functions.
Make the fields public.
I think that for trivial structs like the ones in catalog.rs it would be simpler to go to the way 2.

What is your plan?
Or, maybe I misunderstood something fundamental...

Thank you in advance.

Xuanwo · 2023-09-22T07:28:55Z

Since this merge is a draft, I can assume that catalog.rs is incomplete yet.

Yep, this PR is just to discuss the API with the community first.

Add public new functions and accessor functions.

Make the fields public.
I think that for trivial structs like the ones in catalog.rs it would be simpler to go to the way 2.

I believe it is safe to make the fields public as users construct them directly. I will submit a PR for this change to be reviewed in due course. Feel free to join the discussion.
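The two options can be compared side by side in a short sketch. It assumes `NamespaceIdent` wraps a `Vec<String>`; the module names `encapsulated` and `open` and the `new`/`parts` methods are hypothetical and exist only to show both variants in one file.

```rust
/// Option 1: keep the field private and expose a constructor plus an accessor.
pub mod encapsulated {
    #[derive(Debug, Clone, PartialEq)]
    pub struct NamespaceIdent(Vec<String>);

    impl NamespaceIdent {
        pub fn new(parts: Vec<String>) -> Self {
            Self(parts)
        }

        pub fn parts(&self) -> &[String] {
            &self.0
        }
    }
}

/// Option 2: make the field public so users construct the value directly.
pub mod open {
    #[derive(Debug, Clone, PartialEq)]
    pub struct NamespaceIdent(pub Vec<String>);
}
```

Option 2 is less code, while option 1 leaves room to add validation in `new` later without changing how callers build the value.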

zeodtr · 2023-09-26T06:17:29Z

@Xuanwo Hi,
I'm not sure, but wouldn't it be more natural for Catalog to manage the location of the table and the table metadata itself?
If so, the location field of the TableCreation struct would better be deleted, or at least made Optional.
(For now, I'm thinking about ignoring that field when test-implementing the function.)

Thanks.

Xuanwo · 2023-09-26T06:24:21Z

at least made Optional.

Sounds like a good idea to me.

However, this PR has already been merged. What do you think about creating a new issue to initiate a discussion? I believe we can refine the catalog API in subsequent PRs.
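To make the suggestion concrete, here is a sketch of how an optional location might be resolved. The two-field `TableCreation`, the `WarehouseCatalog` type, its `warehouse_root` field, and the `<root>/<name>` layout are all assumptions for illustration, not part of the merged API.

```rust
/// Simplified stand-in: `location` becomes optional.
#[derive(Debug, Clone, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: Option<String>,
}

/// Hypothetical catalog that manages table locations under a warehouse root.
pub struct WarehouseCatalog {
    pub warehouse_root: String,
}

impl WarehouseCatalog {
    /// Use the caller's location if given, otherwise derive one from the root.
    pub fn resolve_location(&self, creation: &TableCreation) -> String {
        match &creation.location {
            Some(location) => location.clone(),
            None => format!(
                "{}/{}",
                self.warehouse_root.trim_end_matches('/'),
                creation.name
            ),
        }
    }
}
```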

zeodtr · 2023-09-26T07:14:11Z

at least made Optional.

Sounds like a good idea to me.

However, this PR has already been merged. What do you think about creating a new issue to initiate a discussion? I believe we can refine the catalog API in subsequent PRs.

It would be nice to create a new issue for this.
But the scope of the issue should be decided. Should it be only for the location field or more general (for example, including member visibility issue)?
Perhaps you can decide the scope and create a new issue for that.

Thanks.

feat: Add Catalog API

fd7f733

Signed-off-by: Xuanwo <github@xuanwo.io>

Xuanwo mentioned this pull request Sep 1, 2023

feat: Define catalog api icelake-io/icelake#171

Merged

ZENOTME reviewed Sep 1, 2023

crates/iceberg/src/catalog.rs (outdated, resolved)

liurenjie1024 reviewed Sep 1, 2023

Xuanwo added 5 commits September 1, 2023 16:11

remove get config

b57521b

Signed-off-by: Xuanwo <github@xuanwo.io>

Fix naming

921ce80

Signed-off-by: Xuanwo <github@xuanwo.io>

Use ref instead

7b7243c

Signed-off-by: Xuanwo <github@xuanwo.io>

Move table out

7436b97

Signed-off-by: Xuanwo <github@xuanwo.io>

Fix typo

f2ac4d6

Signed-off-by: Xuanwo <github@xuanwo.io>

liurenjie1024 approved these changes Sep 1, 2023

Xuanwo mentioned this pull request Sep 1, 2023

Iceberg View Support #55

Open

liurenjie1024 reviewed Sep 4, 2023

crates/iceberg/src/spec/schema.rs (outdated, resolved)

Update crates/iceberg/src/spec/schema.rs

d4c4e45

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

Fokko reviewed Sep 13, 2023

crates/iceberg/src/catalog.rs (outdated, resolved)

Fokko reviewed Sep 13, 2023

crates/iceberg/src/catalog.rs (resolved)

Fokko reviewed Sep 13, 2023

crates/iceberg/src/table.rs (outdated, resolved)

liurenjie1024 reviewed Sep 13, 2023

crates/iceberg/src/catalog.rs (resolved)

Xuanwo and others added 5 commits September 14, 2023 12:25

Make partition_spec optional

64aedca

Signed-off-by: Xuanwo <github@xuanwo.io>

Update crates/iceberg/src/table.rs

5cd5324

Co-authored-by: Fokko Driesprong <fokko@apache.org>

Merge remote-tracking branch 'refs/remotes/xuanwo/add-catalog-api' in…

36abbc6

…to add-catalog-api

Merge branch 'main' into add-catalog-api

c4e4d5b

Fix sort

a8e1bb6

Signed-off-by: Xuanwo <github@xuanwo.io>

liurenjie1024 reviewed Sep 20, 2023

crates/iceberg/src/table.rs (outdated, resolved)

liurenjie1024 mentioned this pull request Sep 20, 2023

Introduce table api. #59

Closed

Xuanwo added 3 commits September 20, 2023 22:00

Remove config

0d3c936

Signed-off-by: Xuanwo <github@xuanwo.io>

Merge remote-tracking branch 'origin/main' into add-catalog-api

8a81da7

Make clippy happy

56c3cb7

Signed-off-by: Xuanwo <github@xuanwo.io>

Fokko approved these changes Sep 21, 2023

Fokko merged commit 13281d3 into apache:main Sep 21, 2023
6 checks passed

Xuanwo deleted the add-catalog-api branch September 21, 2023 08:35

liurenjie1024 mentioned this pull request Sep 21, 2023

Implement rest catalog. #60

Closed

zeodtr mentioned this pull request Sep 22, 2023

feat: add builder to TableMetadata interface #62

Closed

Xuanwo mentioned this pull request Sep 26, 2023

Make location in TableCreation optional #67

Open
