feat: Add Catalog API #54
Signed-off-by: Xuanwo <github@xuanwo.io>
cc @liurenjie1024 @JanKaul @ZENOTME @Fokko, would you like to take a look?
Generally LGTM, left small suggestions.
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
LGTM
We haven't talked about Iceberg Views yet and I'm not sure whether you would like to support them in the future. I'm actually really interested in using views. I don't want to include view support in this PR, but I would like to future-proof the catalog design so that views are easy to support later.

The REST catalog is in the process of adding view support. However, I have the feeling that the initial REST catalog API wasn't designed with views in mind. One issue comes up with the current design that we might be able to avoid in our catalog design. I will try to explain it.

Imagine a query engine wants to perform a query like the following:

    SELECT first_name, last_name, age FROM users WHERE age > 18;

Without the catalog information the query engine doesn't know whether

    enum TableLike {
        Table(Table),
        View(View)
    }

All Iceberg catalogs except for the REST catalog return a "table metadata location", and it would be easy to distinguish between tables and views based on the metadata. Let me know what you think.
Thank you for sharing! The relationship between I'm wondering if it's possible to avoid exposing the |
Converting from Table to TableLike is straightforward:

    impl From<Table> for TableLike {
        fn from(value: Table) -> Self {
            TableLike::Table(value)
        }
    }

So you would only need to call

For the table scan operation you would need to check whether your input is a table or a view and then execute the corresponding scan operator. As long as we don't have view support it could look like:

    let table = if let TableLike::Table(table) = tablelike {
        Ok(table)
    } else {
        Err(Error::new(... , "Views are not supported yet"))
    }?;
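To make the conversion and the scan-time check above concrete, here is a minimal, self-contained sketch. `Table`, `View` and `ScanError` are placeholder types invented for this example (the real types would carry metadata), and `table_for_scan` is a hypothetical helper name, not part of the proposed API.

```rust
// Placeholder types for illustration only; not the crate's real definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct View {
    pub name: String,
}

// Hypothetical error type standing in for the crate's `Error`.
#[derive(Debug, PartialEq)]
pub struct ScanError(pub String);

#[derive(Debug, Clone, PartialEq)]
pub enum TableLike {
    Table(Table),
    View(View),
}

impl From<Table> for TableLike {
    fn from(value: Table) -> Self {
        TableLike::Table(value)
    }
}

impl From<View> for TableLike {
    fn from(value: View) -> Self {
        TableLike::View(value)
    }
}

/// Returns the table to scan, rejecting views until they are supported.
pub fn table_for_scan(table_like: TableLike) -> Result<Table, ScanError> {
    match table_like {
        TableLike::Table(table) => Ok(table),
        TableLike::View(view) => Err(ScanError(format!(
            "Views are not supported yet: {}",
            view.name
        ))),
    }
}
```

Using `match` instead of `if let ... else` means that adding a third variant later (for example a materialized view) produces a compile error at every place that has to decide how to handle it.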
Hi @JanKaul, good point about views. Besides views, I think there will be other entities such as materialized views. So the question is about name resolution:

    fn load_table(&self, table_name: TableId) -> Result<Table>;
    fn load_view(&self, view_name: ViewId) -> Result<View>;

I think it's still useful to keep separate methods for each kind of entity. For example when I want to execute dml like:
The query engine knows that it should load a table to check it.

As @JanKaul said, it would be quite useful in queries:

    select * from t;

In this case the catalog API should provide something like:

    fn load_entity(&self, table: &TableId) -> Result<TableLike>
Keep
|
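One way the two ideas in this comment could coexist is sketched below: a single name-resolving `load_entity` plus kind-specific `load_table` and `load_view` built on top of it. Everything here is an assumption made for illustration: identifiers are plain `&str` rather than `TableId`/`ViewId`, `CatalogError` and `MemoryCatalog` are invented, and the default method bodies are just one possible design, not what the PR implements.

```rust
use std::collections::HashMap;

// Placeholder entity types for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct View {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TableLike {
    Table(Table),
    View(View),
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    NotFound(String),
    WrongKind(String),
}

pub trait Catalog {
    /// Resolves a name to whatever entity is registered under it.
    fn load_entity(&self, name: &str) -> Result<TableLike, CatalogError>;

    /// Loads a table; fails if the name refers to another kind of entity.
    fn load_table(&self, name: &str) -> Result<Table, CatalogError> {
        match self.load_entity(name)? {
            TableLike::Table(table) => Ok(table),
            TableLike::View(_) => Err(CatalogError::WrongKind(name.to_string())),
        }
    }

    /// Loads a view; fails if the name refers to another kind of entity.
    fn load_view(&self, name: &str) -> Result<View, CatalogError> {
        match self.load_entity(name)? {
            TableLike::View(view) => Ok(view),
            TableLike::Table(_) => Err(CatalogError::WrongKind(name.to_string())),
        }
    }
}

/// Toy in-memory catalog used only to exercise the trait.
#[derive(Debug, Default)]
pub struct MemoryCatalog {
    entities: HashMap<String, TableLike>,
}

impl MemoryCatalog {
    pub fn register(&mut self, name: &str, entity: TableLike) {
        self.entities.insert(name.to_string(), entity);
    }
}

impl Catalog for MemoryCatalog {
    fn load_entity(&self, name: &str) -> Result<TableLike, CatalogError> {
        self.entities
            .get(name)
            .cloned()
            .ok_or_else(|| CatalogError::NotFound(name.to_string()))
    }
}
```

With this shape, a DML path that must have a table calls `load_table`, while a `select * from t` path calls `load_entity` once and branches on the result.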
Hi @liurenjie1024, I apologize for the confusion. I have redirected this discussion to #55.
Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
}

/// TableCreation represents the creation of a table in the catalog.
pub struct TableCreation {
I'm not a rustee 🦀, so forgive me if this is a silly question. Why would you create a struct for this, and not just have an argument to create_table for each of the fields?
It's more of a convention in Rust to pass values via structs.
Firstly, Rust structs are zero-cost. Therefore, it is exactly the same for Rust to pass values via op.do(struct Abc {a, b}) or op.do(a, b). Additionally, unpacking a struct is also zero-cost. Implementors can unpack the values from the struct when needed.
Considering all these reasons, I prefer to pass arguments in a struct to make the API more readable and maintainable. This implementation aligns with our design ideas:
- Clean
- Easy to understand (both for using and implementing)
- Optimized for Rust developers
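The two calling styles compared above can be sketched side by side. The `TableCreation` below is a simplified, hypothetical stand-in with made-up fields, and both functions are invented for this example; they only return a description string so the behaviour is easy to check.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for the `TableCreation` struct.
#[derive(Debug, Clone)]
pub struct TableCreation {
    pub name: String,
    pub location: String,
    pub properties: HashMap<String, String>,
}

/// Positional style: every value is a separate parameter.
pub fn create_table_positional(
    name: String,
    location: String,
    properties: HashMap<String, String>,
) -> String {
    format!("{} at {} ({} properties)", name, location, properties.len())
}

/// Struct style: the implementor destructures the argument when needed.
pub fn create_table(creation: TableCreation) -> String {
    // Destructuring moves the fields out of the struct without copying them.
    let TableCreation {
        name,
        location,
        properties,
    } = creation;
    create_table_positional(name, location, properties)
}
```

At the call site, the struct style names every value (`TableCreation { name: ..., location: ..., properties: ... }`), which avoids mixing up two `String` arguments that happen to sit next to each other.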
By the way, adding a new argument to a trait function is a breaking change. However, adding a new field to a struct can be compatible if it is given a default value.
I also agree that we should use a struct for method arguments rather than several separate parameters, to avoid breaking changes when we need to add more arguments.
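A small sketch of how the compatibility argument can work in practice, under one assumption worth stating: a new public field only stays non-breaking for callers who do not write out every field, for example by using `Default` with struct update syntax. The fields and the `users_table` helper below are hypothetical.

```rust
/// Hypothetical argument struct; `Default` supplies a value for every field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: Option<String>,
    // Imagine this field was added in a later release.
    pub comment: Option<String>,
}

/// A caller written before `comment` existed: it names only the fields it
/// cares about and takes the rest from `Default`, so it keeps compiling.
pub fn users_table() -> TableCreation {
    TableCreation {
        name: "users".to_string(),
        ..Default::default()
    }
}
```

A caller that lists every field explicitly would still break when a field is added, which is one reason a builder or a constructor is often offered alongside the struct.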
But wouldn't it be better to provide a builder for the arguments?
Re "But it's better to provide a builder for argument?": yep, I plan to leave builders for follow-up PRs. I believe they do not conflict with this PR.
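For reference, a hand-written builder for such an argument struct might look like the sketch below. This is not the builder that later PRs added; the field set, the method names and the `String` error are all assumptions chosen to keep the example short.

```rust
/// Hypothetical argument struct for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct TableCreation {
    pub name: String,
    pub location: Option<String>,
    pub properties: Vec<(String, String)>,
}

/// Collects optional pieces and validates them in `build`.
#[derive(Debug, Default)]
pub struct TableCreationBuilder {
    name: Option<String>,
    location: Option<String>,
    properties: Vec<(String, String)>,
}

impl TableCreation {
    pub fn builder() -> TableCreationBuilder {
        TableCreationBuilder::default()
    }
}

impl TableCreationBuilder {
    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = Some(name.into());
        self
    }

    pub fn location(mut self, location: impl Into<String>) -> Self {
        self.location = Some(location.into());
        self
    }

    pub fn property(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.properties.push((key.into(), value.into()));
        self
    }

    /// Fails if the required `name` was never set.
    pub fn build(self) -> Result<TableCreation, String> {
        let name = self
            .name
            .ok_or_else(|| "table name is required".to_string())?;
        Ok(TableCreation {
            name,
            location: self.location,
            properties: self.properties,
        })
    }
}
```

Because callers go through the builder methods, new optional fields can be added to the struct later without touching existing call sites.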
Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
…to add-catalog-api
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Thanks for working on this, @Xuanwo, and thanks @JanKaul, @liurenjie1024 and @ZENOTME for the review 🙌
@Xuanwo Hi, For example, NamespaceIdent's field is private, so I cannot create a value of that type. In fact, almost all structs in Since this merge is a draft, I can assume that I guess there are two ways (BTW, I'm not fluent with Rust):
What is your plan? Thank you in advance. |
Yep, this PR is just to discuss the API with the community first.
I believe it is safe to make the fields public, as users construct them directly. I will submit a PR for this change to be reviewed in due course. Feel free to join the discussion.
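As a rough illustration of the options being discussed, the sketch below shows an identifier type whose field is public, so users can construct it directly, together with an optional convenience constructor. The tuple-struct shape and the `from_parts` name are assumptions for this example and may not match the actual `NamespaceIdent` definition.

```rust
/// Hypothetical shape: a namespace identifier as a list of name parts.
/// The field is `pub`, so callers outside the crate can construct it directly.
#[derive(Debug, Clone, PartialEq)]
pub struct NamespaceIdent(pub Vec<String>);

impl NamespaceIdent {
    /// Optional convenience constructor; with a private field, something
    /// like this would be the only way for users to create a value.
    pub fn from_parts<I, S>(parts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        NamespaceIdent(parts.into_iter().map(Into::into).collect())
    }
}
```

Public fields are the simplest fix, while a constructor leaves room to validate the parts or change the internal representation later.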
@Xuanwo Hi, Thanks.
Sounds like a good idea to me. However, this PR has already been merged. What do you think about creating a new issue to initiate a discussion? I believe we can refine the catalog API in subsequent PRs.
It would be nice to create a new issue for this. Thanks.
This is a draft of the catalog API to show my general ideas:
TableUpdate for further PRs.
Once this draft has been approved, we can start filling it in with real implementations.
Design Idea