Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Make location in TableCreation optional #67

Open
Xuanwo opened this issue Sep 26, 2023 · 9 comments
Open

Make location in TableCreation optional #67

Xuanwo opened this issue Sep 26, 2023 · 9 comments
Assignees

Comments

@Xuanwo
Copy link
Member

Xuanwo commented Sep 26, 2023

I'm not sure, but wouldn't it be more natural for Catalog to manage the location of the table and the table metadata itself?
If so, the location field of the TableCreation struct would better be deleted, or at least made Optional.
(For now, I'm thinking about ignoring that field when test-implementing the function.)

Thanks.

Originally posted by @zeodtr in #54 (comment)

@Xuanwo
Copy link
Member Author

Xuanwo commented Sep 26, 2023

What do you think? cc @Fokko @liurenjie1024 @JanKaul

@JanKaul
Copy link
Collaborator

JanKaul commented Sep 26, 2023

Did I understand correctly that the location field in the TableCreation struct relates to the location field in the table-metadata?

If so, I would manage the location as part of the table metadata. If we aim to create v2 iceberg tables by default, I think the field should not be optional. It we want to enable the creation of v1 tables we could make it optional.

@JanKaul
Copy link
Collaborator

JanKaul commented Sep 26, 2023

@zeodtr What kind of catalog are you trying to implement?

@Xuanwo
Copy link
Member Author

Xuanwo commented Sep 26, 2023

Did I understand correctly that the location field in the TableCreation struct relates to the location field in the table-metadata?

Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?

@zeodtr
Copy link

zeodtr commented Sep 26, 2023

@zeodtr What kind of catalog are you trying to implement?

What I'm thinking is as explained by @Xuanwo:

Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?

I'm trying to implement a Catalog that manages the table and table metadata by itself, including where to store/retrieve them.
So, the Catalog wants to determine the location by itself, rather than take it as an argument.
(Edited: 'itself' to 'by itself')

@JanKaul
Copy link
Collaborator

JanKaul commented Sep 26, 2023

Ah okay, now I get it. Thanks for the explanation.

I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.

@liurenjie1024
Copy link
Collaborator

I think by the openapi spec also says location field is optional:
https://github.com/apache/iceberg/blob/621d301b17417af4b71a0e8c93db9db6f8266839/open-api/rest-catalog-open-api.yaml#L1948

And it's reasonable to make it optional since a catalog may decide the directory layout of all tables by itself.

@Fokko
Copy link
Contributor

Fokko commented Sep 27, 2023

I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.

I think this comment by @JanKaul summarizes the issue. The answer to the question is: it depends.

For the REST catalog, the metadata is managed by the catalog. For the rest (Hive, Glue, Dynamo, SQL etc) the metadata is managed by the crate. I think it is reasonable to make it optional.

In PyIceberg it is still required, but that's mostly since we haven't seen any situation where it was missing.

@Xuanwo Xuanwo self-assigned this Oct 13, 2023
@Xuanwo
Copy link
Member Author

Xuanwo commented Oct 13, 2023

I'm going to make this change.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants