-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
add lakekeeper integration test #25140
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
11de3df
to
5878f9b
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add lakekeeper integration test
The commit message should start with uppercase.
https://trino.io/development/process#pull-request-and-commit-guidelines-
@@ -585,6 +585,7 @@ public void testDropTableWithMissingMetadataFile() | |||
// try to drop table | |||
assertUpdate("DROP TABLE " + tableName); | |||
assertThat(getQueryRunner().tableExists(getSession(), tableName)).isFalse(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert unrelated changes.
@@ -226,6 +227,42 @@ public static void main(String[] args) | |||
} | |||
} | |||
|
|||
public static final class IcebergLakekeeperQueryRunnerMain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow our codestyle: https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#code-style
|
||
} | ||
|
||
public void drop_without_purge(String schema, String table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to dropTableWithoutPurge
.
@Override | ||
protected QueryRunner createQueryRunner() | ||
throws Exception { | ||
lakekeeperCatalog = closeAfterClass(new TestingLakekeeperCatalog("125ms")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use a different poll interval between TestIcebergLakekeeperCatalogConnectorSmokeTest
and IcebergLakekeeperQueryRunnerMain
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few tests that check that files are gone after dropping. Since Lakekeeper cleans up files async and returns a response before the clean-up is done, we need to sleep in the tests. The polling interval determines how long we have to sleep. A smaller poll interval means tests run faster but it also makes the Lakekeeper instance chattier. In IcebergLakekeeperQueryRunnerMain
, we dump the logs if an exception occurs, if we have a small poll interval there, the logs become quite unreadable, at the same time, we don't need the almost-instant cleanup in IcebergLakekeeperQueryRunnerMain
since there are no tests checking if files are actually gone.
String lakekeeperDbUrlReadKey = "LAKEKEEPER__PG_DATABASE_URL_READ"; | ||
String lakekeeperDbUrlWriteKey = "LAKEKEEPER__PG_DATABASE_URL_WRITE"; | ||
String lakekeeperPGEncryptionKey = "LAKEKEEPER__PG_ENCRYPTION_KEY"; | ||
String lakekeeperCatalogImage = "quay.io/lakekeeper/catalog:v0.7.0"; | ||
String pgEncryptionKey = "This-is-NOT-Secure!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these variables to fields as constants.
private final GenericContainer<?> migrator; | ||
private final PostgreSQLContainer<?> postgresqlContainer; | ||
private final GenericContainer<?> minio; | ||
private final Network net; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to network.
public static final String MINIO_ROOT_USER = "minio-root-user"; | ||
public static final String MINIO_ROOT_PASSWORD = "minio-root-password"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use same values as io.trino.testing.containers.Minio
:
public static final String MINIO_ROOT_USER = "minio-root-user"; | |
public static final String MINIO_ROOT_PASSWORD = "minio-root-password"; | |
public static final String MINIO_ROOT_USER = "accesskey"; | |
public static final String MINIO_ROOT_PASSWORD = "secretkey"; |
public String getLogs() { | ||
return lakekeeperCatalog.getLogs(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is currently used from IcebergLakekeeperQueryRunnerMain
. In case an exception occurs, we print the logs over there. We can remove the log-dump, then we can also remove this method altogether.
public int minioPort() { | ||
return this.minio.getMappedPort(MINIO_PORT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could change to minioEndpoint
that returns "http://localhost:" + minio.getMappedPort(MINIO_PORT)
.
.withEnv(lakekeeperDbUrlWriteKey, pgConnectionString).withNetwork(postgresqlContainer.getNetwork()) | ||
.withCommand("migrate").start(); | ||
|
||
minio = new GenericContainer<>("bitnami/minio:latest"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using latest
tag. It leads to flaky tests.
postgresqlContainer = new PostgreSQLContainer<>("postgres:16"); | ||
postgresqlContainer.withNetwork(net).start(); | ||
|
||
String pgConnectionString = "postgresql://" + postgresqlContainer.getUsername() + ":" + postgresqlContainer.getPassword() + "@" + postgresqlContainer.getContainerName().substring(1) + ":" + PostgreSQLContainer.POSTGRESQL_PORT + "/" + postgresqlContainer.getDatabaseName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long.
@twuebi Could you explain why we should test Lakekeeper in this repository? We already test several REST catalog implementations (Unity, Polaris, S3 Tables and Tabular). I feel they are sufficient to ensure Iceberg REST catalog specification. |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
b3c4f5d
to
1ed80de
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
638ed47
to
11e8e26
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
I think the situation is similar to S3 but with the difference that the Iceberg REST Catalog spec is more formal. The fact remains that we want to ensure that as many different REST catalogs can show that they work with Trino and do so with every release. The current way to ensure that is to have them run tests here.. just like we do with minio and S3 and so on. We can discuss this more in the maintainer call since it would be good to have a better guideline. This work is also related to the fact that we want to mention official support from Trino for these projects and there is a web site PR to show this better. We also want to expand the docs for the catalogs and I am hoping @ebyhr @findinpath and others can help over time. The first steps are this PR and the website PR .. that still needs your technical review and input btw. trinodb/trino.io#770 More to come |
I think so too. It's a spec Lakekeeper should follow.
I recommend talking in Slack so other people can chime in. I personally don't want to support projects (#stars < 1k) unless there is a justification. |
11e8e26
to
7d3fdf3
Compare
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Description
This PR adds tests for the Lakekeeper Iceberg catalog.
Additional context and related issues
Lakekeeper does not support storing metadata on local file storage, this means we need to run against a full deployment, which includes minio, postgres and the catalog itself. While a bit more maintenance, it also means that these tests are close to what users will run. A few tests had to be disabled, each of the disabled tests should have a short explanation.
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.