Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

add lakekeeper integration test #25140

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

twuebi
Copy link

@twuebi twuebi commented Feb 25, 2025

Description

This PR adds tests for the Lakekeeper Iceberg catalog.

Additional context and related issues

Lakekeeper does not support storing metadata on local file storage, this means we need to run against a full deployment, which includes minio, postgres and the catalog itself. While a bit more maintenance, it also means that these tests are close to what users will run. A few tests had to be disabled, each of the disabled tests should have a short explanation.

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

Copy link

cla-bot bot commented Feb 25, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the iceberg Iceberg connector label Feb 25, 2025
@twuebi twuebi force-pushed the tp/lakekeeper-tests branch from 11de3df to 5878f9b Compare February 25, 2025 16:38
Copy link

cla-bot bot commented Feb 25, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add lakekeeper integration test

The commit message should start with uppercase.
https://trino.io/development/process#pull-request-and-commit-guidelines-

@@ -585,6 +585,7 @@ public void testDropTableWithMissingMetadataFile()
// try to drop table
assertUpdate("DROP TABLE " + tableName);
assertThat(getQueryRunner().tableExists(getSession(), tableName)).isFalse();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert unrelated changes.

@@ -226,6 +227,42 @@ public static void main(String[] args)
}
}

public static final class IcebergLakekeeperQueryRunnerMain {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


}

public void drop_without_purge(String schema, String table) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to dropTableWithoutPurge.

@Override
protected QueryRunner createQueryRunner()
throws Exception {
lakekeeperCatalog = closeAfterClass(new TestingLakekeeperCatalog("125ms"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use a different poll interval between TestIcebergLakekeeperCatalogConnectorSmokeTest and IcebergLakekeeperQueryRunnerMain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few tests that check that files are gone after dropping. Since Lakekeeper cleans up files async and returns a response before the clean-up is done, we need to sleep in the tests. The polling interval determines how long we have to sleep. A smaller poll interval means tests run faster but it also makes the Lakekeeper instance chattier. In IcebergLakekeeperQueryRunnerMain, we dump the logs if an exception occurs, if we have a small poll interval there, the logs become quite unreadable, at the same time, we don't need the almost-instant cleanup in IcebergLakekeeperQueryRunnerMain since there are no tests checking if files are actually gone.

Comment on lines 65 to 69
String lakekeeperDbUrlReadKey = "LAKEKEEPER__PG_DATABASE_URL_READ";
String lakekeeperDbUrlWriteKey = "LAKEKEEPER__PG_DATABASE_URL_WRITE";
String lakekeeperPGEncryptionKey = "LAKEKEEPER__PG_ENCRYPTION_KEY";
String lakekeeperCatalogImage = "quay.io/lakekeeper/catalog:v0.7.0";
String pgEncryptionKey = "This-is-NOT-Secure!";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these variables to fields as constants.

private final GenericContainer<?> migrator;
private final PostgreSQLContainer<?> postgresqlContainer;
private final GenericContainer<?> minio;
private final Network net;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to network.

Comment on lines 44 to 45
public static final String MINIO_ROOT_USER = "minio-root-user";
public static final String MINIO_ROOT_PASSWORD = "minio-root-password";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use same values as io.trino.testing.containers.Minio:

Suggested change
public static final String MINIO_ROOT_USER = "minio-root-user";
public static final String MINIO_ROOT_PASSWORD = "minio-root-password";
public static final String MINIO_ROOT_USER = "accesskey";
public static final String MINIO_ROOT_PASSWORD = "secretkey";

Comment on lines 126 to 140
public String getLogs() {
return lakekeeperCatalog.getLogs();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline this method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is currently used from IcebergLakekeeperQueryRunnerMain. In case an exception occurs, we print the logs over there. We can remove the log-dump, then we can also remove this method altogether.

Comment on lines 144 to 165
public int minioPort() {
return this.minio.getMappedPort(MINIO_PORT);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change to minioEndpoint that returns "http://localhost:" + minio.getMappedPort(MINIO_PORT).

.withEnv(lakekeeperDbUrlWriteKey, pgConnectionString).withNetwork(postgresqlContainer.getNetwork())
.withCommand("migrate").start();

minio = new GenericContainer<>("bitnami/minio:latest");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using latest tag. It leads to flaky tests.

postgresqlContainer = new PostgreSQLContainer<>("postgres:16");
postgresqlContainer.withNetwork(net).start();

String pgConnectionString = "postgresql://" + postgresqlContainer.getUsername() + ":" + postgresqlContainer.getPassword() + "@" + postgresqlContainer.getContainerName().substring(1) + ":" + PostgreSQLContainer.POSTGRESQL_PORT + "/" + postgresqlContainer.getDatabaseName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long.

@ebyhr
Copy link
Member

ebyhr commented Feb 26, 2025

@twuebi Could you explain why we should test Lakekeeper in this repository? We already test several REST catalog implementations (Unity, Polaris, S3 Tables and Tabular). I feel they are sufficient to ensure Iceberg REST catalog specification.

Copy link

cla-bot bot commented Feb 26, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@twuebi twuebi force-pushed the tp/lakekeeper-tests branch from b3c4f5d to 1ed80de Compare February 26, 2025 10:00
Copy link

cla-bot bot commented Feb 26, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Feb 26, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@twuebi twuebi force-pushed the tp/lakekeeper-tests branch from 638ed47 to 11e8e26 Compare February 26, 2025 11:03
Copy link

cla-bot bot commented Feb 26, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@mosabua
Copy link
Member

mosabua commented Feb 26, 2025

@twuebi Could you explain why we should test Lakekeeper in this repository? We already test several REST catalog implementations (Unity, Polaris, S3 Tables and Tabular). I feel they are sufficient to ensure Iceberg REST catalog specification.

I think the situation is similar to S3 but with the difference that the Iceberg REST Catalog spec is more formal. The fact remains that we want to ensure that as many different REST catalogs can show that they work with Trino and do so with every release. The current way to ensure that is to have them run tests here.. just like we do with minio and S3 and so on.

We can discuss this more in the maintainer call since it would be good to have a better guideline.

This work is also related to the fact that we want to mention official support from Trino for these projects and there is a web site PR to show this better. We also want to expand the docs for the catalogs and I am hoping @ebyhr @findinpath and others can help over time.

The first steps are this PR and the website PR .. that still needs your technical review and input btw. trinodb/trino.io#770

More to come

@ebyhr
Copy link
Member

ebyhr commented Feb 27, 2025

I think the situation is similar to S3 but with the difference that the Iceberg REST Catalog spec is more formal.

I think so too. It's a spec Lakekeeper should follow.
Note that even just adding tests increases our (≠their) maintenance costs.

We can discuss this more in the maintainer call since it would be good to have a better guideline.

I recommend talking in Slack so other people can chime in. I personally don't want to support projects (#stars < 1k) unless there is a justification.

Screenshot 2025-02-27 at 19 33 47

@twuebi twuebi force-pushed the tp/lakekeeper-tests branch from 11e8e26 to 7d3fdf3 Compare February 28, 2025 10:47
Copy link

cla-bot bot commented Feb 28, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

3 participants