Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: Update database permissions in async mode #32231

Merged
merged 31 commits into from
Mar 1, 2025

Conversation

Vitor-Avila
Copy link
Contributor

@Vitor-Avila Vitor-Avila commented Feb 12, 2025

SUMMARY

Note: To avoid nested transactions, this PR depends on: #32401

Currently, in order to make a new schema or catalog available to be used in Roles, users would need to perform a dummy update to the DB connection (which triggers a permission re-sync). Also, this re-sync is executed in synchronous mode, meaning that the DB connection dialog (and the API request) are kept open until it's fully processed.

This is not ideal for a few different reasons:

  • The DB connection is actually already updated -- the re-sync happens as a consequence of the update.
  • The DB credentials might have access to several catalogs (with countless schemas each), which would take a considerable time.
  • The API request is on hold until the operation finishes, holding un-necessary resources.

This PR moves this logic to dedicated commands, supporting the sync mode approach (default/current behavior) and also an async mode (delegating the operation to a celery task). A dedicated endpoint is also added, allowing users to trigger a re-sync without having to edit the connection.

I'm planning on doing a follow up specifically for the update DB connection flow, but decided to have a first PR mostly focused on adding the feature.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

After
image

TESTING INSTRUCTIONS

Tests added.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Feb 12, 2025
@dosubot dosubot bot added change:backend Requires changing the backend data:databases Related to database configurations and connections labels Feb 12, 2025

logger = logging.getLogger(__name__)


def ping(engine: Engine) -> bool:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic moved from superset/commands/database/test_connection.py.

Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Fix Detected
Error Handling Silent Error Handling in Database Ping ▹ view
Performance N+1 Query Pattern in Permission Updates ▹ view
Error Handling Inconsistent Catalog Connection Error Handling ▹ view
Error Handling Incomplete Error Handling in Permission Resync ▹ view
Security Unsafe User Impersonation in Celery Task ▹ view
Error Handling Unsafe User Existence Check ▹ view
Logging Misspelled Error Message ▹ view
Functionality Default Async Mode Contradicts Intent ▹ view
Files scanned
File Path Reviewed
superset/commands/database/utils.py
superset/tasks/permissions.py
superset/constants.py
superset/commands/database/resync_permissions_async.py
superset/commands/database/exceptions.py
superset/commands/database/update.py
superset/views/base.py
superset/commands/database/resync_permissions.py
superset-frontend/src/pages/DatabaseList/index.tsx
superset/databases/api.py
superset/config.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

  • You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.
  • You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.
  • Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.
  • Chat with Korbit on issues we post by tagging @korbit-ai in your reply.
  • Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

  • Check out our docs on how you can make Korbit work best for you and your team.
  • Customize Korbit for your organization through the Korbit Console.

Feedback and Support

Comment on lines 267 to 276
for dataset in DatabaseDAO.get_datasets(
self.db_connection_id,
catalog=catalog,
schema=schema,
):
dataset.catalog_perm = new_catalog_perm_name
dataset.schema_perm = new_schema_perm_name
for chart in DatasetDAO.get_related_objects(dataset.id)["charts"]:
chart.catalog_perm = new_catalog_perm_name
chart.schema_perm = new_schema_perm_name
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N+1 Query Pattern in Permission Updates category Performance

Tell me more
What is the issue?

Nested loops performing database queries for each dataset and its related charts, without batching or bulk updates.

Why this matters

This can lead to N+1 query performance issues when dealing with large numbers of datasets and charts, causing excessive database round trips.

Suggested change ∙ Feature Preview

Batch the permission updates using bulk database operations. Consider using a bulk update query or collecting all objects first and then updating them in a single transaction:

datasets = DatabaseDAO.get_datasets(self.db_connection_id, catalog=catalog, schema=schema)
dataset_ids = [d.id for d in datasets]

# Bulk update datasets
DatasetDAO.bulk_update_permissions(dataset_ids, 
    catalog_perm=new_catalog_perm_name,
    schema_perm=new_schema_perm_name)

# Bulk update related charts
DatasetDAO.bulk_update_chart_permissions(dataset_ids,
    catalog_perm=new_catalog_perm_name,
    schema_perm=new_schema_perm_name)

Report a problem with this comment

💬 Chat with Korbit by mentioning @korbit-ai.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a huge improvement, @Vitor-Avila! 🌮🌮🌮🌮🌮

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few questions about the async mode.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@@ -88,7 +87,14 @@ def run(self) -> Model:
database.set_sqlalchemy_uri(database.sqlalchemy_uri)
ssh_tunnel = self._handle_ssh_tunnel(database)
try:
self._refresh_catalogs(database, original_database_name, ssh_tunnel)
current_username = get_username()
SyncPermissionsCommand(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much better!

@@ -106,3 +107,35 @@ def data_loader(
return PandasDataLoader(
example_db_engine, pandas_loader_configuration, table_to_df_convertor
)


def with_config(override_config: dict[str, Any]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@Vitor-Avila Vitor-Avila force-pushed the feat/async-db-perm-sync branch from f394d86 to 9fe2c74 Compare February 28, 2025 15:03
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

old_db_connection_name=old_db_connection_name,
db_connection=db_connection,
ssh_tunnel=ssh_tunnel,
).sync_database_permissions()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we're calling a specific method of the command instead of .run() makes me think that this should've been a function instead of a method (since technically we always want to call only .run() when interacting with a command). But this is fine for now, it's just a nit.

@Vitor-Avila Vitor-Avila merged commit d79f7b2 into master Mar 1, 2025
48 checks passed
@Vitor-Avila Vitor-Avila deleted the feat/async-db-perm-sync branch March 1, 2025 00:25
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
api Related to the REST API change:backend Requires changing the backend data:databases Related to database configurations and connections size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants