-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: Update database permissions in async mode #32231
Conversation
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
def ping(engine: Engine) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic moved from superset/commands/database/test_connection.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
Silent Error Handling in Database Ping ▹ view | ||
N+1 Query Pattern in Permission Updates ▹ view | ||
Inconsistent Catalog Connection Error Handling ▹ view | ||
Incomplete Error Handling in Permission Resync ▹ view | ||
Unsafe User Impersonation in Celery Task ▹ view | ||
Unsafe User Existence Check ▹ view | ✅ | |
Misspelled Error Message ▹ view | ||
Default Async Mode Contradicts Intent ▹ view |
Files scanned
File Path | Reviewed |
---|---|
superset/commands/database/utils.py | ✅ |
superset/tasks/permissions.py | ✅ |
superset/constants.py | ✅ |
superset/commands/database/resync_permissions_async.py | ✅ |
superset/commands/database/exceptions.py | ✅ |
superset/commands/database/update.py | ✅ |
superset/views/base.py | ✅ |
superset/commands/database/resync_permissions.py | ✅ |
superset-frontend/src/pages/DatabaseList/index.tsx | ✅ |
superset/databases/api.py | ✅ |
superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Need a new review? Comment
/korbit-review
on this PR and I'll review your latest changes.Korbit Guide: Usage and Customization
Interacting with Korbit
- You can manually ask Korbit to review your PR using the
/korbit-review
command in a comment at the root of your PR.- You can ask Korbit to generate a new PR description using the
/korbit-generate-pr-description
command in any comment on your PR.- Too many Korbit comments? I can resolve all my comment threads if you use the
/korbit-resolve
command in any comment on your PR.- Chat with Korbit on issues we post by tagging @korbit-ai in your reply.
- Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.
Customizing Korbit
- Check out our docs on how you can make Korbit work best for you and your team.
- Customize Korbit for your organization through the Korbit Console.
Feedback and Support
for dataset in DatabaseDAO.get_datasets( | ||
self.db_connection_id, | ||
catalog=catalog, | ||
schema=schema, | ||
): | ||
dataset.catalog_perm = new_catalog_perm_name | ||
dataset.schema_perm = new_schema_perm_name | ||
for chart in DatasetDAO.get_related_objects(dataset.id)["charts"]: | ||
chart.catalog_perm = new_catalog_perm_name | ||
chart.schema_perm = new_schema_perm_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N+1 Query Pattern in Permission Updates 
Tell me more
What is the issue?
Nested loops performing database queries for each dataset and its related charts, without batching or bulk updates.
Why this matters
This can lead to N+1 query performance issues when dealing with large numbers of datasets and charts, causing excessive database round trips.
Suggested change ∙ Feature Preview
Batch the permission updates using bulk database operations. Consider using a bulk update query or collecting all objects first and then updating them in a single transaction:
datasets = DatabaseDAO.get_datasets(self.db_connection_id, catalog=catalog, schema=schema)
dataset_ids = [d.id for d in datasets]
# Bulk update datasets
DatasetDAO.bulk_update_permissions(dataset_ids,
catalog_perm=new_catalog_perm_name,
schema_perm=new_schema_perm_name)
# Bulk update related charts
DatasetDAO.bulk_update_chart_permissions(dataset_ids,
catalog_perm=new_catalog_perm_name,
schema_perm=new_schema_perm_name)
💬 Chat with Korbit by mentioning @korbit-ai.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a huge improvement, @Vitor-Avila! 🌮🌮🌮🌮🌮
848482a
to
2d60c32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few questions about the async mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
@@ -88,7 +87,14 @@ def run(self) -> Model: | |||
database.set_sqlalchemy_uri(database.sqlalchemy_uri) | |||
ssh_tunnel = self._handle_ssh_tunnel(database) | |||
try: | |||
self._refresh_catalogs(database, original_database_name, ssh_tunnel) | |||
current_username = get_username() | |||
SyncPermissionsCommand( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much better!
@@ -106,3 +107,35 @@ def data_loader( | |||
return PandasDataLoader( | |||
example_db_engine, pandas_loader_configuration, table_to_df_convertor | |||
) | |||
|
|||
|
|||
def with_config(override_config: dict[str, Any]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
f394d86
to
9fe2c74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
old_db_connection_name=old_db_connection_name, | ||
db_connection=db_connection, | ||
ssh_tunnel=ssh_tunnel, | ||
).sync_database_permissions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we're calling a specific method of the command instead of .run()
makes me think that this should've been a function instead of a method (since technically we always want to call only .run()
when interacting with a command). But this is fine for now, it's just a nit.
SUMMARY
Note: To avoid nested transactions, this PR depends on: #32401
Currently, in order to make a new schema or catalog available to be used in Roles, users would need to perform a dummy update to the DB connection (which triggers a permission re-sync). Also, this re-sync is executed in synchronous mode, meaning that the DB connection dialog (and the API request) are kept open until it's fully processed.
This is not ideal for a few different reasons:
This PR moves this logic to dedicated commands, supporting the sync mode approach (default/current behavior) and also an async mode (delegating the operation to a celery task). A dedicated endpoint is also added, allowing users to trigger a re-sync without having to edit the connection.
I'm planning on doing a follow up specifically for the update DB connection flow, but decided to have a first PR mostly focused on adding the feature.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
After

TESTING INSTRUCTIONS
Tests added.
ADDITIONAL INFORMATION