-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
chore(datasets): Add /api/v1/dataset/{pk}/column endpoint #22332
chore(datasets): Add /api/v1/dataset/{pk}/column endpoint #22332
Conversation
@@ -257,7 +257,7 @@ | |||
"AnnotationLayerRestApi.get_list": { | |||
"properties": { | |||
"changed_by": { | |||
"$ref": "#/components/schemas/AnnotationLayerRestApi.get_list.User1" | |||
"$ref": "#/components/schemas/AnnotationLayerRestApi.get_list.User" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regenerated by running superset update-api-docs
.
f4c2c43
to
a4aa974
Compare
@@ -36,14 +38,99 @@ | |||
class DatasetColumnsRestApi(BaseSupersetModelRestApi): | |||
datamodel = SQLAInterface(TableColumn) | |||
|
|||
include_route_methods = {"delete"} | |||
allow_browser_login = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably check and move allow_browser_login = True
into BaseSupersetModelRestApi
since it seems it's used everywhere, I'll make a PR for it
a4aa974
to
939d13e
Compare
Codecov Report
@@ Coverage Diff @@
## master #22332 +/- ##
==========================================
- Coverage 66.75% 66.11% -0.65%
==========================================
Files 1847 1847
Lines 70562 70577 +15
Branches 7742 7737 -5
==========================================
- Hits 47106 46664 -442
- Misses 21455 21906 +451
- Partials 2001 2007 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
939d13e
to
f8d36c7
Compare
Thanks for taking this performance issue under your wing. I'm hopeful this can mean other orgs with this issue can take this new filters feature to production, and move us another step toward 3.0 and the end of Filter Boxes :D |
7684198
to
f1bd7f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why cypress is failing but the change makes sense to me
@@ -98,17 +98,17 @@ export function ColumnSelect({ | |||
} | |||
if (datasetId != null) { | |||
cachedSupersetGet({ | |||
endpoint: `/api/v1/dataset/${datasetId}`, | |||
endpoint: `/api/v1/dataset/${datasetId}/column`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endpoint: `/api/v1/dataset/${datasetId}/column`, | |
endpoint: `/api/v1/dataset/${datasetId}/columns`, |
Should this end point be /columns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on columns
as it represents a set of columns.
I added a ticket to Superset 3.0 project board to refactor the endpoints to use the plural form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktmud and @michael-s-molina I agree that it should be columns
and not column
—it should also be datasets
and not dataset
, however per,
we already use the singular form, hence using column
ensures consistency which I sense is more important. Per @michael-s-molina's comment I think changing to plural forms should be a bulk operation.
rison_data.setdefault("filters", []) | ||
rison_data["filters"].append({"col": "table_id", "opr": "eq", "value": pk}) | ||
rison_data["page_size"] = -1 | ||
return self.get_list_headless(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpgaspar, @michael-s-molina, or @villebro I would love your input on this as I'm not overly familiar with the new FAB API and/or how best to define said endpoint. I came across the pattern by prodding around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use something more like: https://github.com/apache/superset/blob/master/superset/annotation_layers/annotations/api.py#L129
why the rison_data["page_size"] = -1
because the frontend is not handling pagination yet?
also we probably need to apply DatasourceFilter
here, or users will be able to view dataset columns from datasets their not authorized to view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpgaspar I added the rison_data["page_size"] = -1
as the /api/v1/dataset/{pk}
response is non-paginated and I wanted to ensure consistency. Additionally this endpoint is currently solely used in a non-paginated way, i.e., it's not powering any /*/list/
endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the pattern I see it being only used twice and it only applies one rison override (as opposed to two which is needed in our case). I think there's merit in refactoring if it's used more than once. Furthermore this example is actually a misnomer in terms of naming, i.e., the function name has been copied and pasted as is even though it's applying the report_schedule
as opposed to layer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Agree let's refactor this after. Still think we need to apply the DatasourceFilter
here, have you checked it?
method_permission_name = MODEL_API_RW_METHOD_PERMISSION_MAP | ||
|
||
list_columns = [ # See DatasetRestApi.show_select_columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not overly sure the difference between list_columns
and show_columns
but I thought that it was prudent that the table_columns
columns returned was consistent with the columns field in the /api/v1/dataset
response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list_columns
is for get_list and show_columns
is for get
item. Not great naming, at the time I aimed for compatibility with the previous MovelView
classes on FAB to allow for a smooth migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use this opportunity to add comments in the code explaining the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @john-bodley! thank you
rison_data.setdefault("filters", []) | ||
rison_data["filters"].append({"col": "table_id", "opr": "eq", "value": pk}) | ||
rison_data["page_size"] = -1 | ||
return self.get_list_headless(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use something more like: https://github.com/apache/superset/blob/master/superset/annotation_layers/annotations/api.py#L129
why the rison_data["page_size"] = -1
because the frontend is not handling pagination yet?
also we probably need to apply DatasourceFilter
here, or users will be able to view dataset columns from datasets their not authorized to view
@@ -36,14 +38,99 @@ | |||
class DatasetColumnsRestApi(BaseSupersetModelRestApi): | |||
datamodel = SQLAInterface(TableColumn) | |||
|
|||
include_route_methods = {"delete"} | |||
allow_browser_login = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably check and move allow_browser_login = True
into BaseSupersetModelRestApi
since it seems it's used everywhere, I'll make a PR for it
superset/datasets/columns/api.py
Outdated
class_permission_name = "Dataset" | ||
include_route_methods = {RouteMethod.DELETE, RouteMethod.GET, RouteMethod.GET_LIST} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just need the RouteMethod.GET_LIST
method_permission_name = MODEL_API_RW_METHOD_PERMISSION_MAP | ||
|
||
list_columns = [ # See DatasetRestApi.show_select_columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list_columns
is for get_list and show_columns
is for get
item. Not great naming, at the time I aimed for compatibility with the previous MovelView
classes on FAB to allow for a smooth migration
f1bd7f2
to
6845322
Compare
Kicking CI 🤞 |
Regrettably when examining the initial problem I missed that the I thought it was prudent to return to the drawing board and close this PR in favor of trying to optimize the underlying FAB/SQLAlchemy logic when fetching a model with its associated relationships. @dpgaspar hopefully dpgaspar/Flask-AppBuilder#1959 is a step in the right direction. All going well I'll follow up with another PR which updates the existing |
SUMMARY
Per SIP-64 this PR adds the
/api/v1/dataset/{pk}/column
endpoints for obtaining just the columns associated with a dataset to aid the performance of the native filters.As indicated in the SIP the
/api/v1/dataset/{pk}
endpoint is highly inefficient when fetching a dataset comprising of a large number of columns and/or metrics, i.e., specifically at Airbnb we have a virtual dataset which houses thousands of metrics and dimensions which results in an interim query which computes the cross product of# metrics
x# columns
x# owners
—resulting in millions of rows.Given the native filters only use the columns/metrics independently from the
/api/v1/dataset/{pk}
endpoint this PR simply adds a new endpoint to fetch only the columns metadata associated with said dataset.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Added unit tests. Also tested the new endpoint with a dataset comprising of ~ 7,000 columns and the
/api/v1/dataset/{pk}/column
endpoint returned in ~ 1 second whereas/api/v1/dataset/{pk}
timed out.ADDITIONAL INFORMATION