-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
OPIK-859: Reduce find spans query cost #1123
OPIK-859: Reduce find spans query cost #1123
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically converts the JOIN to a SUBQUERY which seems to be more efficient in ClickHouse, or at least for this particular case.
Left some comments of things to double check or for the future, but we should be good to go.
if(end_time IS NOT NULL AND start_time IS NOT NULL | ||
AND notEquals(start_time, toDateTime64('1970-01-01 00:00:00.000', 9)), | ||
(dateDiff('microsecond', start_time, end_time) / 1000.0), | ||
NULL) AS duration_millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this duration_millis
field here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there is a dynamic filter that assumes this field exists in the query. We can probably remove it by changing it to a materialized column instead. This will also make the query simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was no clear about its use case. I know now. No need to change anything at the moment.
SELECT * | ||
FROM feedback_scores | ||
WHERE entity_type = 'span' | ||
AND project_id = :project_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we can filter by workspace_id
here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree I will push this in a following PR
<if(filters)> AND <filters> <endif> | ||
<if(feedback_scores_filters)> | ||
AND id in ( | ||
WHERE id IN ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's double check if we need a similar optimisation for the related count query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The count follows the same structure as we are using now (subquery returning id and duration_millis), and then count only one ìd. So we should be fine.
Details
Despite the slight differences in the parts and the big difference in granules. I'm confident this new query is more efficient since it postpones the retrieval of fields that are not part of the sortable key. This has sped up the process considerably.
Before:
After:
Issues
OPIK-859