OPIK-859: Reduce find spans query cost #1123

thiagohora · 2025-01-23T13:41:54Z

Details

Despite the slight difference in parts and the large difference in granules read, I'm confident this new query is more efficient, since it postpones the retrieval of fields that are not part of the sorting key. This has sped up the process considerably.

Before:

Expression (Project names)
  Limit
    LimitBy
      Expression ((Before LIMIT BY + (Before ORDER BY + Projection) [lifted up part]))
        Sorting (Sorting for ORDER BY)
          Expression ((Before ORDER BY + Projection))
            Expression
              ReadFromMergeTree (opik_prod.spans)
              Indexes:
                PrimaryKey
                  Keys: 
                    workspace_id
                    project_id
                  Condition: and((workspace_id in ['48d64607-f70d-4cb1-9207-dd658a423f8e', '48d64607-f70d-4cb1-9207-dd658a423f8e']), (project_id in ['019484c0-1a3c-7987-b2d4-0e3598e8717a', '019484c0-1a3c-7987-b2d4-0e3598e8717a']))
                  Parts: 16/20
                  Granules: 828/29114

After:

CreatingSets (Create sets before main query execution)
  Expression ((Project names + (Before ORDER BY + Projection) [lifted up part]))
    Sorting (Sorting for ORDER BY)
      Expression ((Before ORDER BY + Projection))
        Expression
          ReadFromMergeTree (opik_prod.spans)
          Indexes:
            PrimaryKey
              Keys: 
                id
              Condition: (id in 200-element set)
              Parts: 14/14
              Granules: 25196/29116
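
The idea of postponing the retrieval of non-key fields can be illustrated outside ClickHouse with a minimal in-memory sketch. It is not the query from this PR: the class, method names, and sample data are hypothetical, and the sort column is an assumption. Phase 1 picks a page of ids by looking only at a narrow sort column; phase 2 reads the wide payloads for those ids only, which is roughly what the `id in 200-element set` condition in the After plan suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration; none of these names exist in SpanDAO.
final class DeferredFetchSketch {

    // Phase 1: choose the page of ids using only the narrow sort column
    // (highest sort value first), without touching the wide payloads.
    static List<String> pageIds(Map<String, Long> sortKeyById, int limit) {
        return sortKeyById.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Phase 2: read the wide payloads only for the selected ids,
    // analogous to a "WHERE id IN (...)" lookup.
    static List<String> fetchPayloads(Map<String, String> payloadById, List<String> ids) {
        List<String> out = new ArrayList<>();
        for (String id : ids) {
            out.add(payloadById.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> sortKeys = Map.of("a", 1L, "b", 3L, "c", 2L);
        Map<String, String> payloads =
                Map.of("a", "payload-a", "b", "payload-b", "c", "payload-c");

        List<String> ids = pageIds(sortKeys, 2);
        System.out.println(ids);                          // [b, c]
        System.out.println(fetchPayloads(payloads, ids)); // [payload-b, payload-c]
    }
}
```

In this sketch the expensive payloads are read for two rows instead of three; in a columnar store the saving would come from not reading the wide columns for rows that never make it onto the page.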

Issues

OPIK-859

andrescrz

This basically converts the JOIN to a SUBQUERY, which seems to be more efficient in ClickHouse, at least for this particular case.

I left some comments on things to double-check or to address in the future, but we should be good to go.

andrescrz · 2025-01-23T14:58:25Z

apps/opik-backend/src/main/java/com/comet/opik/domain/SpanDAO.java

+                        if(end_time IS NOT NULL AND start_time IS NOT NULL
+                                 AND notEquals(start_time, toDateTime64('1970-01-01 00:00:00.000', 9)),
+                             (dateDiff('microsecond', start_time, end_time) / 1000.0),
+                             NULL) AS duration_millis


Do you still need this duration_millis field here?

Yes, because there is a dynamic filter that assumes this field exists in the query. We could probably remove it by turning it into a materialized column instead, which would also make the query simpler.

OK, I wasn't clear about its use case; I understand it now. No need to change anything at the moment.
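
For readers less familiar with the `duration_millis` expression discussed in this thread, here is a rough Java equivalent. It is only a sketch: the class and method names are hypothetical, and treating the epoch timestamp as an "unset" start time is an assumption read off the `notEquals(start_time, toDateTime64('1970-01-01 00:00:00.000', 9))` condition. ClickHouse's `dateDiff` may also differ from this in edge cases.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of SpanDAO: approximates the SQL expression in Java.
final class DurationSketch {

    // Returns the duration in milliseconds (with microsecond resolution),
    // or null when either timestamp is missing or the start is the epoch,
    // which this sketch assumes means "start time not set".
    static Double durationMillis(Instant startTime, Instant endTime) {
        if (startTime == null || endTime == null || startTime.equals(Instant.EPOCH)) {
            return null;
        }
        long micros = ChronoUnit.MICROS.between(startTime, endTime);
        return micros / 1000.0;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2025-01-23T13:41:54Z");
        Instant end = start.plus(1500, ChronoUnit.MICROS);

        System.out.println(durationMillis(start, end));         // 1.5
        System.out.println(durationMillis(Instant.EPOCH, end)); // null
    }
}
```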

andrescrz · 2025-01-23T14:59:40Z

apps/opik-backend/src/main/java/com/comet/opik/domain/SpanDAO.java

+                            SELECT *
+                            FROM feedback_scores
+                            WHERE entity_type = 'span'
+                            AND project_id = :project_id


In the future, we can filter by workspace_id here as well.

Agreed, I will push this in a follow-up PR.

andrescrz · 2025-01-23T14:59:59Z

apps/opik-backend/src/main/java/com/comet/opik/domain/SpanDAO.java

-             <if(filters)> AND <filters> <endif>
-             <if(feedback_scores_filters)>
-             AND id in (
+             WHERE id IN (


Let's double check if we need a similar optimisation for the related count query.

The count query follows the same structure we are using now (a subquery returning id and duration_millis) and then counts only id, so we should be fine.

OPIK-859: Reduce find spans query cost

8e2e6e1

thiagohora requested a review from a team as a code owner January 23, 2025 13:41

andrescrz approved these changes Jan 23, 2025

thiagohora merged commit 34062db into main Jan 23, 2025
8 checks passed

thiagohora deleted the thiagohora/OPIK-859_reduce_find_spans_query_cost branch January 23, 2025 15:20
