Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Discussion] Evaluating Query Performance: Z-order Indexing and Full Table Scans in Delta Lake with Trino #24724

Open
RohanDudeja opened this issue Jan 16, 2025 · 0 comments

Comments

@RohanDudeja
Copy link

Issue Description:

We are experiencing full table scans in a query that operates on a Delta Lake dataset using a Trino connection. The query involves multiple joins and filters on partitioned and Z-order indexed columns.

Despite the optimizations in place:

  1. The created_date column is partitioned.
  2. A Z-order index has been applied to the columns used in the join (b_id, c_id) and filter (entity_id, created_date).

The query still results in full table scans, regardless of the created_date range specified (e.g., 1 day, 1 week, or 2 weeks). This raises concerns about the effectiveness of Z-order indexing in our setup and the overall query performance.


Sample Query:

SELECT
  table_a.id,
  table_a.b_id,
  table_a.c_id,
  table_a.created_date,
  table_b.col1,
  table_b.col2,
  table_c.col1,
  table_c.col2
FROM
  database.table_a AS table_a
  LEFT JOIN database.table_b AS table_b 
    ON table_b.entity_id = 'x' AND table_a.b_id = table_b.id
  LEFT JOIN database.table_c AS table_c 
    ON table_c.entity_id = 'x' AND table_a.c_id = table_c.id
WHERE
  table_a.entity_id IN ('x') 
  AND table_a.created_date BETWEEN '2025-01-01' AND '2025-01-02'
ORDER BY
  table_a.id ASC;

Problem Statement:

We need to determine whether Z-order indexing is providing any performance benefits and identify alternative approaches to optimize the query execution plan and avoid full table scans.


Questions for Discussion:

  1. Z-order Indexing Impact:

    • Does Z-order indexing improve query performance in this specific use case with Delta Lake and Trino?
    • Are there scenarios where Z-order indexing might not provide noticeable benefits?
  2. Query Optimization:

    • Are there alternative approaches to optimize the query execution plan to prevent full table scans?
  3. Trino-Specific Behaviors:

    • Does Trino’s query execution engine fully leverage Z-order indexing for data pruning in Delta Lake?
    • Are there specific configurations or best practices for Trino to better utilize Z-order indexes?
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

No branches or pull requests

1 participant