Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Different invariants optimization between views and CTEs #7388

Closed
chrpos opened this issue Nov 14, 2022 · 5 comments
Closed

Different invariants optimization between views and CTEs #7388

chrpos opened this issue Nov 14, 2022 · 5 comments

Comments

@chrpos
Copy link

chrpos commented Nov 14, 2022

Hi,

I am using Firebird Version: 4.0.2 (issue is there also with Firebird 3.0.7)
Due to performance issues I was facing with an sql query on a database, I generated a simple reproducible example, based on the "employee" sample database of firebird.

The issue in short: The query in question uses a view. When I replace the view, that is referenced in the query, with a common table expression (CTE), using the same SQL for the CTE as for the view, the query executes in fact very differently. See the comparison of the performance analysis. Interestingly enough, the execution plans of both queries are the same, though execute differently.

In the following example, Query A references the view "EMP_DEPT_TEAM". Query B uses the content of the view's query as CTE. Query A and B produce the same execution plans. I would expect query A and B performing just the same, but query A performs less good than query B.

How to reproduce (use the database "employee.fdb"):
Create a view:

create view EMP_DEPT_TEAM
as
select
    EMP_NO,
    DEPARTMENT.DEPT_NO,
    iif(PROJECT.TEAM_LEADER is not null, 'yes', 'no') as IS_TEAMLEADER
from EMPLOYEE
    inner join DEPARTMENT on EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO
    left join PROJECT on EMPLOYEE.EMP_NO = PROJECT.TEAM_LEADER;

Query A (using the view):

select
    EMP_NO,
    DEPT_NO,
    IS_TEAMLEADER
from EMP_DEPT_TEAM
    where EMP_DEPT_TEAM.EMP_NO = (select max(EMP_NO) from EMP_DEPT_TEAM);

Query B (not using the view, but a CTE):

with EMP_DEPT_TEAM_ as (select
    EMP_NO,
    DEPARTMENT.DEPT_NO,
    iif(PROJECT.TEAM_LEADER is not null, 'yes', 'no') as IS_TEAMLEADER
from EMPLOYEE
    inner join DEPARTMENT on EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO
    left join PROJECT on EMPLOYEE.EMP_NO = PROJECT.TEAM_LEADER
)
select
    EMP_NO,
    DEPT_NO,
    IS_TEAMLEADER
from EMP_DEPT_TEAM_
    where EMP_NO = (select max(EMP_NO) from EMP_DEPT_TEAM_);

Execution plan for Query A (see also attached screenshots):

PLAN JOIN (JOIN (EMP_DEPT_TEAM DEPARTMENT NATURAL, EMP_DEPT_TEAM EMPLOYEE INDEX (RDB$FOREIGN8)), EMP_DEPT_TEAM PROJECT INDEX (RDB$FOREIGN13))
PLAN JOIN (JOIN (EMP_DEPT_TEAM EMPLOYEE INDEX (RDB$PRIMARY7), EMP_DEPT_TEAM DEPARTMENT INDEX (RDB$PRIMARY5)), EMP_DEPT_TEAM PROJECT INDEX (RDB$FOREIGN13))

Details:
Execution-Plan-Query-A-View

Execution plan for Query B (see also attached screenshots):

PLAN JOIN (JOIN (EMP_DEPT_TEAM_ DEPARTMENT NATURAL, EMP_DEPT_TEAM_ EMPLOYEE INDEX (RDB$FOREIGN8)), EMP_DEPT_TEAM_ PROJECT INDEX (RDB$FOREIGN13))
PLAN JOIN (JOIN (EMP_DEPT_TEAM_ EMPLOYEE INDEX (RDB$PRIMARY7), EMP_DEPT_TEAM_ DEPARTMENT INDEX (RDB$PRIMARY5)), EMP_DEPT_TEAM_ PROJECT INDEX (RDB$FOREIGN13))

Details:
Execution-Plan-Query-B-CTE

Performance Analysis:
Compare number of reads!

Query A (View):
Performance-Query-A-View

Query B (CTE):
Performance-Query-B-CTE

@dyemanov
Copy link
Member

Does the performance still differ after replacing (select max(EMP_NO) from EMP_DEPT_TEAM_) with a corresponding constant?

@chrpos
Copy link
Author

chrpos commented Nov 14, 2022

Does the performance still differ after replacing (select max(EMP_NO) from EMP_DEPT_TEAM_) with a corresponding constant?

No, the performance does not differ, when I replace (select max() ...) with a constant.

@dyemanov
Copy link
Member

So it appears that the subquery is treated as invariant (and evaluated only once) in one case and as variant in another case, thus being re-evaluated per every outer row and thus badly affecting the performance.

@livius2
Copy link

livius2 commented Nov 15, 2022

If i look at it correctly we have discussed this on the support list, so i link it here for the reference only, maybe it is somehow helpful.
https://groups.google.com/g/firebird-support/c/J0ksNKAzDsA/m/h-_24mZ1AQAJ?utm_medium=email&utm_source=footer

@dyemanov
Copy link
Member

Test case based on system tables only:

create view v1
as select r.rdb$relation_id as id, r.rdb$relation_name as name
from rdb$relations r
  inner join rdb$relation_fields rf on r.rdb$relation_name = rf.rdb$relation_name
  left join rdb$security_classes sc on r.rdb$security_class = sc.rdb$security_class
;

select *
from v1
where id = (select max(id) from v1)
;

-- Fetches = 11801

with sub as (
select r.rdb$relation_id as id, r.rdb$relation_name as name
from rdb$relations r
  inner join rdb$relation_fields rf on r.rdb$relation_name = rf.rdb$relation_name
  left join rdb$security_classes sc on r.rdb$security_class = sc.rdb$security_class
)
select * from sub
where sub.id = (select max(id) from sub)
;

-- Fetches = 2401

@dyemanov dyemanov self-assigned this Nov 16, 2022
dyemanov added a commit that referenced this issue Nov 16, 2022
dyemanov added a commit that referenced this issue Dec 16, 2022
Fix invariants optimization involving views (#7388)
@dyemanov dyemanov changed the title Non-equal execution when using view or CTE Different invariants optimization between views and CTEs Jan 11, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment