-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Working query in EFCore 3.+ doesn't produce the same SQL query in EFCore 6/7+ #30938
Comments
/cc @roji |
@FHTSean how exactly is the second query wrong, they seem like equivalent SQLs that should return the same results? Are you seeing different rows returned by the two SQLs? If so, can you share a minimal, runnable repro or at least the kind of data showing the difference? Note #30955 which I opened (before noticing this!), which would change our translation back to using IN instead of EXISTS in most cases - but AFAIK that's a SQL optimization (and possibly performance) change, rather than a correctness change. |
The first query and second query produces vastly different results. The second query scanned about 200 million rows and produces 45 thousand customers from 2000 distinct customers. |
Can you provide a minimal sample showing the different results? |
All i've done is create a table with two columns - CustomerMainTableId and CustomerMainTableCustomerId. I've used a loop to insert customer ids 1->2000 4 times, producing 8 thousand records. There are 2000 distinct customer ids - and the screenshot shows that the 2nd query with the "Exists" statement examines the row 56 million times. |
Can you please share this (e.g. a SQL script to create the table with the dummy data), plus a minimal code sample, so that I can reproduce the same results here?
The performance of the query is one question - it's indeed possible that one variant is faster than the other. But before that I'm concerned with its correctness, since you indicated above that you're seeing different results coming. This is what I'd like to understand better. |
Hi Roji, Happy to move on to the performance bit. |
Great, thanks for confirming about the correctness. Re the performance - this comes at exactly the right time: I'm currently working on #30955 which should fix this. It would still be helpful for me to get a clear repro from you to confirm the perf improvement, i.e. a dummy database create script + two SQLs which perform differently. |
Hi Roji, The two different SQLs were already provided above as: SELECT *
FROM `CustomerMainTable` AS `t`
WHERE `t`.`CustomerMainTableId` IN (
SELECT MAX(`t0`.`CustomerMainTableId`)
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
) AND SELECT *
FROM `CustomerMainTable` AS `t`
WHERE EXISTS (
SELECT 1
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
HAVING MAX(`t0`.`CustomerMainTableId`) = `t`.`CustomerMainTableId`) Are you asking for something different? |
@FHTSean thanks, I didn't notice the dummy database. I could indeed reproduce the dramatic performance differences on MariaDB - and also less dramatic (but still very significant) differences on other databases. I've posted the full findings in #30955 (comment), we'll definitely be fixing this for 8.0. Thanks again for raising this - it's always great to get clear, targeted issues with full repros as you've provided. I'll go ahead and close this as a duplicate of #30955. |
Duplicate of #30955 |
Hello,
The following is the SQL query i am trying to produce in LINQ:
CustomerMainTableId is an autoincrement id, and CustomerMainTableCustomerId is the customer's unique ID. There can be multiple entries of the customer's transaction in the CustomerMainTableId, so the idea is to select the latest transaction for each customer by grouping and selecting the max id.
In entity framework core 3.1, this was achieved via:
However, the same query, when executed in EF Core 6 or 7 produces a wrong query in SQL:
Interestingly enough, if we split the Linq query like so:
And then feed the list of integers back to the contains query:
It produces the correct query. However, feeding the list of integers back to the contains query means this is a client side evaluation which is not ideal.
The text was updated successfully, but these errors were encountered: