-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Translate Contains to IN with subquery instead of EXISTS where relevant #30955
Comments
Using IN rather than EXISTS provides some very serious performance advantages, on all databases except for SQL Server; this is probably due to the removal of the correlated subquery which EXISTS requires. The benchmark below uses the data and queries from @FHTSean, comparing the following two SQLs (see #30938): SELECT *
FROM `CustomerMainTable` AS `t`
WHERE `t`.`CustomerMainTableId` IN (
SELECT MAX(`t0`.`CustomerMainTableId`)
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
) ... with: SELECT *
FROM `CustomerMainTable` AS `t`
WHERE EXISTS (
SELECT 1
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
HAVING MAX(`t0`.`CustomerMainTableId`) = `t`.`CustomerMainTableId`) On MariaDB:
(That's 7ms compared to 9 seconds) On PostgreSQL I'm also seeing a very big performance difference, even if less dramatic:
Also on SQLite:
On SQL Server both queries have the exact same running time, I'm guessing SQL Server sees through the EXISTS and avoids the correlated subquery. Full benchmark sourcesBenchmarkRunner.Run<Benchmark>();
public class Benchmark
{
private MySqlConnection _connection;
[GlobalSetup]
public async Task Setup()
{
_connection = new MySqlConnection("Server=localhost;Database=test;User ID=root;Password=Abcd5678;");
await _connection.OpenAsync();
}
[Benchmark]
public async Task In()
{
await using var command = new MySqlCommand(
"""
SELECT *
FROM `CustomerMainTable` AS `t`
WHERE `t`.`CustomerMainTableId` IN (
SELECT MAX(`t0`.`CustomerMainTableId`)
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`);
""", _connection);
await command.ExecuteNonQueryAsync();
}
[Benchmark]
public async Task Exists()
{
await using var command = new MySqlCommand(
"""
SELECT *
FROM `CustomerMainTable` AS `t`
WHERE EXISTS (
SELECT 1
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
HAVING MAX(`t0`.`CustomerMainTableId`) = `t`.`CustomerMainTableId`)
""", _connection);
await command.ExecuteNonQueryAsync();
}
} |
Although our InExpression supports having a subquery (and various visitors handle that), we never actually generate this - instead we always generate an EXISTS subquery with a predicate (based on SQL Server baselines). Translating to InExpression is likely better - it doesn't require a correlated subquery and expresses the intent more succintly, without needing a predicate. For example, instead of this:
We can translate to this:
However, note that our null semantics around InExpression with subquery currently seems broken (fortunately it's dead code): IN returns null when the item is null (or when the values contains null and a match isn't found). We'd need to make that logic actually work.
The text was updated successfully, but these errors were encountered: