Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

NorthwindSplitIncludeQuery test depend on specific implicit ordering and are hard to fix #26715

Open
Tracked by #30173
lauxjpn opened this issue Nov 16, 2021 · 0 comments
Assignees
Labels
area-groupby area-query area-test customer-reported punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-bug
Milestone

Comments

@lauxjpn
Copy link
Contributor

lauxjpn commented Nov 16, 2021

@lauxjpn can you provide more details so we can see if this is something we need to look into?

@roji The issues can be found in the NorthwindSplitIncludeQueryMySqlTest.cs file.

Its test cases will run fine on MySQL 8 without having to override anything from the base class, because MySQL 8 implicitly sorts/returns the OrderDetails entities as SQL Server (and Postgres) does.

But MariaDB does not.


Let's look at the Include_collection_SelectMany_GroupBy_Select test case for example.

LINQ query
(from o in ss.Set<Order>().Include(o => o.OrderDetails).Where(o => o.OrderID == 10248)
    from od in ss.Set<OrderDetail>()
    select o)
.GroupBy(e => e.OrderID)
.Select(e => e.OrderBy(o => o.OrderID).FirstOrDefault()
Generated SQL (SQL Server)
SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t].[OrderID], [t0].[OrderID0], [t0].[ProductID]
FROM (
    SELECT [o].[OrderID]
    FROM [Orders] AS [o]
    CROSS JOIN [Order Details] AS [o0]
    WHERE [o].[OrderID] = 10248
    GROUP BY [o].[OrderID]
) AS [t]
LEFT JOIN (
    SELECT [t1].[OrderID], [t1].[CustomerID], [t1].[EmployeeID], [t1].[OrderDate], [t1].[OrderID0], [t1].[ProductID]
    FROM (
        SELECT [o1].[OrderID], [o1].[CustomerID], [o1].[EmployeeID], [o1].[OrderDate], [o2].[OrderID] AS [OrderID0], [o2].[ProductID], ROW_NUMBER() OVER(PARTITION BY [o1].[OrderID] ORDER BY [o1].[OrderID]) AS [row]
        FROM [Orders] AS [o1]
        CROSS JOIN [Order Details] AS [o2]
        WHERE [o1].[OrderID] = 10248
    ) AS [t1]
    WHERE [t1].[row] <= 1
) AS [t0] ON [t].[OrderID] = [t0].[OrderID]
ORDER BY [t].[OrderID], [t0].[OrderID], [t0].[OrderID0], [t0].[ProductID];

SELECT [o3].[OrderID], [o3].[ProductID], [o3].[Discount], [o3].[Quantity], [o3].[UnitPrice], [t].[OrderID], [t0].[OrderID], [t0].[OrderID0], [t0].[ProductID]
FROM (
    SELECT [o].[OrderID]
    FROM [Orders] AS [o]
    CROSS JOIN [Order Details] AS [o0]
    WHERE [o].[OrderID] = 10248
    GROUP BY [o].[OrderID]
) AS [t]
LEFT JOIN (
    SELECT [t1].[OrderID], [t1].[OrderID0], [t1].[ProductID]
    FROM (
        SELECT [o1].[OrderID], [o2].[OrderID] AS [OrderID0], [o2].[ProductID], ROW_NUMBER() OVER(PARTITION BY [o1].[OrderID] ORDER BY [o1].[OrderID]) AS [row]
        FROM [Orders] AS [o1]
        CROSS JOIN [Order Details] AS [o2]
        WHERE [o1].[OrderID] = 10248
    ) AS [t1]
    WHERE [t1].[row] <= 1
) AS [t0] ON [t].[OrderID] = [t0].[OrderID]
INNER JOIN [Order Details] AS [o3] ON [t0].[OrderID] = [o3].[OrderID]
ORDER BY [t].[OrderID], [t0].[OrderID], [t0].[OrderID0], [t0].[ProductID];
Generated SQL (MySQL/MariaDB)
SELECT `t0`.`OrderID`, `t0`.`CustomerID`, `t0`.`EmployeeID`, `t0`.`OrderDate`, `t`.`OrderID`, `t0`.`OrderID0`, `t0`.`ProductID`
FROM (
    SELECT `o`.`OrderID`
    FROM `Orders` AS `o`
    CROSS JOIN `Order Details` AS `o0`
    WHERE `o`.`OrderID` = 10248
    GROUP BY `o`.`OrderID`
) AS `t`
LEFT JOIN (
    SELECT `t1`.`OrderID`, `t1`.`CustomerID`, `t1`.`EmployeeID`, `t1`.`OrderDate`, `t1`.`OrderID0`, `t1`.`ProductID`
    FROM (
        SELECT `o1`.`OrderID`, `o1`.`CustomerID`, `o1`.`EmployeeID`, `o1`.`OrderDate`, `o2`.`OrderID` AS `OrderID0`, `o2`.`ProductID`, ROW_NUMBER() OVER(PARTITION BY `o1`.`OrderID` ORDER BY `o1`.`OrderID`) AS `row`
        FROM `Orders` AS `o1`
        CROSS JOIN `Order Details` AS `o2`
        WHERE `o1`.`OrderID` = 10248
    ) AS `t1`
    WHERE `t1`.`row` <= 1
) AS `t0` ON `t`.`OrderID` = `t0`.`OrderID`
ORDER BY `t`.`OrderID`, `t0`.`OrderID`, `t0`.`OrderID0`, `t0`.`ProductID`;

SELECT `o3`.`OrderID`, `o3`.`ProductID`, `o3`.`Discount`, `o3`.`Quantity`, `o3`.`UnitPrice`, `t`.`OrderID`, `t0`.`OrderID`, `t0`.`OrderID0`, `t0`.`ProductID`
FROM (
    SELECT `o`.`OrderID`
    FROM `Orders` AS `o`
    CROSS JOIN `Order Details` AS `o0`
    WHERE `o`.`OrderID` = 10248
    GROUP BY `o`.`OrderID`
) AS `t`
LEFT JOIN (
    SELECT `t1`.`OrderID`, `t1`.`OrderID0`, `t1`.`ProductID`
    FROM (
        SELECT `o1`.`OrderID`, `o2`.`OrderID` AS `OrderID0`, `o2`.`ProductID`, ROW_NUMBER() OVER(PARTITION BY `o1`.`OrderID` ORDER BY `o1`.`OrderID`) AS `row`
        FROM `Orders` AS `o1`
        CROSS JOIN `Order Details` AS `o2`
        WHERE `o1`.`OrderID` = 10248
    ) AS `t1`
    WHERE `t1`.`row` <= 1
) AS `t0` ON `t`.`OrderID` = `t0`.`OrderID`
INNER JOIN `Order Details` AS `o3` ON `t0`.`OrderID` = `o3`.`OrderID`
ORDER BY `t`.`OrderID`, `t0`.`OrderID`, `t0`.`OrderID0`, `t0`.`ProductID`;

The generated SQL queries for MSSQL and MySQL (and MariaDB) are the same.

Let's only look at the first one of the two generated SQL queries, and its most inner subquery:

SELECT `o1`.`OrderID`, `o1`.`CustomerID`, `o1`.`EmployeeID`, `o1`.`OrderDate`, `o2`.`OrderID` AS `OrderID0`, `o2`.`ProductID`, ROW_NUMBER() OVER(PARTITION BY `o1`.`OrderID` ORDER BY `o1`.`OrderID`) AS `row`
FROM `Orders` AS `o1`
CROSS JOIN `Order Details` AS `o2`
WHERE `o1`.`OrderID` = 10248

The LINQ query uses .Select(e => e.OrderBy(o => o.OrderID).FirstOrDefault()).

This is being translated into a ROW_NUMBER() OVER(PARTITION BY `o1`.`OrderID` ORDER BY `o1`.`OrderID`) AS `row` clause, so that the FirstOrDefault() part of the LINQ query can later use the row column to figure out what the first record (with the WHERE `t1`.`row` <= 1 clause).

The issue is, that ORDER BY `o1`.`OrderID` is not ordered by enough columns to be deterministic. Let's take a look at the first row returned for this inner most subquery for MySQL and MariaDB respectively:

MySQL

OrderID CustomerID EmployeeID OrderDate OrderID0 ProductID row
10248 VINET 5 1996-07-04 00:00:00 10248 11 1

MariaDB

OrderID CustomerID EmployeeID OrderDate OrderID0 ProductID row
10248 VINET 5 1996-07-04 00:00:00 10799 24 1

MariaDB is highly optimized and if you don't order something explicitly, you might not just get an unordered result, but it could even be in a non-deterministic order (over multiple runs).

To make the query deterministic, it actually needs to be ROW_NUMBER() OVER(PARTITION BY `o1`.`OrderID` ORDER BY `o1`.`OrderID`, `o2`.`OrderID`, `o2`.`ProductID`) AS `row`.

And ordering the Orders.OrderDetails collection in the LINQ query is tricky.

Originally posted by @lauxjpn in PomeloFoundation/Pomelo.EntityFrameworkCore.MySql#1553 (comment)

Include provider and version information

EF Core version: 6.0.0

@ajcvickers ajcvickers added this to the 7.0.0 milestone Nov 16, 2021
@ajcvickers ajcvickers modified the milestones: 7.0.0, MQ Aug 29, 2022
@ajcvickers ajcvickers added punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. and removed propose-punt labels Aug 29, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
area-groupby area-query area-test customer-reported punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-bug
Projects
None yet
Development

No branches or pull requests

4 participants