Skip to content

TPT: Avoid connecting with derived tables when not selecting any properties from them #21773

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
smitpatel opened this issue Jul 24, 2020 · 7 comments · Fixed by #21992
Closed

Comments

@smitpatel
Copy link
Contributor

May be a lazy injection of columns. Also can be useful for owned types sharing tables.

Child of #18923

@smitpatel
Copy link
Contributor Author

@dotnet/efcore - Can someone post example of pri0 query which was problematic in EF6 and we wanted to improve.

@bricelam
Copy link
Contributor

bricelam commented Aug 3, 2020

Generated SQL Improvements for TPT Queries (June CTP) - ADO.NET Blog

@bricelam
Copy link
Contributor

bricelam commented Aug 3, 2020

DevDiv work items 7221, 137333 (I don't remember how to access these)
CodePlex issues 1218, 2296, 2814
dotnet/ef6@a1af241

@smitpatel
Copy link
Contributor Author

From 2814
I have three classes:

public class EntityBase
{
public Guid Id { get; set; }

[MaxLength(50)]
public string Name { get; set; }

public List<Tag> Tags { get; set; }
}

[Table("DerivedEntities1")]
public class DerivedEntity1 : EntityBase
{
[MaxLength(50)]
public string Text { get; set; }
}

[Table("DerviedEntities2")]
public class DerivedEntity2 : EntityBase
{
public int Score { get; set; }
}

If I query for base entities and project only plain properties (Name, for example) - everything is fine, derived classes are not joined to the query, since no information is needed from them

var entities = dataContext.Entities
.Select(e => new { Name = e.Name })
.ToList();
SELECT
1 AS [C1],
[Extent1].[Name] AS [Name]
FROM [dbo].[EntityHeaders] AS [Extent1]

However, if I need to fetch a collection for the base class, tables for all derived classes appear in the query.

var entities = dataContext.Entities
.Include(e => e.Tags)
.Select(e => new { Name = e.Name, Tags = e.Tags })
.ToList();
SELECT
[Project1].[Id1] AS [Id],
[Project1].[Id2] AS [Id1],
[Project1].[Id] AS [Id2],
[Project1].[C1] AS [C1],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C2],
[Project1].[Id3] AS [Id3],
[Project1].[Value] AS [Value],
[Project1].[EntityBase_Id] AS [EntityBase_Id]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent3].[Id] AS [Id2],
1 AS [C1],
[Extent4].[Id] AS [Id3],
[Extent4].[Value] AS [Value],
[Extent4].[EntityBase_Id] AS [EntityBase_Id],
CASE WHEN ([Extent4].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [dbo].[EntityHeaders] AS [Extent1]
LEFT OUTER JOIN [dbo].[DerivedEntities2] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[DerivedEntities1] AS [Extent3] ON [Extent1].[Id] = [Extent3].[Id]
LEFT OUTER JOIN [dbo].[Tags] AS [Extent4] ON [Extent1].[Id] = [Extent4].[EntityBase_Id]
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id2] ASC, [Project1].[Id] ASC, [Project1].[C2] ASC

Is there any possibility to avoid this?

@smitpatel
Copy link
Contributor Author

smitpatel commented Aug 3, 2020

Query generation bug. Not directly related to TPT

2296
The following model/query:

public class MyBase
{
public int Id { get; set; }
}

public class MyDerived1 : MyBase
{
public string Name { get; set; }
public MyDerived2 Der2 { get; set; }
}

public class MyDerived2 : MyBase
{
public string Name { get; set; }
public MyDerived1 Der1 { get; set; }
}

public class MyContext : DbContext
{
public DbSet<MyBase> Bases { get; set; }

protected override void OnModelCreating(DbModelBuilder builder)
{
builder.Entity<MyBase>().ToTable("MyBase");
builder.Entity<MyDerived2>().HasOptional(e => e.Der1).WithOptionalDependent(e => e.Der2);
builder.Entity<MyDerived1>().ToTable("Der1");
builder.Entity<MyDerived2>().ToTable("Der2");
}
}

ctx.Bases.OfType<MyDerived2>().Where(e => e.Der1.Der2.Name == "Foo");

In 6.1 the following SQL is produced:

SELECT
[Extent1].[Id] AS [Id],
'0X0X' AS [C1],
[Extent1].[Name] AS [Name],
CAST(NULL AS varchar(1)) AS [C2],
[Extent1].[Der1_Id] AS [Der1_Id]
FROM [dbo].[Der2] AS [Extent1]
INNER JOIN [dbo].[Der2] AS [Extent2] ON (([Extent2].[Der1_Id] = [Extent1].[Der1_Id]) OR (([Extent2].[Der1_Id] IS NULL) AND ([Extent1].[Der1_Id] IS NULL))) AND ([Extent1].[Der1_Id] IS NOT NULL)
WHERE ([Extent2].[Der1_Id] IS NOT NULL) AND (N'Foo' = [Extent2].[Name])

However in 6.1.1 query is much more complex:

SELECT
[Project1].[Id] AS [Id],
[Project1].[C1] AS [C1],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C2],
[Project1].[Der1_Id] AS [Der1_Id]
FROM ( SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Der1_Id] AS [Der1_Id],
    [Extent1].[Name] AS [Name],
    '0X0X' AS [C1],
    CAST(NULL AS varchar(1)) AS [C2],
    [Join2].[Der1_Id1] AS [Der1_Id1],
    [Join2].[Name1] AS [Name1],
    [Join2].[Id1] AS [Id1],
    [Join2].[Id2] AS [Id2],
    [Join2].[Der1_Id2] AS [Der1_Id2]
    FROM [dbo].[Der2] AS [Extent1]
    LEFT OUTER JOIN (SELECT [Extent2].[Der1_Id] AS [Der1_Id1], [Extent2].[Name] AS [Name1], [Join1].[Id1], [Join1].[Id2], [Join1].[Der1_Id2]
        FROM [dbo].[Der2] AS [Extent2]
        INNER JOIN (SELECT [Extent3].[Id] AS [Id1], [Extent4].[Id] AS [Id2], [Extent4].[Der1_Id] AS [Der1_Id2]
            FROM [dbo].[Der1] AS [Extent3]
            INNER JOIN [dbo].[Der2] AS [Extent4] ON ([Extent4].[Der1_Id] IS NOT NULL) AND ([Extent3].[Id] = [Extent4].[Der1_Id]) ) AS [Join1] ON [Join1].[Id1] = [Extent2].[Der1_Id] ) AS [Join2] ON [Extent1].[Id] = [Join2].[Id2]
    ) AS [Project1]
WHERE N'Foo' = [Project1].[Name1]

@smitpatel
Copy link
Contributor Author

1218
Suppose the following Type hierarchy (implemented via TPT):

public abstract class Base
{
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int Id { get; set; }

public string Name { get; set; }
}

[Table("Derived1")]
public class Derived1 : Base
{
public string SomeProperty { get; set; }
}

[Table("Derived2")]
class Derived2 : Base
{
public string OtherProperty { get; set; }
}

If a query is issued that only queries properties in the base:

Bases.Select(b => new {b.Id, b.Name})

The sql generated still performs a join with the union all of the derived type tables:

SELECT
[UnionAll1].[Id] AS [C1],
[Extent3].[Name] AS [Name]
FROM (SELECT
[Extent1].[Id] AS [Id]
FROM [dbo].[Derived2] AS [Extent1]
UNION ALL
SELECT
[Extent2].[Id] AS [Id]
FROM [dbo].[Derived1] AS [Extent2]) AS [UnionAll1]
INNER JOIN [dbo].[Bases] AS [Extent3] ON [UnionAll1].[Id] = [Extent3].[Id]

@smitpatel
Copy link
Contributor Author

smitpatel commented Aug 3, 2020

Not relevant to TPT

2256
In all my cases the 3rd includes property produce LEFT OUTER JOIN sql statement instead of INNER JOIN
ex : this query :

var result = context.ChantierInfos
.Include(x => x.Agence.Region.Siege)
.ToList();
return result;

statement :
SELECT ....
FROM [dbo].[CHANTIER] AS [Extent1]
INNER JOIN [dbo].[AGENCE] AS [Extent2] ON [Extent1].[ID_AGENCE] = [Extent2].[AGENCE_ID]
INNER JOIN [dbo].[REGION] AS [Extent3] ON [Extent2].[ID_REGION] = [Extent3].[REGION_ID]
LEFT OUTER JOIN [dbo].[SIEGE] AS [Extent4] ON [Extent3].[ID_SIEGE] = [Extent4].[SIEGE_ID]

this query on the same join generates an INNER JOIN : why ? (only two relations , if i set three , i will get a LEFT OUTER JOIN)

all the relationships are n to 1

using (var context = new DataContext())
{

var result = context.AgenceInfos
.Include(x => x.Region.Siege)
.ToList();
return result;
}
SELECT ...
FROM [dbo].[AGENCE] AS [Extent1]
INNER JOIN [dbo].[REGION] AS [Extent2] ON [Extent1].[ID_REGION] = [Extent2].[REGION_ID]
INNER JOIN [dbo].[SIEGE] AS [Extent3] ON [Extent2].[ID_SIEGE] = [Extent3].[SIEGE_ID]

Thanks in advance for your feedback

Christophe
Test_CDE_EF6.zip
Hello mikesnet,

It is hard to tell without looking at the model, but for relationships that are actually n:0..1, e.g. where the foreign key is nullable and therefore the relationship is optional we produce LEFT OUTER JOINs and that is by design. In other words, when you use the Include method in a query, it means that you want to include any existing entities that can be reached by traversing the provided path of navigation properties, and it does not imply that you want to filter the results of the query to include only the entities that contain other entities through the path.

I will just make sure that the behavior with non-optional relationships is what we expect before I close. Feel free to provide a repro if you think that is the case.

Thanks,
Diego
2014-05-07
Hello Diego,

I checked as well the mapping and the Db foreign key, and i realy have a many to one relationchip (fk not null , property .IsRequired() in the mapping looks like :
Property(e => e.IdSiege).HasColumnName("ID_SIEGE").IsRequired();
HasRequired(e => e.Siege)
.WithMany(m => m.RegionList)
.HasForeignKey(fk => fk.IdSiege);

Summary : if you consider the properties A,B,C,D

B.include(x=>x.C.D) will produce the right result
A.Include(x.B.C.D) will produce a wrong result, with a LEFT OUTER JOIN on the third nested property

A-B : many to one , B ->C : Many to 1, C->D : Many to one

I downloaded the latstet EF6 source and will begin to investigate.

Thanks for your feedbacks and idees

Christophe
2014-05-08
Here attached a sample solution
Christophe
2014-05-09
Hi Diego,

Because this issue produce very bad performances on my customer production site (very impacting issue for us)
Is it possible to get a status for the resolution.
Thanks a lot in advance for your anwser
Best regard
Christophe
2014-05-15
Christophe,

FYI, Emil has been investigating this and we now understand why this is happening: the transformation rule that promotes LEFT OUTER JOIN to INNER JOIN is only applied a couple of times, so the third join and successive joins won't be promoted. Since applying those transformation rules more times could regress the time it takes to translate the query to SQL, we need to be very careful. Could you provide some data about the performance impact? E.g. with your customer database and typical data, how long exactly do the query with the LEFT OUTER JOIN and the one with the INNER JOIN take to execute?

Thanks,
Diego
2014-05-25
Hi Diego,

Thanks a lot for your feedback.
We are preparing a bench, I should be able to provide the results for the begining of next week.
Best regards
Christophe

smitpatel added a commit that referenced this issue Aug 8, 2020
Resolves #21773

- For top level SelectExpression in client code, we preserve whole projection.
- We search for all the referenced column for each table in SelectExpression
  - If there are no references outside and if it is LeftJoinExpression for TPT (which is remembered through a state in SelectExpression), then we remove the table altogether.
  - If the table is nested SelectExpression (direct or through a join) then we take referenced columns for that table and prune projection of inner SelectExpression and apply process recursively to prune TPT pattern inside subquery.
smitpatel added a commit that referenced this issue Aug 8, 2020
Resolves #21773

- For top level SelectExpression in client code, we preserve whole projection.
- We search for all the referenced column for each table in SelectExpression
  - If there are no references outside and if it is LeftJoinExpression for TPT (which is remembered through a state in SelectExpression), then we remove the table altogether.
  - If the table is nested SelectExpression (direct or through a join) then we take referenced columns for that table and prune projection of inner SelectExpression and apply process recursively to prune TPT pattern inside subquery.

- Removes derived tables when projecting out only base properties of TPT
- Removes derived tables when joining with collection on base and not selecting any of the derived properties of TPT
@smitpatel smitpatel modified the milestones: 5.0.0, 5.0.0-rc1 Aug 8, 2020
smitpatel added a commit that referenced this issue Aug 10, 2020
Resolves #21773

- For top level SelectExpression in client code, we preserve whole projection.
- We search for all the referenced column for each table in SelectExpression
  - If there are no references outside and if it is LeftJoinExpression for TPT (which is remembered through a state in SelectExpression), then we remove the table altogether.
  - If the table is nested SelectExpression (direct or through a join) then we take referenced columns for that table and prune projection of inner SelectExpression and apply process recursively to prune TPT pattern inside subquery.

- Removes derived tables when projecting out only base properties of TPT
- Removes derived tables when joining with collection on base and not selecting any of the derived properties of TPT
@ghost ghost closed this as completed in #21992 Aug 11, 2020
ghost pushed a commit that referenced this issue Aug 11, 2020
#21992)

Resolves #21773

- For top level SelectExpression in client code, we preserve whole projection.
- We search for all the referenced column for each table in SelectExpression
  - If there are no references outside and if it is LeftJoinExpression for TPT (which is remembered through a state in SelectExpression), then we remove the table altogether.
  - If the table is nested SelectExpression (direct or through a join) then we take referenced columns for that table and prune projection of inner SelectExpression and apply process recursively to prune TPT pattern inside subquery.

- Removes derived tables when projecting out only base properties of TPT
- Removes derived tables when joining with collection on base and not selecting any of the derived properties of TPT
@ajcvickers ajcvickers modified the milestones: 5.0.0-rc1, 5.0.0 Nov 7, 2020
This issue was closed.
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants