EF Title Logo

Deep Dive into an ORM:

The Entity Framework Core Query Pipeline


whoami


What’s this about?

Reminder about ORMs

ORMs and .NET

Entity Framework Core

Let’s Talk about Expression Trees!

Expression Trees

Func<int, bool> comparer = num => num < 5;
Expression<Func<int, bool>> comparerExpression
   = num => num < 5;
Func<int, bool> compiled = comparerExpression.Compile();
var numParam = Expression.Parameter(typeof(int), "num");

Expression<Func<int, bool>> comparerExpression2 =
    Expression.Lambda<Func<int, bool>>(
        Expression.LessThan(
            numParam,
            Expression.Constant(5)),
        numParam);

var compiled2 = comparerExpression2.Compile();

Expression Trees - For Perf

Expression Trees - Alternatives

Roslyn - high-level source representation {.fragment data-fragment-index=1}

T4 - source representation with templates {.fragment data-fragment-index=2}

Expression trees {.fragment data-fragment-index=4}

Reflection.Emit - low-level IL {.fragment data-fragment-index=3}

IQueryable


Let’s Get Translating!

Null Semantics

Let’s translate this:

employees.Where(c => c.FirstName == c.LastName)
To this, right?
SELECT ... FROM Employees WHERE FirstName = LastName

Could be wrong, SQL has three-valued logic{style=color:red} {.fragment}

In SQL, FirstName = NULL is never true… (and it ain’t false either :))

Null Semantics 2

Try to mimic C# behavior (best-effort only!)

WHERE FirstName = LastName OR
  (FirstName IS NULL AND LastName IS NULL)
If one of the columns is non-nullable in the database schema, we can optimize {.fragment} Counter-example: case-sensitivity {.fragment} -- ## Subqueries Let's get complicated: ```csharp employees.Where(e => e.Tasks.Count == 2) ```
SELECT ...
FROM Employees AS e
WHERE (
    SELECT COUNT(*)
    FROM Tasks AS t
    WHERE t.EmployeeId = e.Id
) = 2;
-- ## Instance Equality What about this? ```c# employees.Where(e => e == e.Boss) ```
We need to compare keys:
WHERE e.Id = Boss.Id
The same happens if instead of a Boss we have a subquery:
employees.Where(e => e.Tasks.First() == x)

WHERE (SELECT ... FROM Tasks ...).Id1 = x.Id1
-- ## More Instance Equality Now, what happens if something has a composite key?
WHERE Id1 = Boss.Id1 AND Id2 = Boss.Id2
And with a subquery?
employees.Where(e => e.Tasks.First() == x)
WHERE (SELECT ...).Id1 = x.Id1 AND (SELECT ...).Id2 = x.Id2
**Double evaluation... Better not do this!**{style=color:red} -- ## Related Instances In C#, "related instances" are just there in memory.
With DB, we need to fetch only what's needed:
employees.Include(b => b.Tasks)
SELECT ...
FROM Employees AS e
LEFT JOIN Tasks AS t ON e.Id = t.EmployeeId
ORDER BY e.Id, t.Id
* Previous versions of EF Core translated this with multiple queries {.fragment} * Multiple roundtrips vs. "cartesian explosion" * Transactionality -- ## Untranslatability What to do about this? ```c# employees.Where(e => SomeFunc(e)) ``` Previous versions: translate to server if we can, otherwise evaluate on client {.fragment} * Can bring *lots* of data to client {.fragment} * Adding a new translation can break behavior {.fragment} EF Core 3 throws, except for top-most projection {.fragment style=color:red} -- ## Lessons Learned * There are many, many mismatches between C# LINQ and SQL * ORMs need to make decisions about how to translate {.fragment} * We need to do many things to expression trees before they can be converted {.fragment} --- ## Zoom out: How to do all this? -- ## Query Pipeline: Inputs and Outputs * Input: LINQ expression tree (from IQueryable) * Outputs (for relational)? * SQL {.fragment} * Shaper / materializer, to read results (DbDataReader) {.fragment} * We are building a compiler! {.fragment} -- ## Architecture * The query engine is architected as a pipeline of **visitors** {.fragment} * Each visitor traverses the expression tree, responsible for doing one thing {.fragment} -- ## Example Visitor * For example, the instance equality visitor: * Traverses the entire tree {.fragment} * Identifies equality expression over instances {.fragment} * Replaces with equality over their keys {.fragment} -- ## Minimal Visitor Exercise **Exercise**: optimize an expression tree, eliminating needless null checks on non-nullable database columns: ```csharp // INPUT: customers.Where(c => c.Age > 18 && c.Name != null) // OUTPUT: customers.Where(c => c.Age > 18) ``` --
class MyVisitor1 : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression b)
    {
        var visited = (BinaryExpression)base.VisitBinary(b);

        if (b.NodeType == ExpressionType.NotEqual)
        {
            if (b.Right is ConstantExpression rightConstant &&
                rightConstant.Value == null &&
                !IsNullable(b.Left))
            {
                return Expression.Constant(true);
            }
            // TODO: Sides may be flipped!
        }

        return visited;
    }
}
customers.Where(c => c.Age > 18 && true)
--
class MyVisitor2 : ExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression b)
    {
        var visited = (BinaryExpression)base.VisitBinary(b);

        if (b.NodeType == ExpressionType.AndAlso)
        {
            if (b.Right is ConstantExpression rightConstant &&
                rightConstant.Value is bool boolValue &&
                boolValue)
            {
                return b.Left;
            }
            // TODO: Same for other side!
        }
        // TODO: Also take care of OrElse!

        return visited;
    }
}
customers.Where(c => c.Age > 18)
-- ## Recap From this:
customers.Where(c => c.Age > 18 && c.Name != null)</pre>

To this:
customers.Where(c => c.Age > 18 && true)</pre>
</div>

To this:
customers.Where(c => c.Age > 18)</pre>
</div>

--

## EF Core Preprocessor

public virtual Expression Process(Expression query)
{
    query = new EnumerableToQueryableMethodConvertingExpressionVisitor().Visit(query);
    query = new QueryMetadataExtractingExpressionVisitor(_queryCompilationContext).Visit(query);
    query = new AllAnyToContainsRewritingExpressionVisitor().Visit(query);
    query = new GroupJoinFlatteningExpressionVisitor().Visit(query);
    query = new NullCheckRemovingExpressionVisitor().Visit(query);
    query = new EntityEqualityRewritingExpressionVisitor(_queryCompilationContext).Rewrite(query);
    query = new SubqueryMemberPushdownExpressionVisitor().Visit(query);
    query = new NavigationExpandingExpressionVisitor(...).Expand(query);
    query = new FunctionPreprocessingExpressionVisitor().Visit(query);
    new EnumerableVerifyingExpressionVisitor().Visit(query);

    return query;
}
-- ## SQL Translation * We've been talking about LINQ methods, but... at some point we want a model of an SQL query. * Instead of CLR types, methods and members, we want tables and columns. --
customers
   .Where(c => c.Name == "Joe")
   .OrderBy(c => c.Age)
   .Take(5)
   .Select(c => new { c.Name, c.Age });
SELECT Name, Age FROM Customers
WHERE Age > 18
ORDER BY Age
LIMIT 10
-- ## SQL Translation 2 Still expressions, but now custom SQL ones! ```c# customers.Where(c => new[] { 1, 2, 3 }.Contains(c.Id)); // Becomes: WHERE c.ID IN (1, 2, 3) // This is an SqlInExpression ``` Expression trees can have custom expression types {.fragment} Non-relational providers will do something completely different {.fragment} -- ## Provider and User Translations ```csharp customers.Where(c => c.Name.Length > 5) // Becomes: WHERE LEN(c.Name) > 5 // On SqlServer WHERE LENGTH(c.Name) > 5 // On PostgreSQL ``` Users can define functions which render into arbitrary SQL expressions {.fragment} -- ## End of the Line * Some more postprocessing after SQL translation * QuerySqlGenerator is just another visitor, outputs an SQL string {.fragment} * Another visitor code-generates the shaper {.fragment} (...with... expression trees...) {.fragment} -- ## Query Pipeline Overview ![Query Pipeline](/talks/2019-10-08-dotnetos-efcore-query-internals/src/diagrams/pipeline.svg) --- ## Let's talk about Perffffff -- ## Parameterized Queries ```csharp var posts1 = ctx.Posts.Where(p => p.Title.Contains("dotnetos")); var title = "dotnetos"; // Read from somewhere var posts2 = ctx.Posts.Where(p => p.Title.Contains(title)); ```
SELECT ... FROM Posts WHERE STRPOS('dotnetos') > 0;
SELECT ... FROM Posts WHERE STRPOS(@p) > 0;
* Can leverage cached plans and prepared queries {.fragment} * Closure variables are your way of asking for parameterization {.fragment} -- ## Compilation is Slow * No way we run all these visitors each time * The answer is always, always... caching. {.fragment} * **Input**: expression tree. **Outputs**: SQL+shaper. {.fragment} -- ## Structural Comparison We can just do recursive structural comparison: ```csharp var posts = ctx.Posts.Where(p => p.Title.Contains("dotnetos")); ```
-- ## Parameter Extraction ```csharp var title = "dotnetos"; // Read from somewhere var posts2 = ctx.Posts.Where(p => p.Title.Contains(title)); ``` * Before checking the cache, identify parameter references (to closure variables) * Extract them out and replace them with placeholders, "punching holes in the tree" {.fragment} -- ## Zoom out Again ![Full pipeline](/talks/2019-10-08-dotnetos-efcore-query-internals/src/diagrams/pipeline-full.svg)
Cool! However, for each query, we still do: * Parameter extraction * Cache key hashcode calculation * Structural comparison with trees in the query cache
-- ## Compiled Queries ```csharp using (var ctx = new MyContext()) { var id = 8; var posts = ctx.Posts.Single(p => p.Id == id); } ```

// Create an explicitly compiled query
private static Func<MyContext, int, Post> _postById
    = EF.CompileQuery((MyContext ctx, int id) =>
        ctx.Posts.Single(p => p.Id == id));

// Use the compiled query by invoking it
using (var ctx = new MyContext())
{
   var post = _postById(ctx, 8);
}
-- ## Not always so easy Remember our InExpression from before? ```csharp customers.Where(c => new[] { 1, 2, 3 }.Contains(c.Id)); WHERE c.Id IN (1, 2, 3) ```
How do we parameterize this?
var customerIds = new[] { 1, 2, 3 };
customers.Where(c => customerIds.Contains(c.Id));
-- ## Solution :( Expand to constant **at execution time**: ```csharp var values = new[] { 1, 2, 3 }; ctx.Posts.Where(p => values.Contains(p.Foo)); WHERE p.Foo IN (1, 2, 3) ``` * Pollutes database query cache * PostgreSQL has `WHERE p.Foo = ANY (@p)` {.fragment} --- ## Closing Words * Writing an ORM is... complex :) * Being aware of what happens under the hood is always important. {.fragment} * Check the SQL being generated! {.fragment} --- ## Actual Heroes **Smit Patel** (new query pipeline architect) **Maurycy Markowski** (big query guy and also a Polish guy!) **And of course ... the rest of the EF team** --- ## Thank you! **Shay Rojansky** Blog & Presentation: <http://roji.org> Twitter: @shayrojansky