Deep Dive into an ORM:
The Entity Framework Core Query Pipeline
ORMs: can’t live with them…
… can’t live without them
–
Without ORMs, devs have to:
–
Always be familiar with one layer underneath you! {.fragment}
–
Hibernate HQL:
String hql = "FROM Employee E WHERE E.id > 10 ORDER BY E.salary DESC";
Hibernate criteria API:
Criteria crit = session.createCriteria(Product.class);
Criterion priceLessThan = Restrictions.lt("price", 10.0);
Criterion mouse = Restrictions.ilike("description", "mouse", MatchMode.ANYWHERE);
LogicalExpression orExp = Restrictions.or(priceLessThan, mouse);
crit.add(orExp);
List results=crit.list();
–
C# has LINQ (Language-Integrated Query)
var someArray = new[] { 1, 10, 2 };
var result = someArray
.Where(i => i > 5)
.OrderBy(i => i)
.Take(2)
.ToArray();
These are query operators, working in-memory.
–
C# allows us to use LINQ to express queries that don’t run in-memory
var employees = db.Employees
.Where(e => e.Salary > 1000)
.OrderBy(e => e.Salary)
.Take(2)
.ToArray();
–
–
–
Let’s translate this:
employees.Where(e => e.FirstName == e.LastName)
SELECT ... FROM Employees WHERE FirstName = LastName
Could be wrong, SQL has three-valued logic{style=color:red} {.fragment}
In SQL, FirstName = NULL is never true… (and it ain’t false either :))
–
Try to mimic C# behavior (best-effort only!)
WHERE FirstName = LastName OR
(FirstName IS NULL AND LastName IS NULL)
If one of the columns is non-nullable in the database schema, we can optimize {.fragment}
Counter-example: case-sensitivity {.fragment}
--
## Subqueries
Let's get complicated:
```csharp
employees.Where(e => e.Tasks.Count == 2)
```
SELECT ...
FROM Employees AS e
WHERE (
SELECT COUNT(*)
FROM Tasks AS t
WHERE t.EmployeeId = e.Id
) = 2;
WHERE e.Id = Boss.Id
employees.Where(e => e.Tasks.First() == x)
WHERE (SELECT ... FROM Tasks ...).Id1 = x.Id1
WHERE Id1 = Boss.Id1 AND Id2 = Boss.Id2
employees.Where(e => e.Tasks.First() == x)
WHERE (SELECT ...).Id1 = x.Id1 AND (SELECT ...).Id2 = x.Id2
employees.Include(b => b.Tasks)
SELECT ...
FROM Employees AS e
LEFT JOIN Tasks AS t ON e.Id = t.EmployeeId
ORDER BY e.Id, t.Id
Expression<Func<int, bool>> comparerExpression
= num => num < 5;
Func<int, bool> compiled = comparerExpression.Compile();
var numParam = Expression.Parameter(typeof(int), "num");
Expression<Func<int, bool>> comparerExpression2 =
Expression.Lambda<Func<int, bool>>(
Expression.LessThan(
numParam,
Expression.Constant(5)),
numParam);
var compiled2 = comparerExpression2.Compile();
--
## Expression Trees - For Perf
* Great solution for code generation
* Generate specific, tailored lambdas based on config/user input
--
## Visitors
* The query engine is architected as a pipeline of **visitors** {.fragment}
* Each visitor traverses the expression tree, responsible for doing one thing {.fragment}
--
## Example Visitor
* For example, the instance equality visitor:
* Traverses the entire tree {.fragment}
* Identifies equality expression over instances {.fragment}
* Replaces with equality over their keys {.fragment}
--
## Minimal Visitor Exercise
**Exercise**: optimize an expression tree, eliminating needless null checks on non-nullable database columns:
```csharp
// INPUT:
customers.Where(c => c.Age > 18 && c.Name != null)
// OUTPUT:
customers.Where(c => c.Age > 18)
```
--
class MyVisitor1 : ExpressionVisitor
{
protected override Expression VisitBinary(BinaryExpression b)
{
var visited = (BinaryExpression)base.VisitBinary(b);
if (b.NodeType == ExpressionType.NotEqual)
{
if (b.Right is ConstantExpression rightConstant &&
rightConstant.Value == null &&
!IsNullable(b.Left))
{
return Expression.Constant(true);
}
// TODO: Sides may be flipped!
}
return visited;
}
}
customers.Where(c => c.Age > 18 && true)
--
class MyVisitor2 : ExpressionVisitor
{
protected override Expression VisitBinary(BinaryExpression b)
{
var visited = (BinaryExpression)base.VisitBinary(b);
if (b.NodeType == ExpressionType.AndAlso)
{
if (b.Right is ConstantExpression rightConstant &&
rightConstant.Value is bool boolValue &&
boolValue)
{
return b.Left;
}
// TODO: Same for other side!
}
// TODO: Also take care of OrElse!
return visited;
}
}
customers.Where(c => c.Age > 18)
--
## Recap
From this:
customers.Where(c => c.Age > 18 && c.Name != null)</pre>
To this:customers.Where(c => c.Age > 18 && true)</pre>
</div>To this:customers.Where(c => c.Age > 18)</pre>
</div> -- ## EF Core Preprocessor-- ## SQL Translation * We've been talking about LINQ methods, but... at some point we want a model of an SQL query. * Instead of CLR types, methods and members, we want tables and columns. --public virtual Expression Process(Expression query) { query = new EnumerableToQueryableMethodConvertingExpressionVisitor().Visit(query); query = new QueryMetadataExtractingExpressionVisitor(_queryCompilationContext).Visit(query); query = new AllAnyToContainsRewritingExpressionVisitor().Visit(query); query = new GroupJoinFlatteningExpressionVisitor().Visit(query); query = new NullCheckRemovingExpressionVisitor().Visit(query); query = new EntityEqualityRewritingExpressionVisitor(_queryCompilationContext).Rewrite(query); query = new SubqueryMemberPushdownExpressionVisitor().Visit(query); query = new NavigationExpandingExpressionVisitor(...).Expand(query); query = new FunctionPreprocessingExpressionVisitor().Visit(query); new EnumerableVerifyingExpressionVisitor().Visit(query); return query; }
customers .Where(c => c.Name == "Joe") .OrderBy(c => c.Age) .Take(5) .Select(c => new { c.Name, c.Age });
-- ## SQL Translation 2 Still expressions, but now custom SQL ones! ```c# customers.Where(c => new[] { 1, 2, 3 }.Contains(c.Id)); // Becomes: WHERE c.ID IN (1, 2, 3) // This is an SqlInExpression ``` Expression trees can have custom expression types -- ## Provider and User Translations ```csharp customers.Where(c => c.Name.Length > 5) // Becomes: WHERE LEN(c.Name) > 5 // On SqlServer WHERE LENGTH(c.Name) > 5 // On PostgreSQL ``` Users can define functions which render into arbitrary SQL expressions {.fragment} -- ## End of the Line * Some more postprocessing after SQL translation * QuerySqlGenerator is just another visitor, outputs an SQL string * Another visitor code-generates the shaper -- ## Query Pipeline Overview ![Query Pipeline](/talks/2019-11-30-codecamp-efcore-query-internals/src/diagrams/pipeline.svg) --- ## Let's talk about Performance -- ## Compilation is Slow * No way we run all these visitors each time * The answer is always, always... caching. {.fragment} -- ## Zoom out Again ![Full pipeline](/talks/2019-11-30-codecamp-efcore-query-internals/src/diagrams/pipeline-full.svg)SELECT Name, Age FROM Customers WHERE Age > 18 ORDER BY Age LIMIT 10
Cool! However, for each query, we still do: * Some magic called parameter extraction * Cache key hashcode calculation * Structural comparison with trees in the query cache-- ## Compiled Queries ```csharp using (var ctx = new MyContext()) { var id = 8; var posts = ctx.Posts.Single(p => p.Id == id); } ```--- ## Closing Words * Everybody loves ORMs * ... and everybody hates ORMs {.fragment} * Writing an ORM is... complex :) {.fragment} * Being aware of what happens under the hood is always important. {.fragment} * *Always* check the SQL being generated! {.fragment} --- ## Actual Heroes **Smit Patel** (new query pipeline architect) **Maurycy Markowski** (another big query guy) **And of course ... the rest of the EF team** --- ## Thank you! **Shay Rojansky**, <http://roji.org>, @shayrojansky ![Codecamp Feedback](feedback-100.jpg =500x500) ---// Create an explicitly compiled query private static Func<MyContext, int, Post> _postById = EF.CompileQuery((MyContext ctx, int id) => ctx.Posts.Single(p => p.Id == id)); // Use the compiled query by invoking it using (var ctx = new MyContext()) { var post = _postById(ctx, 8); }