<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.roji.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.roji.org/" rel="alternate" type="text/html" /><updated>2025-10-31T11:28:47+01:00</updated><id>https://www.roji.org/feed.xml</id><title type="html">Shay Rojansky’s Blog</title><subtitle>Microsoft software engineer working on .NET data access and perf, member of the Entity Framework team. Lead dev of Npgsql, the PostgreSQL provider.</subtitle><author><name>Shay Rojansky</name></author><entry><title type="html">Queryable PostgreSQL arrays in EF Core 8.0</title><link href="https://www.roji.org/queryable-pg-arrays-in-ef8" rel="alternate" type="text/html" title="Queryable PostgreSQL arrays in EF Core 8.0" /><published>2023-05-20T00:00:00+02:00</published><updated>2023-05-20T00:00:00+02:00</updated><id>https://www.roji.org/queryable-pg-arrays-in-ef8</id><content type="html" xml:base="https://www.roji.org/queryable-pg-arrays-in-ef8"><![CDATA[<h2 id="queryable-collections">Queryable collections?</h2>

<p>EF Core 8.0 preview4 has just been released, and one of the big features it introduces is queryable primitive collections. This is a really cool feature that allows mapping primitive collections (e.g. <code class="language-plaintext highlighter-rouge">int[]</code>) to the database, and performing all imaginable LINQ queries over them. Before reading further here, please read the <a href="https://devblogs.microsoft.com/dotnet/announcing-ef8-preview-4">EF Core blog post</a> on this feature; more info and examples are also available in the <a href="https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-8.0/whatsnew#collections-of-primitive-types">EF What’s New documentation</a>.</p>

<p>The rest of this post will discuss PostgreSQL-specific aspects of this feature, which is also fully supported starting with version 8.0.0-preview.4 of the PostgreSQL EF provider.</p>

<h2 id="contains-over-parameter">Contains over parameter</h2>

<p>The EF blog post starts with <a href="https://devblogs.microsoft.com/dotnet/announcing-ef8-preview-4/#translating-linq-contains-with-a-parameter-collection">a tricky problem</a>: how to translate the LINQ Contains operator when the list of values is a parameter?</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">names</span> <span class="p">=</span> <span class="k">new</span><span class="p">[]</span> <span class="p">{</span> <span class="s">"Blog1"</span><span class="p">,</span> <span class="s">"Blog2"</span> <span class="p">};</span>

<span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">names</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">Name</span><span class="p">))</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>The solution introduced in preview4 serializes the <code class="language-plaintext highlighter-rouge">names</code> .NET array into a string containing a JSON array representation, and then uses a SQL function to parse the values out in SQL. Here’s the SQL Server sample:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Executed</span> <span class="n">DbCommand</span> <span class="p">(</span><span class="mi">49</span><span class="n">ms</span><span class="p">)</span> <span class="p">[</span><span class="k">Parameters</span><span class="o">=</span><span class="p">[</span><span class="o">@</span><span class="n">__names_0</span><span class="o">=</span><span class="s1">'["Blog1","Blog2"]'</span> <span class="p">(</span><span class="k">Size</span> <span class="o">=</span> <span class="mi">4000</span><span class="p">)],</span> <span class="n">CommandType</span><span class="o">=</span><span class="s1">'Text'</span><span class="p">,</span> <span class="n">CommandTimeout</span><span class="o">=</span><span class="s1">'30'</span><span class="p">]</span>

<span class="k">SELECT</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Id</span><span class="p">],</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Name</span><span class="p">]</span>
<span class="k">FROM</span> <span class="p">[</span><span class="n">Blogs</span><span class="p">]</span> <span class="k">AS</span> <span class="p">[</span><span class="n">b</span><span class="p">]</span>
<span class="k">WHERE</span> <span class="k">EXISTS</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="mi">1</span>
    <span class="k">FROM</span> <span class="n">OpenJson</span><span class="p">(</span><span class="o">@</span><span class="n">__names_0</span><span class="p">)</span> <span class="k">AS</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
    <span class="k">WHERE</span> <span class="p">[</span><span class="n">n</span><span class="p">].[</span><span class="n">value</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Name</span><span class="p">])</span>
</code></pre></div></div>

<p>The nice thing about PostgreSQL is that it has full, first-class support for array types in the database - quite a unique feature. So we don’t have to mess around with JSON at all - we can simply send the .NET array directly as a parameter and use it in SQL as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Executed</span> <span class="n">DbCommand</span> <span class="p">(</span><span class="mi">10</span><span class="n">ms</span><span class="p">)</span> <span class="p">[</span><span class="k">Parameters</span><span class="o">=</span><span class="p">[</span><span class="o">@</span><span class="n">__names_0</span><span class="o">=</span><span class="p">{</span> <span class="s1">'Blog1'</span><span class="p">,</span> <span class="s1">'Blog2'</span> <span class="p">}</span> <span class="p">(</span><span class="n">DbType</span> <span class="o">=</span> <span class="k">Object</span><span class="p">)],</span> <span class="n">CommandType</span><span class="o">=</span><span class="s1">'Text'</span><span class="p">,</span> <span class="n">CommandTimeout</span><span class="o">=</span><span class="s1">'30'</span><span class="p">]</span>

<span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="o">@</span><span class="n">__names_0</span><span class="p">)</span>
</code></pre></div></div>

<p>In fact, the EF PostgreSQL provider has done this for a few years already, freeing PostgreSQL users from the performance problems that users of other databases had to contend with (<a href="https://github.com/dotnet/efcore/issues/13617">see this issue</a>). So preview4 doesn’t bring any improvements around this specific problem - we were already doing the optimal thing.</p>

<h2 id="fully-queryable-arrays">Fully queryable arrays</h2>

<p>However, even though the EF PostgreSQL provider has long supported arrays, its support for querying over them has been quite limited. Now, EF 8.0 preview4 unlocks generalized LINQ querying over primitive collections - once again by converting them to JSON, and using a SQL function to unpack them to a relational rowset. For example, the following LINQ query:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">tags</span> <span class="p">=</span> <span class="k">new</span><span class="p">[]</span> <span class="p">{</span> <span class="s">"Tag1"</span><span class="p">,</span> <span class="s">"Tag2"</span> <span class="p">};</span>

<span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Intersect</span><span class="p">(</span><span class="n">tags</span><span class="p">).</span><span class="nf">Count</span><span class="p">()</span> <span class="p">&gt;=</span> <span class="m">2</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>… is now translated to the following SQL with SQL Server:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Executed</span> <span class="n">DbCommand</span> <span class="p">(</span><span class="mi">48</span><span class="n">ms</span><span class="p">)</span> <span class="p">[</span><span class="k">Parameters</span><span class="o">=</span><span class="p">[</span><span class="o">@</span><span class="n">__tags_0</span><span class="o">=</span><span class="s1">'["Tag1","Tag2"]'</span> <span class="p">(</span><span class="k">Size</span> <span class="o">=</span> <span class="mi">4000</span><span class="p">)],</span> <span class="n">CommandType</span><span class="o">=</span><span class="s1">'Text'</span><span class="p">,</span> <span class="n">CommandTimeout</span><span class="o">=</span><span class="s1">'30'</span><span class="p">]</span>

<span class="k">SELECT</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Id</span><span class="p">],</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Name</span><span class="p">],</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Tags</span><span class="p">]</span>
<span class="k">FROM</span> <span class="p">[</span><span class="n">Blogs</span><span class="p">]</span> <span class="k">AS</span> <span class="p">[</span><span class="n">b</span><span class="p">]</span>
<span class="k">WHERE</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
    <span class="k">FROM</span> <span class="p">(</span>
        <span class="k">SELECT</span> <span class="p">[</span><span class="n">t</span><span class="p">].[</span><span class="n">value</span><span class="p">]</span>
        <span class="k">FROM</span> <span class="n">OPENJSON</span><span class="p">([</span><span class="n">b</span><span class="p">].[</span><span class="n">Tags</span><span class="p">])</span> <span class="k">AS</span> <span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="c1">-- column collection</span>
        <span class="k">INTERSECT</span>
        <span class="k">SELECT</span> <span class="p">[</span><span class="n">t1</span><span class="p">].[</span><span class="n">value</span><span class="p">]</span>
        <span class="k">FROM</span> <span class="n">OPENJSON</span><span class="p">(</span><span class="o">@</span><span class="n">__tags_0</span><span class="p">)</span> <span class="k">AS</span> <span class="p">[</span><span class="n">t1</span><span class="p">]</span> <span class="c1">-- parameter collection</span>
    <span class="p">)</span> <span class="k">AS</span> <span class="p">[</span><span class="n">t0</span><span class="p">])</span> <span class="o">&gt;=</span> <span class="mi">2</span>
</code></pre></div></div>

<p>This uses the SQL Server <code class="language-plaintext highlighter-rouge">OpenJson</code> function to unpack the JSON array column and parameter into rowsets, over which the LINQ operators are translated in the standard way.</p>

<p>Now let’s see how this works on PostgreSQL!</p>

<p>First, here’s our .NET entity type:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">Blog</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">string</span><span class="p">?</span> <span class="n">Name</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">string</span><span class="p">[]</span> <span class="n">Tags</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This creates the following table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">"Blogs"</span> <span class="p">(</span>
  <span class="nv">"Id"</span> <span class="nb">integer</span> <span class="k">GENERATED</span> <span class="k">BY</span> <span class="k">DEFAULT</span> <span class="k">AS</span> <span class="k">IDENTITY</span><span class="p">,</span>
  <span class="nv">"Name"</span> <span class="nb">text</span><span class="p">,</span>
  <span class="nv">"Tags"</span> <span class="nb">text</span><span class="p">[]</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="k">CONSTRAINT</span> <span class="nv">"PK_Blogs"</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="nv">"Id"</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>Note that <code class="language-plaintext highlighter-rouge">Tags</code> is a PostgreSQL array - <code class="language-plaintext highlighter-rouge">text[]</code>, and not a simple string column containing a JSON array. Aside from mapping .NET arrays more directly and naturally, this has the following advantages:</p>

<ul>
  <li>It’s stored more efficiently: array elements are stored in the same efficient binary encoding that PostgreSQL uses for regular, non-array values.</li>
  <li>It’s also transferred more efficiently. The same binary encoding is used when reading and writing the elements, meaning that we don’t need to constantly serialize and parse JSON.</li>
  <li>Arrays provide more database type safety; it’s impossible for the column to contain anything other than the defined array type. Similar type safety may be achievable with a JSON array via complex check constraints, but this is more complicated and probably less efficient.</li>
</ul>
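<p>As a side note, this native array mapping isn’t specific to EF - at the ADO.NET level, Npgsql accepts .NET arrays as parameters directly. Here’s a minimal sketch (the connection string is a placeholder, and the table and column names are simply the ones from this post’s examples) showing a <code class="language-plaintext highlighter-rouge">text[]</code> parameter used with <code class="language-plaintext highlighter-rouge">= ANY</code>, with no JSON serialization anywhere:</p>

```csharp
using System;
using Npgsql;

// Hypothetical connection string - adjust for your environment.
await using var conn = new NpgsqlConnection("Host=localhost;Database=blogging");
await conn.OpenAsync();

// The .NET string[] is sent as a native PostgreSQL text[] parameter.
await using var cmd = new NpgsqlCommand(
    """SELECT "Id", "Name" FROM "Blogs" WHERE "Name" = ANY (@names)""", conn);
cmd.Parameters.AddWithValue("names", new[] { "Blog1", "Blog2" });

await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    Console.WriteLine(reader.GetString(1));
```

This is the same mechanism the EF provider uses under the hood when it parameterizes an array.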

<p>The LINQ query above translates to the following SQL:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Executed</span> <span class="n">DbCommand</span> <span class="p">(</span><span class="mi">14</span><span class="n">ms</span><span class="p">)</span> <span class="p">[</span><span class="k">Parameters</span><span class="o">=</span><span class="p">[</span><span class="o">@</span><span class="n">__tags_0</span><span class="o">=</span><span class="p">{</span> <span class="s1">'Tag1'</span><span class="p">,</span> <span class="s1">'Tag2'</span> <span class="p">}</span> <span class="p">(</span><span class="n">DbType</span> <span class="o">=</span> <span class="k">Object</span><span class="p">)],</span> <span class="n">CommandType</span><span class="o">=</span><span class="s1">'Text'</span><span class="p">,</span> <span class="n">CommandTimeout</span><span class="o">=</span><span class="s1">'30'</span><span class="p">]</span>

<span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="p">(</span>
  <span class="k">SELECT</span> <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)::</span><span class="nb">int</span>
  <span class="k">FROM</span> <span class="p">(</span>
      <span class="k">SELECT</span> <span class="n">t</span><span class="p">.</span><span class="n">value</span>
      <span class="k">FROM</span> <span class="k">unnest</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span><span class="p">)</span> <span class="k">AS</span> <span class="n">t</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
      <span class="k">INTERSECT</span>
      <span class="k">SELECT</span> <span class="n">t1</span><span class="p">.</span><span class="n">value</span>
      <span class="k">FROM</span> <span class="k">unnest</span><span class="p">(</span><span class="o">@</span><span class="n">__tags_0</span><span class="p">)</span> <span class="k">AS</span> <span class="n">t1</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
  <span class="p">)</span> <span class="k">AS</span> <span class="n">t0</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">2</span>
</code></pre></div></div>

<p>Where SQL Server had the <code class="language-plaintext highlighter-rouge">OPENJSON</code> function, on PostgreSQL we use the <a href="https://www.postgresql.org/docs/current/functions-array.html#id-1.5.8.25.6.2.2.17.1.1.1"><code class="language-plaintext highlighter-rouge">unnest</code></a> function to expand the array into a relational rowset; conceptually things are very similar, except that the thing being expanded is a native PostgreSQL array rather than a string value containing a JSON array.</p>
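<p>If you want to see what <code class="language-plaintext highlighter-rouge">unnest</code> does on its own, it’s easy to experiment in psql; each array element becomes a row (and the <code class="language-plaintext highlighter-rouge">WITH ORDINALITY</code> variant additionally numbers the elements, 1-based):</p>

```sql
-- Expand an array literal into a rowset; each element becomes a row.
SELECT * FROM unnest(ARRAY['a', 'b', 'c']) AS t(value);

-- WITH ORDINALITY adds a 1-based position column alongside each element.
SELECT * FROM unnest(ARRAY['a', 'b', 'c']) WITH ORDINALITY AS t(value, ordinal);
```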

<p>So far, so good: we can use arbitrary LINQ operators to query PostgreSQL array columns (and parameters), and the EF provider translates those by “unnesting” the array and then using regular SQL over that.</p>

<h2 id="and-a-bonus-postgresql-specialized-translations">And a bonus: PostgreSQL specialized translations</h2>

<p>We could stop there - queryable arrays are already a powerful, flexible new mechanism for your LINQ queries. But PostgreSQL also provides a rich set of functions and operators for working with arrays - far beyond what’s possible with JSON arrays in other databases. For example, let’s say you want to index an element in the array:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">[</span><span class="m">2</span><span class="p">]</span> <span class="p">==</span> <span class="s">"foo"</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>On SQL Server this translates to the following:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Id</span><span class="p">],</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Name</span><span class="p">],</span> <span class="p">[</span><span class="n">b</span><span class="p">].[</span><span class="n">Tags</span><span class="p">]</span>
<span class="k">FROM</span> <span class="p">[</span><span class="n">Blogs</span><span class="p">]</span> <span class="k">AS</span> <span class="p">[</span><span class="n">b</span><span class="p">]</span>
<span class="k">WHERE</span> <span class="n">JSON_VALUE</span><span class="p">([</span><span class="n">b</span><span class="p">].[</span><span class="n">Tags</span><span class="p">],</span> <span class="s1">'$[2]'</span><span class="p">)</span> <span class="o">=</span> <span class="s1">N'foo'</span>
</code></pre></div></div>

<p>… whereas on PostgreSQL, we can simply do the following:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'foo'</span>
</code></pre></div></div>

<p>For something more complex, the provider can translate queries of the following form:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">tags</span> <span class="p">=</span> <span class="k">new</span><span class="p">[]</span> <span class="p">{</span> <span class="s">"Tag1"</span><span class="p">,</span> <span class="s">"Tag2"</span> <span class="p">};</span>

<span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">tags</span><span class="p">.</span><span class="nf">All</span><span class="p">(</span><span class="n">t</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="n">t</span><span class="p">)))</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>This, in effect, queries Blogs where the Tags column contains all elements in the <code class="language-plaintext highlighter-rouge">tags</code> parameter. It so happens that PostgreSQL has an array containment operator, so we translate this to:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="o">@</span><span class="n">__tags_0</span> <span class="o">&lt;@</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
</code></pre></div></div>

<p>This translation - and several like it - has been implemented for several years already; but 8.0.0-preview.4 brings a few new ones. For example:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">tags</span> <span class="p">=</span> <span class="k">new</span><span class="p">[]</span> <span class="p">{</span> <span class="s">"Tag1"</span><span class="p">,</span> <span class="s">"Tag2"</span> <span class="p">};</span>

<span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Intersect</span><span class="p">(</span><span class="n">tags</span><span class="p">).</span><span class="nf">Any</span><span class="p">())</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>This queries Blogs where there’s any overlap between the Tags column and the <code class="language-plaintext highlighter-rouge">tags</code> parameter:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span> <span class="o">&amp;&amp;</span> <span class="o">@</span><span class="n">__tags_0</span>
</code></pre></div></div>

<p>Moving on from set operations, we now also translate Skip and Take to array slicing operations. For example:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Skip</span><span class="p">(</span><span class="m">2</span><span class="p">).</span><span class="nf">Contains</span><span class="p">(</span><span class="s">"Tag1"</span><span class="p">))</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>… translates to:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="s1">'Tag1'</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span><span class="p">[</span><span class="mi">3</span><span class="p">:])</span>
</code></pre></div></div>

<p>Note that the C# 2 has been transformed to a 3, since PostgreSQL arrays are 1-based, not 0-based. We can do the same for Take:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">blogs</span> <span class="p">=</span> <span class="k">await</span> <span class="n">context</span><span class="p">.</span><span class="n">Blogs</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">Tags</span><span class="p">.</span><span class="nf">Take</span><span class="p">(</span><span class="m">2</span><span class="p">).</span><span class="nf">Contains</span><span class="p">(</span><span class="s">"Tag1"</span><span class="p">))</span>
    <span class="p">.</span><span class="nf">ToArrayAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>… which translates to:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="s1">'Tag1'</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span><span class="p">[:</span><span class="mi">2</span><span class="p">])</span>
</code></pre></div></div>

<p>… and the provider even combines the two - <code class="language-plaintext highlighter-rouge">Skip(1).Take(2)</code>, for example, generates:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Name"</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span>
<span class="k">FROM</span> <span class="nv">"Blogs"</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">WHERE</span> <span class="s1">'Tag1'</span> <span class="o">=</span> <span class="k">ANY</span> <span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="nv">"Tags"</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="mi">3</span><span class="p">])</span>
</code></pre></div></div>

<p>For all the specialized translations supported by the provider, <a href="https://www.npgsql.org/efcore/mapping/array.html">see this doc page</a>. But remember - even if a specialized translation isn’t available, the provider will now use <code class="language-plaintext highlighter-rouge">unnest</code> to expand your array to a rowset, and then employ standard SQL to compose query operators on top of it.</p>

<h3 id="summary">Summary</h3>

<p>The PostgreSQL provider has supported arrays for quite a while, but 8.0.0-preview.4 brings a major upgrade to array support: arbitrary LINQ operators can now be used, and some specialized PostgreSQL translations have been added to make your SQL tighter and more efficient. Let us know about cool querying ideas or any bugs!</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[Queryable collections?]]></summary></entry><entry><title type="html">When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server</title><link href="https://www.roji.org/storing-timezones-in-the-db" rel="alternate" type="text/html" title="When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server" /><published>2021-11-10T00:00:00+01:00</published><updated>2021-11-10T00:00:00+01:00</updated><id>https://www.roji.org/storing-timezones-in-the-db</id><content type="html" xml:base="https://www.roji.org/storing-timezones-in-the-db"><![CDATA[<h2 id="when-utc-everywhere-isnt-enough">When “UTC everywhere” isn’t enough</h2>

<p>I’ve been dealing a lot with timestamps, time zones and databases lately - especially on PostgreSQL (<a href="/postgresql-dotnet-timestamp-mapping">see this blog post</a>), but also in general. Recently, on the Entity Framework Core community standup, <a href="https://www.youtube.com/watch?v=ZLJLfImuFqM&amp;list=PLdo4fOcmZ0oX-DBuRG4u58ZTAJgBAeQ-t&amp;index=2">we also hosted Jon Skeet</a> and chatted about NodaTime, timestamps, time zones, UTC and how they all relate to databases - I highly recommend watching that!</p>

<p>Now, a lot has been said about “UTC everywhere”; according to this pattern, all date/time representations in your system should always be in UTC, and if you get a local timestamp externally (e.g. from a user), you convert it to UTC as early as possible. The idea is to quickly clear away all the icky timezone-related problems, and to have a UTC-only nirvana from that point on. While this works well for many cases - e.g. when you just want to record when something happened in the global timeline - it is not a silver bullet, and you should think carefully about it. Jon Skeet already explained this better than I could, so go read his <a href="https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a-silver-bullet/">blog post on this</a>. As a very short tl;dr, time zone conversion rules may change after the moment you perform the conversion, so the user-provided local timestamp (and time zone) may start converting to a <em>different</em> UTC timestamp at some point! As a result, for events which take place on a specific time in a specific time zone, it’s better to store the local timestamp and the time zone (not offset!).</p>

<p>So let’s continue Jon’s blog post, and see how to actually perform that on two real databases - PostgreSQL and SQL Server. Following Jon’s preferred option, we want to store the following in the database:</p>

<ol>
  <li>The user-provided local timestamp.</li>
  <li>The user-provided time zone ID. This is <em>not</em> an offset, but rather a daylight savings-aware time zone, represented as a string.</li>
  <li>A UTC timestamp that’s computed (or generated) from the above two values. This can be used to order the rows by their occurrence on the global timeline, and can even be indexed.</li>
</ol>

<p>In Jon’s <a href="https://nodatime.org">NodaTime</a> library, the <a href="https://nodatime.org/3.0.x/api/NodaTime.ZonedDateTime.html">ZonedDateTime</a> type precisely represents the first two values above. Unfortunately, databases typically don’t have such a type; SQL Server does have <code class="language-plaintext highlighter-rouge">datetimeoffset</code>, but an offset is not a time zone (it isn’t daylight savings-aware). So we must use separate columns to represent the data above.</p>

<p>We’ll start with PostgreSQL, but we’ll later see how things work with SQL Server as well. The code samples below will show Entity Framework Core, but the same should be doable with any other data access layer as well.</p>

<h2 id="postgresql">PostgreSQL</h2>

<p>PostgreSQL conveniently has a type called <code class="language-plaintext highlighter-rouge">timestamp without time zone</code> for local timestamps in an unspecified time zone, and a badly-named type called <code class="language-plaintext highlighter-rouge">timestamp with time zone</code>, for UTC timestamps (no time zone is actually persisted); those are perfect for our two timestamps. We also want the UTC timestamp to be generated from the two other values, so we’ll set up a <a href="https://www.postgresql.org/docs/current/ddl-generated-columns.html">PostgreSQL generated column</a> (called <a href="https://docs.microsoft.com/ef/core/modeling/generated-properties#computed-columns">computed column</a> by EF Core) to do that. Here’s the minimal EF Core model and context, using <a href="https://www.npgsql.org/efcore/mapping/nodatime.html">the NodaTime plugin</a>:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">EventContext</span> <span class="p">:</span> <span class="n">DbContext</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="n">DbSet</span><span class="p">&lt;</span><span class="n">Event</span><span class="p">&gt;</span> <span class="n">Events</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="k">protected</span> <span class="k">override</span> <span class="k">void</span> <span class="nf">OnConfiguring</span><span class="p">(</span><span class="n">DbContextOptionsBuilder</span> <span class="n">optionsBuilder</span><span class="p">)</span>
        <span class="p">=&gt;</span> <span class="n">optionsBuilder</span><span class="p">.</span><span class="nf">UseNpgsql</span><span class="p">(</span><span class="s">@"Host=localhost;Username=test;Password=test"</span><span class="p">,</span> <span class="n">o</span> <span class="p">=&gt;</span> <span class="n">o</span><span class="p">.</span><span class="nf">UseNodaTime</span><span class="p">());</span>

    <span class="k">protected</span> <span class="k">override</span> <span class="k">void</span> <span class="nf">OnModelCreating</span><span class="p">(</span><span class="n">ModelBuilder</span> <span class="n">modelBuilder</span><span class="p">)</span>
        <span class="p">=&gt;</span> <span class="n">modelBuilder</span><span class="p">.</span><span class="n">Entity</span><span class="p">&lt;</span><span class="n">Event</span><span class="p">&gt;(</span><span class="n">b</span> <span class="p">=&gt;</span>
            <span class="p">{</span>
                <span class="n">b</span><span class="p">.</span><span class="nf">Property</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">UtcTimestamp</span><span class="p">)</span>
                    <span class="p">.</span><span class="nf">HasComputedColumnSql</span><span class="p">(</span><span class="s">@"""LocalTimestamp"" AT TIME ZONE ""TimeZoneId"""</span><span class="p">,</span> <span class="n">stored</span><span class="p">:</span> <span class="k">true</span><span class="p">);</span>

                <span class="n">b</span><span class="p">.</span><span class="nf">HasIndex</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">UtcTimestamp</span><span class="p">);</span>
            <span class="p">});</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">class</span> <span class="nc">Event</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="k">public</span> <span class="n">LocalDateTime</span> <span class="n">LocalTimestamp</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="n">Instant</span> <span class="n">UtcTimestamp</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">TimeZoneId</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This causes the following table to be created:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="nv">"Events"</span> <span class="p">(</span>
  <span class="nv">"Id"</span> <span class="nb">integer</span> <span class="k">GENERATED</span> <span class="k">BY</span> <span class="k">DEFAULT</span> <span class="k">AS</span> <span class="k">IDENTITY</span><span class="p">,</span>
  <span class="nv">"LocalTimestamp"</span> <span class="nb">timestamp</span> <span class="k">without</span> <span class="nb">time</span> <span class="k">zone</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="nv">"UtcTimestamp"</span> <span class="nb">timestamp</span> <span class="k">with</span> <span class="nb">time</span> <span class="k">zone</span> <span class="k">GENERATED</span> <span class="n">ALWAYS</span> <span class="k">AS</span> <span class="p">(</span><span class="nv">"LocalTimestamp"</span> <span class="k">AT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="nv">"TimeZoneId"</span><span class="p">)</span> <span class="n">STORED</span><span class="p">,</span>
  <span class="nv">"TimeZoneId"</span> <span class="nb">text</span> <span class="k">NULL</span><span class="p">,</span>
  <span class="k">CONSTRAINT</span> <span class="nv">"PK_Events"</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="p">(</span><span class="nv">"Id"</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>A few notes on the above:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">AT TIME ZONE</code> operator in the generated column definition converts our local timestamp to a UTC timestamp, using the time zone recorded in the other column.</li>
  <li>PostgreSQL uses IANA/Olson timezone IDs - this is what you need to store in <code class="language-plaintext highlighter-rouge">TimeZoneId</code>. These time zones look like <code class="language-plaintext highlighter-rouge">Europe/Berlin</code>, and are not the Windows time zones that .NET developers are usually used to. The good news is that .NET 6.0 contains <a href="https://devblogs.microsoft.com/dotnet/date-time-and-time-zone-enhancements-in-net-6/">time zone improvements</a> which allow working with IANA/Olson time zones.</li>
  <li><code class="language-plaintext highlighter-rouge">UtcTimestamp</code> is a <em>stored</em> generated column, meaning that its value gets computed whenever the row is modified, and gets persisted in the table just like any other column. Databases usually also support non-stored generated columns, which get computed every time upon select, but PostgreSQL does not support these yet. This distinction will actually be important further down.</li>
  <li>We create an index over our generated column, which allows us to efficiently perform queries on our events, e.g. get all of them sorted on the global timeline.</li>
</ul>
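<p>To get a feel for what <code class="language-plaintext highlighter-rouge">AT TIME ZONE</code> does with a local timestamp, here’s a quick sketch you can run in psql (assuming the connection’s <code class="language-plaintext highlighter-rouge">TimeZone</code> is set to UTC, which only affects how the result is displayed):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT '2021-11-10 10:00:00'::timestamp AT TIME ZONE 'Europe/Berlin';
-- 2021-11-10 09:00:00+00: the local timestamp is interpreted as Berlin time
-- (UTC+01:00 on that date) and converted to the corresponding UTC instant
</code></pre></div></div>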

<p>Perfect, job done… or is it?</p>

<p>The astute reader will have noticed that since our UTC timestamp is a stored generated column, it’s computed when we insert the row, and is not recomputed again unless the row changes. So what happens if the time zone database actually changes after that? That’s right - our UTC timestamp may no longer be correct, and that’s exactly the problem we wanted to fix by preserving the original, user-provided local time and time zone! To “resync” the UTC timestamp, we can recreate the column after a time zone database change (or just periodically):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="nv">"Events"</span> <span class="k">DROP</span> <span class="k">COLUMN</span> <span class="nv">"UtcTimestamp"</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="nv">"Events"</span> <span class="k">ADD</span> <span class="k">COLUMN</span> <span class="nv">"UtcTimestamp"</span> <span class="nb">timestamp</span> <span class="k">with</span> <span class="nb">time</span> <span class="k">zone</span> <span class="k">GENERATED</span> <span class="n">ALWAYS</span> <span class="k">AS</span> <span class="p">(</span><span class="nv">"LocalTimestamp"</span> <span class="k">AT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="nv">"TimeZoneId"</span><span class="p">)</span> <span class="n">STORED</span><span class="p">;</span>
</code></pre></div></div>

<p>Note that all this assumes you actually need the UTC timestamp as a database column; an alternative would be to omit it, and to perform the time zone conversion in your queries. For example, with the NodaTime plugin you can do the following:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">events</span> <span class="p">=</span> <span class="k">await</span> <span class="n">ctx</span><span class="p">.</span><span class="n">Events</span>
    <span class="p">.</span><span class="nf">OrderBy</span><span class="p">(</span><span class="n">e</span> <span class="p">=&gt;</span> <span class="n">e</span><span class="p">.</span><span class="n">LocalTimestamp</span><span class="p">.</span><span class="nf">InZoneLeniently</span><span class="p">(</span><span class="n">DateTimeZoneProviders</span><span class="p">.</span><span class="n">Tzdb</span><span class="p">[</span><span class="n">e</span><span class="p">.</span><span class="n">TimeZoneId</span><span class="p">]).</span><span class="nf">ToInstant</span><span class="p">())</span>
    <span class="p">.</span><span class="nf">ToListAsync</span><span class="p">();</span>
</code></pre></div></div>

<p>This will translate to the following query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">e</span><span class="p">.</span><span class="nv">"Id"</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="nv">"LocalTimestamp"</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="nv">"TimeZoneId"</span>
<span class="k">FROM</span> <span class="nv">"Events"</span> <span class="k">AS</span> <span class="n">e</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">e</span><span class="p">.</span><span class="nv">"LocalTimestamp"</span> <span class="k">AT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="n">e</span><span class="p">.</span><span class="nv">"TimeZoneId"</span>
</code></pre></div></div>

<p>This effectively does the same thing as the generated column above, but performs the time zone conversion at query time; this ensures the up-to-date time zone database is always used, and takes up no disk space. The main disadvantage, of course, is that you can’t have an index over the UTC timestamp, so operations like sorting will be slow.</p>

<h2 id="sql-server">SQL Server</h2>

<p>Let’s see how this whole thing works on another database - SQL Server. We’ll do pretty much the same thing, but to change things up, we’ll just use the native BCL <code class="language-plaintext highlighter-rouge">DateTime</code> type instead of NodaTime (although a NodaTime plugin for the SQL Server provider <a href="https://github.com/StevenRasmussen/EFCore.SqlServer.NodaTime">does exist</a>). As before, here’s the minimal EF Core model and context:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">EventContext</span> <span class="p">:</span> <span class="n">DbContext</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="n">DbSet</span><span class="p">&lt;</span><span class="n">Event</span><span class="p">&gt;</span> <span class="n">Events</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="k">protected</span> <span class="k">override</span> <span class="k">void</span> <span class="nf">OnConfiguring</span><span class="p">(</span><span class="n">DbContextOptionsBuilder</span> <span class="n">optionsBuilder</span><span class="p">)</span>
        <span class="p">=&gt;</span> <span class="n">optionsBuilder</span><span class="p">.</span><span class="nf">UseSqlServer</span><span class="p">(</span><span class="s">@"&lt;connection string&gt;"</span><span class="p">)</span>

    <span class="k">protected</span> <span class="k">override</span> <span class="k">void</span> <span class="nf">OnModelCreating</span><span class="p">(</span><span class="n">ModelBuilder</span> <span class="n">modelBuilder</span><span class="p">)</span>
        <span class="p">=&gt;</span> <span class="n">modelBuilder</span><span class="p">.</span><span class="n">Entity</span><span class="p">&lt;</span><span class="n">Event</span><span class="p">&gt;(</span><span class="n">b</span> <span class="p">=&gt;</span>
            <span class="p">{</span>
                <span class="n">b</span><span class="p">.</span><span class="nf">Property</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">UtcTimestamp</span><span class="p">)</span>
                    <span class="p">.</span><span class="nf">HasComputedColumnSql</span><span class="p">(</span><span class="s">@"[LocalTimestamp] AT TIME ZONE [TimeZoneId] AT TIME ZONE 'UTC'"</span><span class="p">,</span> <span class="n">stored</span><span class="p">:</span> <span class="k">true</span><span class="p">);</span>

                <span class="n">b</span><span class="p">.</span><span class="nf">HasIndex</span><span class="p">(</span><span class="n">b</span> <span class="p">=&gt;</span> <span class="n">b</span><span class="p">.</span><span class="n">UtcTimestamp</span><span class="p">);</span>
            <span class="p">});</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">class</span> <span class="nc">Event</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="k">public</span> <span class="n">DateTime</span> <span class="n">LocalTimestamp</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="n">DateTimeOffset</span> <span class="n">UtcTimestamp</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">TimeZoneId</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>A couple of notes, comparing this to PostgreSQL:</p>

<ul>
  <li>On SQL Server, <code class="language-plaintext highlighter-rouge">AT TIME ZONE</code> returns a <code class="language-plaintext highlighter-rouge">datetimeoffset</code> type - that’s why <code class="language-plaintext highlighter-rouge">UtcTimestamp</code> is a <code class="language-plaintext highlighter-rouge">DateTimeOffset</code>. If you really want <code class="language-plaintext highlighter-rouge">UtcTimestamp</code> to be a <code class="language-plaintext highlighter-rouge">DateTime</code>, you can add a conversion back from <code class="language-plaintext highlighter-rouge">datetimeoffset</code> to <code class="language-plaintext highlighter-rouge">datetime2</code>.</li>
  <li>The computed column SQL is a bit more complicated: we first convert the local timestamp to a <code class="language-plaintext highlighter-rouge">datetimeoffset</code> in the user’s time zone, and then to a UTC <code class="language-plaintext highlighter-rouge">datetimeoffset</code>.</li>
</ul>

<p>Looks great… except that trying to create the table gives us the following error: <code class="language-plaintext highlighter-rouge">Computed column 'UtcTimestamp' in table 'Events' cannot be persisted because the column is non-deterministic</code>. SQL Server is stricter than PostgreSQL here: since the <code class="language-plaintext highlighter-rouge">AT TIME ZONE</code> operator depends on an external time zone database - which can change at any time - it is non-deterministic, and therefore cannot be used in a computed column definition. In effect, SQL Server is alerting you to the danger discussed above - your UTC timestamp may become out of sync with its inputs.</p>

<p>If you’re willing to give up the index, then unlike PostgreSQL you can use a non-stored computed column instead:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">modelBuilder</span><span class="p">.</span><span class="n">Entity</span><span class="p">&lt;</span><span class="n">Event</span><span class="p">&gt;()</span>
    <span class="p">.</span><span class="nf">Property</span><span class="p">(</span><span class="n">e</span> <span class="p">=&gt;</span> <span class="n">e</span><span class="p">.</span><span class="n">UtcTimestamp</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">HasComputedColumnSql</span><span class="p">(</span><span class="s">@"[LocalTimestamp] AT TIME ZONE [TimeZoneId] AT TIME ZONE 'UTC'"</span><span class="p">);</span>
</code></pre></div></div>

<p>Note that we removed the <code class="language-plaintext highlighter-rouge">stored: true</code> we had before (the default is non-stored). This column cannot be indexed, and effectively fulfils the same purpose as the PostgreSQL query we saw above. If you do want an indexed column, then you’ll have to set up a database trigger to keep <code class="language-plaintext highlighter-rouge">UtcTimestamp</code> up to date:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">OR</span> <span class="k">ALTER</span> <span class="k">TRIGGER</span> <span class="n">Events_UPDATE</span> <span class="k">ON</span> <span class="n">Events</span>
    <span class="k">AFTER</span> <span class="k">INSERT</span><span class="p">,</span> <span class="k">UPDATE</span>
    <span class="k">AS</span>
<span class="k">BEGIN</span>
    <span class="k">SET</span> <span class="n">NOCOUNT</span> <span class="k">ON</span><span class="p">;</span>

    <span class="c1">-- A set-based UPDATE joining on INSERTED handles multi-row</span>
    <span class="c1">-- INSERTs/UPDATEs correctly, which row-by-row variable assignment does not</span>
    <span class="k">UPDATE</span> <span class="n">e</span>
    <span class="k">SET</span> <span class="p">[</span><span class="n">UtcTimestamp</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">.[</span><span class="n">LocalTimestamp</span><span class="p">]</span> <span class="k">AT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="n">i</span><span class="p">.[</span><span class="n">TimeZoneId</span><span class="p">]</span> <span class="k">AT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="s1">'UTC'</span>
    <span class="k">FROM</span> <span class="p">[</span><span class="n">Events</span><span class="p">]</span> <span class="k">AS</span> <span class="n">e</span>
    <span class="k">JOIN</span> <span class="n">INSERTED</span> <span class="k">AS</span> <span class="n">i</span> <span class="k">ON</span> <span class="n">e</span><span class="p">.</span><span class="n">Id</span> <span class="o">=</span> <span class="n">i</span><span class="p">.</span><span class="n">Id</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
</code></pre></div></div>

<p>If you’re using EF Core Migrations, you can use <a href="https://docs.microsoft.com/ef/core/managing-schemas/migrations/managing#adding-raw-sql">raw SQL to define this trigger</a>. Note that it’s now your responsibility to redo the conversions when the time zone database changes, just like with PostgreSQL above.</p>
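<p>In a migration, that can look something like the following - a rough sketch, where the migration class name is made up and the trigger SQL (the full <code class="language-plaintext highlighter-rouge">CREATE OR ALTER TRIGGER</code> statement from above) is elided for brevity:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddUtcTimestampTrigger : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
        // Paste the full CREATE OR ALTER TRIGGER statement here
        =&gt; migrationBuilder.Sql(@"CREATE OR ALTER TRIGGER Events_UPDATE ON Events ...");

    protected override void Down(MigrationBuilder migrationBuilder)
        =&gt; migrationBuilder.Sql("DROP TRIGGER Events_UPDATE;");
}
</code></pre></div></div>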

<h2 id="some-closing-words">Some closing words</h2>

<p>It’s interesting to compare PostgreSQL and SQL Server on what is considered a non-deterministic function (and therefore, what can be used in a computed column). I sent a <a href="https://www.postgresql.org/message-id/CADT4RqDVBbqSbQVH_v_vS5_9DPhjsfmQw07E+q-ddR_XfZjffw@mail.gmail.com">message</a> about this to the PostgreSQL maintainers, and Tom Lane explained that if we’re absolutely strict, then even string comparison isn’t really deterministic, since it depends on collation rules which may also change. One could claim that if users need an auto-updating column that uses <code class="language-plaintext highlighter-rouge">AT TIME ZONE</code>, they’ll end up doing it with a trigger in any case, like we’ve done above for SQL Server; so we may as well make it easier and not disallow it in generated columns. It’s the user’s responsibility to take care of resyncing in any case.</p>

<p>Finally, if you think that converting a local date to UTC is simple - even when we know the time zone - then I encourage you to read the “Ambiguous and skipped times” section in Jon Skeet’s <a href="https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a-silver-bullet/">post</a>. Timestamps are just so much fun.</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[When “UTC everywhere” isn’t enough]]></summary></entry><entry><title type="html">Mapping .NET Timestamps to PostgreSQL</title><link href="https://www.roji.org/postgresql-dotnet-timestamp-mapping" rel="alternate" type="text/html" title="Mapping .NET Timestamps to PostgreSQL" /><published>2021-10-10T00:00:00+02:00</published><updated>2021-10-10T00:00:00+02:00</updated><id>https://www.roji.org/postgresql-dotnet-timestamp-mapping</id><content type="html" xml:base="https://www.roji.org/postgresql-dotnet-timestamp-mapping"><![CDATA[<p><strong>INTERESTED IN TIMESTAMPS? SEE ALSO <a href="/storing-timezones-in-the-db">When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server</a></strong></p>

<p>Npgsql 6.0 contains some significant changes to how timestamps are mapped between .NET and PostgreSQL - most applications will need to react to this (although a compatibility flag exists). This post gives the context for these changes, going over the timestamp types on both sides and the problems in mapping them.</p>

<h2 id="postgresql-timestamps">PostgreSQL timestamps</h2>

<p>As with most things, PostgreSQL conforms to the SQL standard when it comes to timestamps (<a href="https://www.postgresql.org/docs/current/datatype-datetime.html">full docs</a>): it has a <code class="language-plaintext highlighter-rouge">timestamp without time zone</code> and a <code class="language-plaintext highlighter-rouge">timestamp with time zone</code> type (the shorter aliases are <code class="language-plaintext highlighter-rouge">timestamp</code> and <code class="language-plaintext highlighter-rouge">timestamptz</code>). <code class="language-plaintext highlighter-rouge">timestamptz</code> is perhaps the worst-named type in the world: it does <strong>not</strong>, in fact, store a time zone in the database, but rather a UTC timestamp; this causes lots of confusion for users expecting to persist a full timezone-aware timestamp to PostgreSQL. In this sense, <code class="language-plaintext highlighter-rouge">timestamptz</code> is different from the SQL Server <a href="https://docs.microsoft.com/sql/t-sql/data-types/datetimeoffset-transact-sql"><code class="language-plaintext highlighter-rouge">datetimeoffset</code></a> type (but see the note below on why offsets may be a bad idea). What <code class="language-plaintext highlighter-rouge">timestamptz</code> <strong>is</strong> good for is storing and interacting with UTC timestamps, or globally agreed-upon points in time, where the time zone does not matter. For example, when recording the time a transaction took place, you typically store a UTC timestamp, and then display it in the user’s local time zone, as reported by their web browser; this allows you to show the same timestamp to multiple users, each in their own time zone, and also supports users being in one time zone today and another tomorrow. This is sometimes called doing “UTC everywhere”, and it tends to work well as a default pattern.</p>

<p>In the relatively rarer cases where you need to store the time zone along with a timestamp, a separate column must be used alongside your timestamp column, typically holding a string representation of the time zone (e.g. <code class="language-plaintext highlighter-rouge">Europe/Berlin</code>)<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<p>The other type - <code class="language-plaintext highlighter-rouge">timestamp</code> - can be used to store a timestamp whose time zone is unknown, implicit or assumed to be local. It’s really important to understand that this does not represent a specific point in time unless coupled with some time zone: the same date/time combination corresponds to different universal instants in different time zones. PostgreSQL does have a <a href="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-TIMEZONE"><code class="language-plaintext highlighter-rouge">TimeZone</code></a> connection state parameter, which defines the “local time zone” of the connection; it’s defined in your PostgreSQL configuration by default, and can be changed in your connection. When converting a <code class="language-plaintext highlighter-rouge">timestamp</code> into a <code class="language-plaintext highlighter-rouge">timestamptz</code> (remember: the latter means “UTC”), PostgreSQL will treat your <code class="language-plaintext highlighter-rouge">timestamp</code> as a local timestamp, and convert it to UTC based on the connection’s current <code class="language-plaintext highlighter-rouge">TimeZone</code>. However, fiddling around with your connection’s <code class="language-plaintext highlighter-rouge">TimeZone</code> and depending on your database to do timezone conversions usually isn’t a practical way to do things - you typically want to store and retrieve UTC timestamps from your database, and do any conversions to/from local timezones in your application, when interacting with users.</p>
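<p>You can see this conversion behavior for yourself in psql - a quick sketch (the offset shown in the output depends on the connection’s <code class="language-plaintext highlighter-rouge">TimeZone</code> setting):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SET TimeZone = 'Europe/Berlin';

-- The unspecified local timestamp is interpreted in the connection's TimeZone:
SELECT '2021-10-10 10:00:00'::timestamp::timestamptz;
-- 2021-10-10 10:00:00+02: the stored UTC instant is 2021-10-10 08:00:00,
-- since Berlin was at UTC+02:00 on that date
</code></pre></div></div>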

<h2 id="net-timestamps">.NET timestamps</h2>

<p>The .NET situation around timestamps is… not pretty… .NET has some basic flaws in this area which have been with us since the beginning of time, and cannot be corrected without introducing unacceptable breaking changes. The .NET timestamp arsenal includes two main types: <a href="https://docs.microsoft.com/dotnet/api/system.datetime"><code class="language-plaintext highlighter-rouge">DateTime</code></a> and <a href="https://docs.microsoft.com/dotnet/api/system.datetimeoffset"><code class="language-plaintext highlighter-rouge">DateTimeOffset</code></a>.</p>

<p><code class="language-plaintext highlighter-rouge">DateTime</code> unsurprisingly contains a date and a time, but also a <a href="https://docs.microsoft.com/dotnet/api/system.datetimekind">Kind</a> property which can be <code class="language-plaintext highlighter-rouge">Utc</code>, <code class="language-plaintext highlighter-rouge">Local</code> or <code class="language-plaintext highlighter-rouge">Unspecified</code>: <code class="language-plaintext highlighter-rouge">Utc</code> is pretty self-explanatory, <code class="language-plaintext highlighter-rouge">Local</code> means a timestamp in the timezone of the machine where .NET is running, and <code class="language-plaintext highlighter-rouge">Unspecified</code> is, well, not very specified. One problematic aspect of DateTime is that these very different concepts are represented via the same .NET type: if a function accepts a DateTime, which Kind should you pass in? What happens when you compare a UTC DateTime with an Unspecified one? (The answer is that the timestamps will be compared disregarding the Kind, which I can’t imagine can produce fruitful results in any sane application). To know more about DateTime’s failings, see this <a href="https://blog.nodatime.org/2011/08/what-wrong-with-datetime-anyway.html">excellent blog post</a> by Jon Skeet.</p>
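<p>The comparison pitfall is easy to demonstrate with plain .NET, nothing database-specific:</p>

```csharp
using System;

// Two timestamps with identical ticks but very different meanings:
var utc = new DateTime(2021, 1, 1, 10, 0, 0, DateTimeKind.Utc);
var unspecified = new DateTime(2021, 1, 1, 10, 0, 0, DateTimeKind.Unspecified);

// Equality and ordering look only at the ticks - the Kind is ignored:
Console.WriteLine(utc == unspecified);         // True
Console.WriteLine(utc.CompareTo(unspecified)); // 0
```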

<p>DateTimeOffset is at least less ambiguous than DateTime: it’s a date and time, plus a timezone offset. Taken together, these identify a specific instant in time, and so a DateTimeOffset can always be unambiguously converted to a UTC timestamp, if needed. Its API still has some issues (see Jon’s post above), but in my opinion, the main problem with this type is that it gives the illusion of being timezone-aware without delivering on it. An offset (e.g. <code class="language-plaintext highlighter-rouge">UTC+01:00</code>) is <strong>not</strong> a timezone (e.g. IANA/Olson <code class="language-plaintext highlighter-rouge">Europe/Berlin</code>): timezones contain information about daylight saving time, which a simple offset does not; Berlin is sometimes at <code class="language-plaintext highlighter-rouge">UTC+01:00</code> and sometimes at <code class="language-plaintext highlighter-rouge">UTC+02:00</code>. This is especially important if you’re going to do arithmetic on a timestamp: if you add a few hours to a timestamp, an accurate result would have to take daylight savings into account. And if you’re not doing arithmetic, then you may not need the timezone in the first place (why not just use UTC?). The same criticism goes for the SQL Server <code class="language-plaintext highlighter-rouge">datetimeoffset</code> type: it makes you think you’re good, while neglecting daylight savings time.</p>
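<p>The offset-vs-timezone distinction is easy to see with <code>TimeZoneInfo</code> (IANA IDs such as <code>Europe/Berlin</code> resolve cross-platform on .NET 6+; on older .NET on Windows you'd need the Windows zone ID instead):</p>

```csharp
using System;

var berlin = TimeZoneInfo.FindSystemTimeZoneById("Europe/Berlin");

// The same time zone yields different offsets depending on the instant:
var winter = berlin.GetUtcOffset(new DateTime(2021, 1, 15, 12, 0, 0, DateTimeKind.Utc));
var summer = berlin.GetUtcOffset(new DateTime(2021, 7, 15, 12, 0, 0, DateTimeKind.Utc));

Console.WriteLine(winter); // 01:00:00 (standard time)
Console.WriteLine(summer); // 02:00:00 (daylight saving time)
```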

<p>Oh, and since I mentioned Jon Skeet above, you should absolutely take a look at his <a href="https://nodatime.org/">NodaTime</a> library: this is how date/time types are done right. I’d recommend that any serious application that needs to deal with timestamps seriously consider using it, and Npgsql even fully supports it (both at the <a href="https://www.npgsql.org/doc/types/nodatime.html">ADO.NET</a> and <a href="https://www.npgsql.org/efcore/mapping/nodatime.html">EF Core</a> levels).</p>

<h2 id="mapping-net-to-postgresql">Mapping .NET to PostgreSQL</h2>

<p>One of the tasks of a database driver is to map two different type systems to one another; in our case, the .NET types must be mapped to the PostgreSQL ones. The mapping is sometimes simple (e.g. .NET <code class="language-plaintext highlighter-rouge">long</code> corresponds perfectly to a PostgreSQL <code class="language-plaintext highlighter-rouge">bigint</code>), but sometimes it’s quite complex. You guessed it: timestamps fall in the latter basket.</p>

<p>One curious thing with PostgreSQL <code class="language-plaintext highlighter-rouge">timestamptz</code>, is that while it’s stored as a UTC timestamp in the database, its textual representation is a local timestamp based on the <code class="language-plaintext highlighter-rouge">TimeZone</code> connection parameter: reading a <code class="language-plaintext highlighter-rouge">timestamptz</code> as text yields something like <code class="language-plaintext highlighter-rouge">2004-10-19 10:23:54+02</code>. Unfortunately this odd behavior shaped Npgsql’s original timestamp mapping in a significant way: reading a <code class="language-plaintext highlighter-rouge">timestamptz</code> returns a Local DateTime<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Among other things, this means you cannot round-trip a UTC DateTime: you can send it just fine, but when you read it back, you get a converted local timestamp. A similar thing was done with DateTimeOffset: Npgsql converted it to UTC before sending, and returned a DateTimeOffset in the machine’s time zone when reading (remember, no timezone or offset is actually stored in the database!): if I send a DateTimeOffset with offset <code class="language-plaintext highlighter-rouge">+02:00</code> on a machine configured with offset <code class="language-plaintext highlighter-rouge">+01:00</code>, it would be saved to UTC but read back with <code class="language-plaintext highlighter-rouge">+01:00</code>, with Npgsql doing all the conversions. This state of affairs led to a lot of general confusion, and made it quite difficult to support simple “UTC everywhere” programming, where you send a UTC timestamp to the database, and read it back in the same way.</p>

<p>In Npgsql 6.0, we redid the timestamp mapping with the following principles in mind:</p>

<ul>
  <li>1st-class support for the “UTC everywhere” pattern, and promote it as the default timestamp strategy.</li>
  <li>Cleanly separate UTC timestamps from non-UTC timestamps as two different types, and disallow mixing them to protect against accidental errors.</li>
  <li>Values should always be round-trippable - whatever you send to PostgreSQL, you should get the same thing back. If we can’t roundtrip it, we should refuse to write it.</li>
  <li>Values should never undergo any implicit timezone conversions when being read or written. Any conversions should be done by the user, making them clear and explicit in the code.</li>
</ul>

<p>This means the following concrete things:</p>

<ul>
  <li>We now send UTC DateTime as <code class="language-plaintext highlighter-rouge">timestamptz</code>, and Local/Unspecified DateTime as <code class="language-plaintext highlighter-rouge">timestamp</code>; trying to send a non-UTC DateTime as <code class="language-plaintext highlighter-rouge">timestamptz</code> will throw an exception, etc. In effect, Npgsql is creating a strict type distinction between the different DateTime Kinds (which is how they should have been represented in the first place in .NET).<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></li>
  <li>We only allow sending a DateTimeOffset with offset 0 (as <code class="language-plaintext highlighter-rouge">timestamptz</code>): since the offset isn’t stored in the database, it can’t be round-tripped.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></li>
  <li>Reading a <code class="language-plaintext highlighter-rouge">timestamptz</code> will now yield a UTC DateTime or DateTimeOffset - no more implicit conversions.</li>
  <li>By nature, EF Core must make mapping decisions solely based on the type (DateTime) and cannot take the Kind into account, so we have to pick one type or the other as the default. Since “UTC everywhere” should generally be preferred, we now map DateTime to <code class="language-plaintext highlighter-rouge">timestamptz</code>.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></li>
  <li>Corresponding changes were done to the NodaTime mappings, though the situation is much simpler there, since the different concepts are represented by different .NET types (e.g. <a href="https://nodatime.org/3.0.x/userguide/concepts">Instant vs. LocalDateTime</a>).</li>
</ul>
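<p>In code, the new rules look roughly like this - a hedged sketch: it assumes an open <code>NpgsqlConnection</code> named <code>conn</code> and a hypothetical <code>events</code> table with a <code>timestamptz</code> column, and the exact exception message may vary between versions:</p>

```csharp
using System;
using Npgsql;
using NpgsqlTypes;

await using var cmd = new NpgsqlCommand(
    "INSERT INTO events (occurred_at) VALUES ($1)", conn);

// A UTC DateTime maps to timestamptz and round-trips unchanged:
cmd.Parameters.Add(new() { Value = DateTime.UtcNow });
await cmd.ExecuteNonQueryAsync();

// A Local DateTime maps to timestamp; forcing it into timestamptz throws,
// instead of being silently converted behind your back:
cmd.Parameters[0] = new NpgsqlParameter
{
    Value = DateTime.Now,
    NpgsqlDbType = NpgsqlDbType.TimestampTz
};
try
{
    await cmd.ExecuteNonQueryAsync();
}
catch (InvalidCastException)
{
    // e.g. "Cannot write DateTime with Kind=Local to PostgreSQL type
    // 'timestamp with time zone', only UTC is supported."
}
```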

<p>As you can imagine, the above implies a lot of breaking changes… This is not something we did lightly, but we do believe our users will end up in a better place. However, we’ve also provided a backwards compatibility flag which allows reverting to the previous behavior; <a href="https://www.npgsql.org/doc/types/datetime.html">see the documentation</a>.</p>
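<p>The compatibility flag itself is an <code>AppContext</code> switch, set at application startup before any Npgsql operations take place:</p>

```csharp
using System;

// Revert to the pre-6.0 timestamp behavior:
AppContext.SetSwitch("Npgsql.EnableLegacyTimestampBehavior", true);
```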

<p>Please let us know what you think! Don’t hesitate to open questions on the <a href="https://github.com/npgsql/npgsql">Npgsql</a> or <a href="https://github.com/npgsql/efcore.pg">EF Core provider</a> repos, or to ping me <a href="">on twitter</a>.</p>

<h2 id="appendix-what-postgresql-or-the-sql-standard-got-wrong">Appendix: what PostgreSQL (or the SQL standard) got wrong</h2>

<p>For those interested, here are a few thoughts on flaws in the PostgreSQL timestamp system - though the SQL standard is probably the one at fault here. There’s not much to be done about these, but it’s important to be aware of them.</p>

<ul>
  <li>The naming is quite atrocious:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">timestamp with time zone</code> has a timezone in its textual representation, but not in its storage.</li>
      <li>This is also inconsistent with the type <code class="language-plaintext highlighter-rouge">time with time zone</code>, which <em>does</em> store an offset in the database.</li>
      <li>The <code class="language-plaintext highlighter-rouge">timestamp</code> name is bound to make people use it as the default, though it probably is not the thing most applications want.</li>
    </ul>
  </li>
  <li>PostgreSQL implicitly casts between <code class="language-plaintext highlighter-rouge">timestamp</code> and <code class="language-plaintext highlighter-rouge">timestamptz</code>, making it easy to accidentally get a timezone conversion; it would have been better to require explicit conversions instead. For example, the <a href="https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-TABLE"><code class="language-plaintext highlighter-rouge">extract</code></a> function accepts a <code class="language-plaintext highlighter-rouge">timestamp</code>, so passing a <code class="language-plaintext highlighter-rouge">timestamptz</code> would cause an implicit timezone conversion.</li>
  <li>It’s arguably a bad idea for the <code class="language-plaintext highlighter-rouge">timestamptz</code> textual representation to contain a local representation.</li>
  <li><code class="language-plaintext highlighter-rouge">timestamp</code> is sometimes treated as a local timestamp (e.g. when converting it to <code class="language-plaintext highlighter-rouge">timestamptz</code>), but sometimes is simply an unspecified timestamp.</li>
  <li><code class="language-plaintext highlighter-rouge">time with time zone</code> makes no sense.</li>
</ul>
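<p>The implicit-conversion pitfall above is easy to trip over in psql (illustrative values; assumes the session <code>TimeZone</code> is <code>Europe/Berlin</code>, in winter):</p>

```sql
SET TimeZone = 'Europe/Berlin';

-- 10:00 UTC is implicitly converted to 11:00 Berlin time before extraction:
SELECT extract(hour FROM '2021-01-01 10:00:00+00'::timestamptz);
-- → 11
```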

<p><strong>INTERESTED IN TIMESTAMPS? SEE ALSO <a href="/storing-timezones-in-the-db">When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server</a></strong></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">

      <p>For more information on when “UTC Everywhere” is less appropriate and how to deal with it, see this great <a href="https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a-silver-bullet/">post</a> by Jon Skeet. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">

      <p>Note that Npgsql did the timezone conversion based on the machine’s timezone, rather than based on the PostgreSQL <code class="language-plaintext highlighter-rouge">TimeZone</code>, so did not match the PostgreSQL behavior in any case. This was because .NET had no way of parsing the PostgreSQL IANA/Olson timezone IDs until <a href="https://devblogs.microsoft.com/dotnet/date-time-and-time-zone-enhancements-in-net-6/">.NET 6.0</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">

      <p>Incidentally, this is the only case in the Npgsql type mapping system where the PostgreSQL type depends not only on the CLR type (DateTime), but also on its value (the Kind). This isn’t trivial to do, and especially to do efficiently - thanks once again, DateTime! <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">

      <p>There are a few specific cases where we allow non-round-trippability. For one, PostgreSQL has only microsecond precision, whereas the .NET types have tick precision (100 nanoseconds); the driver silently truncates the extra precision rather than throwing. We also allow writing <code class="language-plaintext highlighter-rouge">default(DateTime)</code> as <code class="language-plaintext highlighter-rouge">timestamptz</code> even though its Kind is Unspecified. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[INTERESTED IN TIMESTAMPS? SEE ALSO When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server]]></summary></entry><entry><title type="html">Query parameters, batching and SQL rewriting</title><link href="https://www.roji.org/parameters-batching-and-sql-rewriting" rel="alternate" type="text/html" title="Query parameters, batching and SQL rewriting" /><published>2021-08-17T00:00:00+02:00</published><updated>2021-08-17T00:00:00+02:00</updated><id>https://www.roji.org/parameters-batching-and-sql-rewriting</id><content type="html" xml:base="https://www.roji.org/parameters-batching-and-sql-rewriting"><![CDATA[<p>In the upcoming version 6.0 of the Npgsql PostgreSQL driver for .NET, we implemented what I think of as “raw mode” (<a href="https://github.com/npgsql/npgsql/pull/3852">#3852</a>). In a nutshell, this means that you can now use Npgsql without it doing anything to the SQL you provide it - it will simply send your queries as-is to PostgreSQL, without parsing or rewriting them in any way. Explaining what this means is a great opportunity to go into some interesting aspects of database programming - so let’s dive in.</p>

<h2 id="parameters">Parameters</h2>

<p>Parameters are important in database programming: instead of putting values directly into your SQL query, you integrate a placeholder which references a parameter value that’s delivered separately. This is important for preventing SQL injection attacks, but also helps performance through plan caching and prepared statements. Anybody who’s used .NET’s database API (ADO.NET) knows how parameters work:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">NpgsqlCommand</span><span class="p">(</span><span class="s">"SELECT * FROM employees WHERE first_name = @FirstName AND age = @Age"</span><span class="p">,</span> <span class="n">conn</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Parameters</span> <span class="p">=</span>
    <span class="p">{</span>
        <span class="k">new</span><span class="p">(</span><span class="s">"FirstName"</span><span class="p">,</span> <span class="s">"Shay"</span><span class="p">),</span>
        <span class="k">new</span><span class="p">(</span><span class="s">"Age"</span><span class="p">,</span> <span class="m">18</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>A command has a collection of parameters, and each parameter has a name and a value. Pretty straightforward… or is it?</p>

<p>It turns out that while some databases accept such name/value parameter pairs (e.g. SQL Server), PostgreSQL actually has a positional parameter system! Rather than the named parameter placeholders <code class="language-plaintext highlighter-rouge">@FirstName</code> and <code class="language-plaintext highlighter-rouge">@Age</code>, it expects to get <code class="language-plaintext highlighter-rouge">$1</code> and <code class="language-plaintext highlighter-rouge">$2</code>, which refer to positions in the parameter list. And indeed - there’s quite a zoo of parameter placeholders once you look around: Oracle does have named parameters like SQL Server, but uses a colon as the prefix (so <code class="language-plaintext highlighter-rouge">:Age</code> rather than <code class="language-plaintext highlighter-rouge">@Age</code>). In ODBC, parameter placeholders are simply question marks, which also bind positionally to parameters (this means that it’s impossible to refer to the same parameter twice without sending it twice as well).</p>

<p>What a mess. Now, the ADO.NET documentation calls all this out, <a href="https://docs.microsoft.com/dotnet/framework/data/adonet/configuring-parameters-and-parameter-data-types#working-with-parameter-placeholders">clearly stating</a> that parameter placeholders vary across data providers. In fact, if you look at the <a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbparametercollection">DbParameterCollection</a> class, you’ll find a collection that is both named like a dictionary (for SQL Server, Oracle…) and ordered like an array (for ODBC, PostgreSQL). But at some prehistoric moment in Npgsql’s history, someone made the decision to support named parameters. This was probably done to make it easier to port applications from SQL Server to PostgreSQL, without having to change any SQL - not a bad idea. Unfortunately, that also means that Npgsql has to internally parse your SQL query and rewrite it to send the following to PostgreSQL:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">employees</span> <span class="k">WHERE</span> <span class="n">first_name</span> <span class="o">=</span> <span class="err">$</span><span class="mi">1</span> <span class="k">AND</span> <span class="n">age</span> <span class="o">=</span> <span class="err">$</span><span class="mi">2</span>
</code></pre></div></div>

<h2 id="batching">Batching</h2>

<p>As fascinating as the above mess is, let’s leave it for a second and concentrate on something else - statement batching. When you want to execute two unrelated SQL statements, it’s far more efficient to send both at the same time, and not wait for the first to complete before sending the second. In principle, any type of SQL statement can be batched in this way: an UPDATE and a DELETE, 5 different SELECTs, anything; if you’re not already batching where you could be, I highly recommend giving it a try.</p>

<p>The current way to perform batching with ADO.NET looks like this:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">NpgsqlCommand</span><span class="p">(</span><span class="s">"SELECT * FROM employees; SELECT * FROM departments"</span><span class="p">,</span> <span class="n">conn</span><span class="p">);</span>
</code></pre></div></div>

<p>You simply pack two SQL statements into a single command - separated by a semicolon - and execute that command as a single batch. Pretty straightforward… or is it?</p>

<p>The above works as-is on SQL Server, but the situation is a bit more complicated on PostgreSQL. PostgreSQL supports two protocols on the wire: the simple protocol and the extended protocol. The former does allow sending multiple statements as above, but has no support for parameters, prepared statements and various other features. At some point in the past, Npgsql actually used the simple protocol, and got around the lack of parameter support by interpolating parameter values directly into the SQL (client-side binding); this meant Npgsql needed to know how to generate (and parse) string representations of all supported data types, and that’s still inefficient due to the lack of real parameterization and prepared statements. Modern Npgsql exclusively uses the extended protocol, where each protocol message corresponds to exactly one SQL statement, with its own parameter list - no semicolons allowed.</p>

<p>So how does the above batching code work? You guessed it! Npgsql parses the SQL, locates the semicolons and breaks up the command’s text into multiple extended protocol messages.</p>
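<p>To get a feel for why this is delicate, here's a toy splitter - emphatically <em>not</em> Npgsql's actual parser; it ignores escaped quotes, dollar-quoting, comments and much more - which at least knows not to split inside string literals:</p>

```csharp
using System;
using System.Collections.Generic;

static List<string> SplitBatch(string sql)
{
    var statements = new List<string>();
    var start = 0;
    var inLiteral = false;

    for (var i = 0; i < sql.Length; i++)
    {
        switch (sql[i])
        {
            case '\'':
                inLiteral = !inLiteral; // naive: no escaping, no dollar-quoting
                break;
            case ';' when !inLiteral:
                statements.Add(sql[start..i].Trim());
                start = i + 1;
                break;
        }
    }

    var last = sql[start..].Trim();
    if (last.Length > 0)
        statements.Add(last);

    return statements;
}

// The semicolon inside the string literal is correctly left alone:
var parts = SplitBatch("SELECT 'a;b'; SELECT * FROM departments");
Console.WriteLine(parts.Count); // 2
Console.WriteLine(parts[0]);    // SELECT 'a;b'
```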

<h2 id="so-whats-the-big-deal">So what’s the big deal?</h2>

<p>We’ve seen two reasons why Npgsql has to mess around with your command’s SQL: to rewrite your named parameter placeholders into PostgreSQL native positional ones, and to break up any multiple statements for batching. But why should we care about all that?</p>

<ol>
  <li>Parsing the PostgreSQL SQL dialect isn’t trivial. For example, we must avoid manipulating string literals, which may contain semicolons or text that looks like placeholders. Of course, Npgsql doesn’t include a full SQL parser - that would be very hard to do - but rather a very small parser that knows the absolute minimum in order to perform its job. Now, we haven’t had any bugs recently, but I’m sure that if I really dove in there, I could produce cases where the parser mistakenly identifies a placeholder or semicolon where it shouldn’t, or vice versa. It’s an inherently unsafe situation.</li>
  <li>Beyond correctness, both parsing and producing the rewritten SQL is work, which can hurt performance. The longer the SQL query and the more parameters it has, the more overhead this process adds to query execution. Nobody wants that.</li>
  <li>When managing a parameter collection (e.g. <code class="language-plaintext highlighter-rouge">NpgsqlParameterCollection</code>), we have to maintain an internal dictionary that indexes parameters by their name. If we didn’t have to handle names, the collection would become a simple list - this is more efficient.</li>
  <li>Lastly and most importantly, I hate it. I believe a database driver’s job is to transmit the SQL users give it, without manipulating it in any way. Simple, easy, efficient, no frills.</li>
</ol>

<p>So what can be done about this?</p>

<h2 id="going-raw">Going raw</h2>

<p>The first step towards removing SQL manipulation is the introduction of a proper, 1st-class batching API: rather than packing multiple statements into a single semicolon-delimited string, a structured API would allow the user to manage multiple statements within a batch. The driver would then receive a batch which is already broken down into the various statements, and would no longer need to search for semicolons. The upcoming .NET 6.0 features <a href="https://github.com/dotnet/runtime/issues/28633">a new ADO.NET batching API</a> which does precisely this:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">batch</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">NpgsqlBatch</span><span class="p">(</span><span class="n">conn</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">BatchCommands</span> <span class="p">=</span>
    <span class="p">{</span>
        <span class="k">new</span><span class="p">(</span><span class="s">"SELECT * FROM employees"</span><span class="p">),</span>
        <span class="k">new</span><span class="p">(</span><span class="s">"SELECT * FROM departments"</span><span class="p">),</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
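<p>Executing and consuming such a batch looks much like a multi-result command (a sketch assuming the <code>batch</code> above and an open connection; <code>NextResultAsync</code> moves between the statements' result sets):</p>

```csharp
await using var reader = await batch.ExecuteReaderAsync();

do
{
    while (await reader.ReadAsync())
    {
        // process the current statement's rows, e.g. reader.GetString(0)
    }
}
while (await reader.NextResultAsync());
```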

<p>The question of parameter placeholders is a bit trickier. Starting with Npgsql 6.0, you can do the following:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">NpgsqlCommand</span><span class="p">(</span><span class="s">"SELECT * FROM employees WHERE first_name = $1 AND age = $2"</span><span class="p">,</span> <span class="n">conn</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Parameters</span> <span class="p">=</span>
    <span class="p">{</span>
        <span class="k">new</span><span class="p">()</span> <span class="p">{</span> <span class="n">Value</span> <span class="p">=</span> <span class="s">"Shay"</span> <span class="p">},</span>
        <span class="k">new</span><span class="p">()</span> <span class="p">{</span> <span class="n">Value</span> <span class="p">=</span> <span class="m">18</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Our parameters no longer have names! And since that’s the case (<a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbparameter.parametername"><code class="language-plaintext highlighter-rouge">NpgsqlParameter.ParameterName</code></a> is null), Npgsql implicitly switches into “raw mode”, where it no longer performs any parsing or rewriting of your SQL. One consequence of this is that “legacy batching” - multiple semicolon-separated statements - is no longer supported; if you use positional parameters, you must also use the new batching API. If you use named parameters, Npgsql will continue behaving as before, rewriting your SQL in order to maintain full backwards compatibility.</p>

<p>Everything seems to be neatly taken care of - except for one small point. If your command has no parameters at all, Npgsql cannot be sure that there isn’t a semicolon hiding somewhere in your SQL, and must grudgingly fall back to parsing; not doing so would break backwards compatibility. So we added an AppContext switch which allows opting into raw mode everywhere, always:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">AppContext</span><span class="p">.</span><span class="nf">SetSwitch</span><span class="p">(</span><span class="s">"Npgsql.EnableSqlRewriting"</span><span class="p">,</span> <span class="k">false</span><span class="p">);</span>

<span class="kt">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">NpgsqlCommand</span><span class="p">(</span><span class="s">"SELECT * FROM employees"</span><span class="p">,</span> <span class="n">conn</span><span class="p">);</span>
</code></pre></div></div>

<p>Disabling rewriting will make your queries fail if they contain any named parameters or make use of semicolons for batching. Aside from optimizing the zero-parameters case, this switch can ensure that your application is always communicating in the safest and most efficient way with PostgreSQL.</p>

<h2 id="epilog">Epilog</h2>

<p>All the above is available in Npgsql as of 6.0.0-preview7. Unfortunately, positional parameters and the new batching API require changes in layers used over Npgsql: I’m not sure to what extent Dapper supports positional parameters, and EF Core requires some changes in order to support everything too; it’s unfortunately too late in the EF Core release cycle to make that happen, but I plan to work on that for EF Core 7.0.</p>

<p>One last point… The new .NET batching API wasn’t introduced just so that Npgsql could avoid parsing its SQL. While SQL Server does natively support multiple semicolon-separated statements in a single command (or “batch” in SQL Server parlance), there are some significant drawbacks to doing so - <a href="https://docs.microsoft.com/en-us/archive/blogs/dataaccess/does-ado-net-update-batching-really-do-something">read this old post for the details</a>. We also have good reason to believe that the MySQL provider can benefit from a better batching API as well - so lots to look forward to.</p>

<p>Oh, and thanks to <a href="https://github.com/NinoFloris/">@NinoFloris</a> for some very helpful conversations on this!</p>

<p><strong>UPDATE 2022-05-09</strong>: Amazing timing… PostgreSQL 14 has introduced new syntax which breaks Npgsql’s SQL parsing logic, and will probably be non-trivial to recognize properly… see <a href="https://github.com/npgsql/npgsql/issues/4445">this issue</a>. This shows what a bad idea it is for a driver to be parsing SQL.</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[In the upcoming version 6.0 of the Npgsql PostgreSQL driver for .NET, we implemented what I think of as “raw mode” (#3852). In a nutshell, this means that you can now use Npgsql without it doing anything to the SQL you provide it - it will simply send your queries as-is to PostgreSQL, without parsing or rewriting them in any way. Explaining what this means is a great opportunity to go into some interesting aspects of database programming - so let’s dive in.]]></summary></entry><entry><title type="html">EF Core 7.0 Update Performance Improvements</title><link href="https://www.roji.org/efcore-7-update-perf" rel="alternate" type="text/html" title="EF Core 7.0 Update Performance Improvements" /><published>2021-07-12T00:00:00+02:00</published><updated>2021-07-12T00:00:00+02:00</updated><id>https://www.roji.org/efcore-7-update-perf</id><content type="html" xml:base="https://www.roji.org/efcore-7-update-perf"><![CDATA[<p>For the 7.0.0-preview6 release of Entity Framework Core, <a href="https://devblogs.microsoft.com/dotnet/announcing-ef7-preview6/">I wrote a blog post about the update pipeline performance improvements introduced into EF Core 7</a>.</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[For the 7.0.0-preview6 release of Entity Framework Core, I wrote a blog post about the update pipeline performance improvements introduced into EF Core 7.]]></summary></entry><entry><title type="html">EF Core 6.0 Performance Improvements</title><link href="https://www.roji.org/efcore-6-perf" rel="alternate" type="text/html" 
title="EF Core 6.0 Performance Improvements" /><published>2021-05-25T00:00:00+02:00</published><updated>2021-05-25T00:00:00+02:00</updated><id>https://www.roji.org/efcore-6-perf</id><content type="html" xml:base="https://www.roji.org/efcore-6-perf"><![CDATA[<p>For the 6.0.0-preview4 release of Entity Framework Core, <a href="https://devblogs.microsoft.com/dotnet/announcing-entity-framework-core-6-0-preview-4-performance-edition/">I wrote a blog post about the performance improvements introduced into EF Core 6</a>.</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[For the 6.0.0-preview4 release of Entity Framework Core, I wrote a blog post about the performance improvements introduced into EF Core 6.]]></summary></entry><entry><title type="html">The Curious Case of Commands and Cancellation</title><link href="https://www.roji.org/db-commands-and-cancellation" rel="alternate" type="text/html" title="The Curious Case of Commands and Cancellation" /><published>2020-10-15T00:00:00+02:00</published><updated>2020-10-15T00:00:00+02:00</updated><id>https://www.roji.org/db-commands-and-cancellation</id><content type="html" xml:base="https://www.roji.org/db-commands-and-cancellation"><![CDATA[<p>Async brought a world of goodness (and complexity) to .NET, including the concept of cancellation: since async operations are by their nature supposed to take a while, it makes sense to allow us to cancel mid-way and exit early. Like async in general, cancellation took some time to propagate everywhere - socket operations only started honoring cancellation token in <a href="https://github.com/dotnet/runtime/issues/23736">.NET Core 3.0</a>. For the upcoming 5.0 release of Npgsql, the PostgreSQL database driver, a lot of work is going on to provide a good command cancellation story (thanks <a href="https://github.com/vonzshik">@vonzshik</a>!!), and it is far more complicated than you’d think.</p>

<p><a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbcommand">DbCommand</a> is the standard .NET type which represents a query you run against a database - you set SQL and parameters on it, invoke <a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbcommand.executereaderasync">ExecuteReaderAsync</a>, and it gives you back a <a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbdatareader">DbDataReader</a> which allows you to consume the results:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">CreateCommand</span><span class="p">();</span>
<span class="n">cmd</span><span class="p">.</span><span class="n">CommandText</span> <span class="p">=</span> <span class="s">"SELECT something_from_the_database"</span><span class="p">;</span>

<span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">reader</span> <span class="p">=</span> <span class="k">await</span> <span class="n">cmd</span><span class="p">.</span><span class="nf">ExecuteReaderAsync</span><span class="p">();</span>
<span class="c1">// The query has been started and is now running in the background</span>
<span class="c1">// Consume results via the reader</span>
</code></pre></div></div>

<p>Now, back in the old days - before async was even a thing - DbCommand already had a <a href="https://docs.microsoft.com/en-us/dotnet/api/system.data.common.dbcommand.cancel?view=netcore-3.1#System_Data_Common_DbCommand_Cancel">Cancel</a> method. This method attempts to cancel the ongoing execution of the query, on a best-effort basis, by doing whatever is appropriate for your database. When async came along, all the database types were retrofitted with async methods accepting cancellation tokens: <a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbcommand.executereaderasync">DbCommand.ExecuteReaderAsync</a>, <a href="https://docs.microsoft.com/dotnet/api/system.data.common.dbdatareader.readasync">DbDataReader.ReadAsync</a>, etc. Logically, invoking the cancellation token is the async analog of calling the old DbCommand.Cancel - both semantically mean the same thing. Or so it would seem.</p>

<p>When you pass a cancellation token to some method, the general expectation is for the token to control that specific invocation; if you trigger the token, that invocation should terminate as early as possible and throw an <a href="https://docs.microsoft.com/dotnet/api/system.operationcanceledexception">OperationCanceledException</a>. The simplest example is <a href="https://docs.microsoft.com/dotnet/api/system.net.http.httpclient.getasync#System_Net_Http_HttpClient_GetAsync_System_String_">HttpClient.GetAsync</a>: that method call represents one potentially long process, and the cancellation token can abort that process; when the method completes, you know nothing lingers in the background. The database API, in contrast, is more complex: when DbCommand.ExecuteReaderAsync completes, the query has only just started, is (likely) still running, and may continue running for a very long time. The DbDataReader it returns allows you to start processing the result stream, possibly in parallel, while the database server is still running the query and sending results back.</p>

<p>So ExecuteReaderAsync starts some background process (the query), which doesn’t complete when the method itself completes - why is that significant? One question this raises is how one goes about cancelling the query <em>after</em> ExecuteReaderAsync completes; the traditional DbCommand.Cancel API doesn’t have this problem, because it’s a method on the DbCommand, rather than a token you pass to some method call.</p>

<p>Another related question is what happens with the various methods on DbDataReader which also accept a cancellation token, such as ReadAsync: what should happen when the token for ReadAsync is triggered? The usual expectation in the async world is again, for the token to only cancel the method to which it was passed, i.e. ReadAsync; but if we do that, we’re left with no token-based means to cancel the query at all - which is a pretty important requirement. We could tell users who want to cancel the query to rig their cancellation tokens to call our old DbCommand.Cancel:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">async</span> <span class="n">Task</span> <span class="nf">ExecuteSomethingAsync</span><span class="p">(</span><span class="n">DbConnection</span> <span class="n">connection</span><span class="p">,</span> <span class="kt">string</span> <span class="n">sql</span><span class="p">,</span> <span class="n">CancellationToken</span> <span class="n">cancellationToken</span> <span class="p">=</span> <span class="k">default</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">cmd</span> <span class="p">=</span> <span class="n">connection</span><span class="p">.</span><span class="nf">CreateCommand</span><span class="p">();</span>
    <span class="n">cmd</span><span class="p">.</span><span class="n">CommandText</span> <span class="p">=</span> <span class="n">sql</span><span class="p">;</span>

    <span class="k">await</span> <span class="k">using</span> <span class="nn">var</span> <span class="n">reader</span> <span class="p">=</span> <span class="k">await</span> <span class="n">cmd</span><span class="p">.</span><span class="nf">ExecuteReaderAsync</span><span class="p">(</span><span class="n">cancellationToken</span><span class="p">);</span>
    <span class="k">using</span> <span class="nn">var</span> <span class="n">registration</span> <span class="p">=</span> <span class="n">cancellationToken</span><span class="p">.</span><span class="nf">Register</span><span class="p">(()</span> <span class="p">=&gt;</span> <span class="n">cmd</span><span class="p">.</span><span class="nf">Cancel</span><span class="p">());</span>

    <span class="c1">// Process results</span>
<span class="p">}</span>
</code></pre></div></div>

<p>But this would have to be done everywhere where a potentially cancellable command needs to be executed, and isn’t very discoverable. Finally, it just doesn’t seem incredibly useful to allow ReadAsync to be cancelled while leaving the query itself running; it’s simply unlikely that a later retry would produce useful results where an earlier ReadAsync was cancelled. Yes, this business of “detached” async background processes which don’t correspond to method calls isn’t entirely trivial.</p>

<p>The solution we opted for was to treat ReadAsync’s token - and indeed, all tokens accepted by methods on DbDataReader - in the same way as we treat ExecuteReaderAsync’s: triggering it cancels the query. This solves the typical user requirement: if a cancellation token comes from somewhere (rigged to some GUI button, for example) and is passed to all database methods, then triggering the token cancels the query.</p>
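<p>In practice, this means a single cancellation token can simply flow through the entire interaction. A minimal sketch (assuming a provider, such as Npgsql 5.0, that implements these semantics):</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public async Task ConsumeAsync(DbConnection connection, CancellationToken cancellationToken)
{
    await using var cmd = connection.CreateCommand();
    cmd.CommandText = "SELECT something_long_running";

    await using var reader = await cmd.ExecuteReaderAsync(cancellationToken);
    try
    {
        while (await reader.ReadAsync(cancellationToken))
        {
            // Process a row. If the token is triggered at any point - whether we
            // happen to be inside ReadAsync or not - the query itself is cancelled.
        }
    }
    catch (OperationCanceledException)
    {
        // The query was cancelled; nothing is left running in the background.
    }
}
</code></pre></div></div>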

<p>This does have one peculiar consequence. It is quite standard for async methods to start with the following:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">async</span> <span class="n">Task</span> <span class="nf">SomeLongThing</span><span class="p">(</span><span class="n">CancellationToken</span> <span class="n">cancellationToken</span> <span class="p">=</span> <span class="k">default</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">cancellationToken</span><span class="p">.</span><span class="nf">ThrowIfCancellationRequested</span><span class="p">();</span>

    <span class="c1">// ... actual stuff ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This performs an upfront check on the token, and immediately returns a cancelled Task if the method is invoked with an already-cancelled token. But in our case, calling the method with a cancelled token is actually the way to request cancellation of the query. It’s a bit odd, but it works and does what people generally want.</p>
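<p>Concretely, this leads to a somewhat unusual but workable pattern - a sketch, again assuming a provider with the semantics described above:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public async Task CancelMidway(DbCommand cmd, CancellationTokenSource cts)
{
    await using var reader = await cmd.ExecuteReaderAsync(cts.Token);

    cts.Cancel();

    // The token is already cancelled; rather than bailing out with an up-front
    // check, this call is what actually requests cancellation of the running
    // query, before throwing OperationCanceledException.
    await reader.ReadAsync(cts.Token);
}
</code></pre></div></div>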

<p>Hate it? Love it? Let us know.</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[Async brought a world of goodness (and complexity) to .NET, including the concept of cancellation: since async operations are by their nature supposed to take a while, it makes sense to allow us to cancel mid-way and exit early. Like async in general, cancellation took some time to propagate everywhere - socket operations only started honoring cancellation tokens in .NET Core 3.0. For the upcoming 5.0 release of Npgsql, the PostgreSQL database driver, a lot of work is going on to provide a good command cancellation story (thanks @vonzshik!!), and it is far more complicated than you’d think.]]></summary></entry><entry><title type="html">C# 8 Nullable Reference Types, old TFMs and Multitargeting</title><link href="https://www.roji.org/nullable-reference-types-with-old-tfms" rel="alternate" type="text/html" title="C# 8 Nullable Reference Types, old TFMs and Multitargeting" /><published>2020-01-04T00:00:00+01:00</published><updated>2020-01-04T00:00:00+01:00</updated><id>https://www.roji.org/nullable-reference-types-with-old-tfms</id><content type="html" xml:base="https://www.roji.org/nullable-reference-types-with-old-tfms"><![CDATA[<p>C# 8.0 finally brought us nullable reference types (NRTs), which allow us to annotate our reference types as non-nullable and get compiler warnings for code that may be in violation. As libraries and applications in the .NET ecosystem opt into this feature, C# code will get safer and more self-documenting, as it’s immediately clear which variables can hold null and which can’t. <a href="https://docs.microsoft.com/en-ca/dotnet/csharp/nullable-references">Here are the C# docs for NRTs</a>, and you may also want to check out <a href="https://devblogs.microsoft.com/dotnet/try-out-nullable-reference-types/">this blog post to get started</a>.</p>

<p>There’s one problem though: C# 8.0 is only supported when targeting at least .NET Core 3.0 or .NET Standard 2.1, so if your project has to target an older TFM (say, .NET Standard 2.0 or even .NET Framework), you can’t officially use this feature. Unfortunately, some of us maintain software that can’t always target the newest shiny thing, but we’d still like to get the benefits of NRTs. No problem! As this is a compiler-only feature without any runtime requirements, there’s nothing <em>really</em> preventing you from turning it on in your csproj:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="nt">&lt;PropertyGroup&gt;</span>
    <span class="nt">&lt;TargetFramework&gt;</span>netstandard2.0<span class="nt">&lt;/TargetFramework&gt;</span>
    <span class="nt">&lt;LangVersion&gt;</span>8.0<span class="nt">&lt;/LangVersion&gt;</span>
    <span class="nt">&lt;Nullable&gt;</span>enable<span class="nt">&lt;/Nullable&gt;</span>
  <span class="nt">&lt;/PropertyGroup&gt;</span>
</code></pre></div></div>

<p>And this will actually work! Until it doesn’t, that is… In some rare cases, things won’t work as they should when targeting the old TFM:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">string</span> <span class="nf">Foo</span><span class="p">(</span><span class="kt">string</span><span class="p">?</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">Debug</span><span class="p">.</span><span class="nf">Assert</span><span class="p">(</span><span class="n">s</span> <span class="p">!=</span> <span class="k">null</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">s</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">s</code> is a nullable string, using it in a non-nullable context will generate a warning. Now, let’s say that we know that in this particular context, <code class="language-plaintext highlighter-rouge">s</code> cannot be null, and wish to assert that. This code will compile just fine on recent TFMs, since the compiler knows that if <code class="language-plaintext highlighter-rouge">Debug.Assert</code> returns successfully, <code class="language-plaintext highlighter-rouge">s</code> can’t be null. However, when targeting an older TFM, this code will generate a warning. To be fair, this is a relatively rare corner case: most NRT code does compile correctly even on older BCLs.</p>
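<p>As a purely local workaround, the null-forgiving operator (<code class="language-plaintext highlighter-rouge">!</code>) silences the warning - but it silences it on every TFM, throwing away exactly the compiler check we want to keep. A sketch:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code>static string Foo(string? s)
{
    Debug.Assert(s != null);
    return s!; // "!" suppresses the nullability warning; it has no runtime effect
}
</code></pre></div></div>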

<p>A way around this is to target a newer TFM where NRTs are fully supported, and simply disable nullability on the older one; in effect, we’ll be using the newer TFM to do the compiler verifications that our code is null-correct. So we can simply modify our csproj to do the following:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="nt">&lt;PropertyGroup&gt;</span>
    <span class="nt">&lt;TargetFrameworks&gt;</span>netstandard2.0;netstandard2.1<span class="nt">&lt;/TargetFrameworks&gt;</span>
    <span class="nt">&lt;LangVersion&gt;</span>8.0<span class="nt">&lt;/LangVersion&gt;</span>
  <span class="nt">&lt;/PropertyGroup&gt;</span>

  <span class="nt">&lt;PropertyGroup</span> <span class="na">Condition=</span><span class="s">" '$(TargetFramework)' != 'netstandard2.0' "</span><span class="nt">&gt;</span>
    <span class="nt">&lt;Nullable&gt;</span>enable<span class="nt">&lt;/Nullable&gt;</span>
  <span class="nt">&lt;/PropertyGroup&gt;</span>
</code></pre></div></div>

<p>Great! There’s just one problem - our build will now generate warning CS8632 - The annotation for nullable reference types should only be used in code within a ‘#nullable’ annotations context - since in the older TFM we’re using the NRT feature without having turned it on. No problem, we can just ignore that warning for the old TFM by adding the following:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;PropertyGroup</span> <span class="na">Condition=</span><span class="s">" $(Nullable) != 'enable' "</span><span class="nt">&gt;</span>
  <span class="nt">&lt;NoWarn&gt;</span>$(NoWarn);CS8632<span class="nt">&lt;/NoWarn&gt;</span>
<span class="nt">&lt;/PropertyGroup&gt;</span>
</code></pre></div></div>

<p>That’s it. You now have a project targeting two TFMs, with NRTs enabled on the newer TFM and disabled on the older. Happy fun nullificating your projects!</p>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[C# 8.0 finally brought us nullable reference types (NRTs), which allow us to annotate our reference types as non-nullable and get compiler warnings for code that may be in violation. As libraries and applications in the .NET ecosystem opt into this feature, C# code will get safer and more self-documenting, as it’s immediately clear which variables can hold null and which can’t. Here are the C# docs for NRTs, and you may also want to check out this blog post to get started.]]></summary></entry><entry><title type="html">Conceptual and API documentation with Docfx, Github Actions and Github Pages</title><link href="https://www.roji.org/docfx-with-github-actions" rel="alternate" type="text/html" title="Conceptual and API documentation with Docfx, Github Actions and Github Pages" /><published>2019-10-03T00:00:00+02:00</published><updated>2019-10-03T00:00:00+02:00</updated><id>https://www.roji.org/docfx-with-github-actions</id><content type="html" xml:base="https://www.roji.org/docfx-with-github-actions"><![CDATA[<p>A good software project is (among other things!) measured by the quality of its documentation, but setting up a good documentation workflow isn’t trivial. There are generally two kinds of documentation: conceptual articles, which are written manually (e.g. in markdown) and API documentation which is generated directly from source code. Docfx is a great tool which knows how to generate a single, seamless site from these two documentation types, but reaching doc nirvana is still hard:</p>

<ul>
  <li>The whole process should be fully automated: devs shouldn’t ever need to run docfx manually. We’re better than that. Let’s call this a continuous documentation pipeline.</li>
  <li>Conceptual docs are sometimes managed in a separate repo, raising the question of how to bring multiple docs together.</li>
  <li>The Npgsql case is even more complex: the same site has documentation for two projects (both the base Npgsql driver and the EF Core provider).</li>
</ul>

<p>This post will describe the new documentation pipeline used by Npgsql to solve these challenges, using <a href="https://dotnet.github.io/docfx/">docfx</a>, <a href="https://help.github.com/en/categories/automating-your-workflow-with-github-actions">Github Actions</a> for automation and <a href="https://pages.github.com/">Github Pages</a> for hosting. It assumes you’re (somewhat) familiar with docfx, and won’t go into the details of configuring it.</p>

<h2 id="automating-docfx-with-github-actions">Automating Docfx with Github Actions</h2>

<p>Let’s concentrate on conceptual documentation for now. For people hosting static (or Jekyll) sites on Github Pages, life is simple: edit a file and push, and your changes are automatically live. When a processing system such as docfx is used, we have to run docfx, and the resulting HTML site needs to be hosted somewhere. We can use Github Actions to do this for us.</p>

<p>Create a file called <code class="language-plaintext highlighter-rouge">.github/workflows/build-documentation.yml</code> in a repo containing some articles and a docfx.json:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Build Documentation</span>

<span class="na">on</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">master</span>

<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">build</span><span class="pi">:</span>

    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-18.04</span>

    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout repo</span>
      <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
      <span class="na">with</span><span class="pi">:</span>
        <span class="na">path</span><span class="pi">:</span> <span class="s">docs</span>
        <span class="na">fetch-depth</span><span class="pi">:</span> <span class="m">1</span>

    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Get mono</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF</span>
        <span class="s">echo "deb https://download.mono-project.com/repo/ubuntu stable-bionic main" | sudo tee /etc/apt/sources.list.d/mono-official-stable.list</span>
        <span class="s">sudo apt-get update</span>
        <span class="s">sudo apt-get install mono-complete --yes</span>

    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Get docfx</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">curl -L https://github.com/dotnet/docfx/releases/latest/download/docfx.zip -o docfx.zip</span>
        <span class="s">unzip -d .docfx docfx.zip</span>

    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build docs</span>
      <span class="na">run</span><span class="pi">:</span>  <span class="s">mono .docfx/docfx.exe</span>
</code></pre></div></div>

<p>Let’s go over the above. We’ve created a workflow called “Build Documentation” that will run every time something is pushed to our repo’s master branch. It first clones our repo into a directory called docs, and to save time we fetch only 1 commit deep (who needs all that history). Now, since I’m a diehard Linux dude, we’ll be running on Ubuntu; this unfortunately means that we need to install mono, since docfx only runs on .NET Framework (guys, it’s 2019 and .NET Core 3.0 has just been released…). I won’t go into the technicalities of this step, and if you prefer Windows you can skip it entirely.</p>

<p>Once mono is installed, we get the latest version of docfx by fetching it from their Github releases page, and unzip it into some directory. At this point we’re ready to go, and can run docfx - hurray! Could it be this simple?</p>

<h2 id="repos-repos">Repos, repos…</h2>

<p>Well, uh, no… What are we going to do with all those HTML files that docfx generated? They need to be hosted somewhere. If you have some external hosting service, at this point you’d pack the outputs into a ZIP and send it off somewhere. But if you use Github Pages for your hosting, you may be tempted to host your site in the same repo which contains the sources. While this may seem like a good idea, it probably isn’t: to actually go live, you need to push a new commit containing these HTMLs; creating a new commit in your sources repo would mean you have to pull it the next time you want to make a change. But in any case, who wants a source repo to contain generated HTML artifacts - that’s like committing your compiled objects alongside your sources, yuck.</p>

<p>So we’ll open a new repo whose sole purpose is to host our static HTML files: this will be our publicly-visible repo. Our workflow will clone that repo, make sure that docfx generates its outputs into its directory, and finally commit and push the changes to it. Let’s add this additional fragment after our repo’s checkout:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout live docs repo</span>
      <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
      <span class="na">with</span><span class="pi">:</span>
        <span class="na">repository</span><span class="pi">:</span> <span class="s">npgsql/livedocs</span>
        <span class="na">ref</span><span class="pi">:</span> <span class="s">master</span>
        <span class="na">fetch-depth</span><span class="pi">:</span> <span class="m">1</span>
        <span class="na">path</span><span class="pi">:</span> <span class="s">docs/live</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Clear live docs repo</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s">rm -rf live/*</span>
</code></pre></div></div>

<p>This time we need to specify which repo we want to clone, since it isn’t the repo where the workflow is running. We also specify to clone it into a <code class="language-plaintext highlighter-rouge">live</code> directory inside our sources repo; when docfx runs, it will automatically generate HTMLs into that directory. Once that’s done, all that’s left is to commit and push those changes to the live repo - but unfortunately that’s a bit complicated.</p>

<p>To push changes to another repo, we’re going to need an access token with the proper permissions, so let’s head over to Github and generate one with <code class="language-plaintext highlighter-rouge">repo</code> permissions, by following <a href="https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line">these instructions</a>. Now, I know we want to go fast, but we absolutely <em>cannot</em> insert that access token inside our workflow YAML: this is a public file, and putting our token there would give the world write access to our repo - not cool (don’t do this even for private repos!). Fortunately, Github Actions has a <a href="https://help.github.com/en/articles/virtual-environments-for-github-actions#creating-and-using-secrets-encrypted-variables">secrets management feature</a>, which allows you to store your access token and reference it safely from your YAML. Follow the instructions on that page to store the token as a secret called DOC_MGMT_TOKEN, and insert the following fragment at the bottom of your workflow:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Commit and push</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">cd live</span>
        <span class="s">git config --global user.email "noreply@npgsql.org"</span>
        <span class="s">git config --global user.name "Automated System"</span>
        <span class="s">git add .</span>
        <span class="s">git commit -m "Automated update" --author $GITHUB_ACTOR</span>
        <span class="s">header=$(echo -n "ad-m:$" | base64)</span>
        <span class="s">git -c http.extraheader="AUTHORIZATION: basic $header" push origin HEAD:master</span>
</code></pre></div></div>

<p>We unfortunately have to jump through some hoops - this should ideally be simpler. We:</p>

<ul>
  <li>Enter the live repo’s directory, where our HTMLs have been generated</li>
  <li>Configure our name and email with git, as these will appear in the commit we’re about to create</li>
  <li>Add all files</li>
  <li>Create the commit</li>
  <li>Git push the commit to the live docs repo, doing some magic to include our access token in the HTTP header for proper authentication</li>
</ul>

<p>… and we have a fully-working, automated documentation pipeline - just push any changes to see it appearing live! Now we’re done, right?</p>
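<p>One last note on the token: it never appears in the workflow YAML itself - it’s referenced via Github’s secrets context. Roughly, assuming the secret was stored as DOC_MGMT_TOKEN as above (the DOC_TOKEN variable name is arbitrary):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    - name: Commit and push
      env:
        DOC_TOKEN: ${{ secrets.DOC_MGMT_TOKEN }}
      run: |
        # ... build the basic-auth header from $DOC_TOKEN and push, as shown above ...
</code></pre></div></div>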

<h2 id="api-documentation">API documentation</h2>

<p>I know… I promised we’d also do API documentation here. It’s not so hard after what we’ve already been through. After <a href="https://dotnet.github.io/docfx/tutorial/walkthrough/walkthrough_create_a_docfx_project_2.html">configuring your docfx.json</a> appropriately, if your (conceptual) doc repo is separate from your actual project(s), you will simply need to add a workflow step to clone it into a directory where docfx will look for it:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout Npgsql</span>
      <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
      <span class="na">with</span><span class="pi">:</span>
        <span class="na">repository</span><span class="pi">:</span> <span class="s">npgsql/npgsql</span>
        <span class="na">ref</span><span class="pi">:</span> <span class="s">master</span>
        <span class="na">fetch-depth</span><span class="pi">:</span> <span class="m">1</span>
        <span class="na">path</span><span class="pi">:</span> <span class="s">docs/Npgsql</span>
</code></pre></div></div>

<p>Note that Npgsql follows <a href="https://datasift.github.io/gitflow/IntroducingGitFlow.html">Gitflow</a>, which means that the latest released version can always be found in the master branch - so that’s where we generate API docs from. Your git workflow may be different, adjust accordingly.</p>

<p>At this point, a doc rebuild is triggered whenever something is pushed to our <em>conceptual</em> repo, which is great, but we also want our project repo to trigger a rebuild! So we add another trigger at the beginning of our doc repo’s workflow file:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">on</span><span class="pi">:</span>
  <span class="na">repository_dispatch</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">master</span>
</code></pre></div></div>

<p>The added <a href="https://help.github.com/en/articles/events-that-trigger-workflows#external-events-repository_dispatch"><em>repository dispatch</em></a> is basically an event that can be triggered externally via a simple HTTP POST request. All that’s left is to drop the following workflow in our project repo, under <code class="language-plaintext highlighter-rouge">.github/workflows/trigger-doc-build.yml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Trigger Documentation Build</span>

<span class="na">on</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">master</span>

<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">build</span><span class="pi">:</span>

    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-18.04</span>

    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Trigger documentation build</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">curl -X POST \</span>
             <span class="s">-H "Authorization: token $" \</span>
             <span class="s">-H "Accept: application/vnd.github.everest-preview+json" \</span>
             <span class="s">-H "Content-Type: application/json" \</span>
             <span class="s">--data '{ "event_type": "Npgsql push to master" }' \</span>
             <span class="s">https://api.github.com/repos/npgsql/doc/dispatches</span>
</code></pre></div></div>

<p>Note that we need an access token here as well, and have to configure it on our <em>project</em> repo, since that is where the workflow runs. Once this is done, every push to your project repo’s master branch will result in a doc rebuild (Npgsql even has two repos triggering rebuilds of the same doc site).</p>

<h2 id="nirvana">Nirvana</h2>

<p>Once this is all properly set up, you hopefully never have to think about syncing docs ever again…! If you think the above is useful (or hate it), please drop me a comment below - any improvement suggestions would be welcome as well!</p>

<p>Oh, and here are the full files so you can see it all put together. Feel free to wander around <a href="https://github.com/npgsql/doc">the doc repo</a> or <a href="https://github.com/npgsql/npgsql">the Npgsql project repo</a> to see how it all fits together.</p>

<ul>
  <li><a href="/assets/2019-10-03-docfx-with-github-actions/build-documentation.yml">build-documentation.yml</a></li>
  <li><a href="/assets/2019-10-03-docfx-with-github-actions/trigger-doc-build.yml">trigger-doc-build.yml</a></li>
  <li><a href="/assets/2019-10-03-docfx-with-github-actions/docfx.json">docfx.json</a> (in case it floats your boat)</li>
</ul>]]></content><author><name>Shay Rojansky</name></author><summary type="html"><![CDATA[A good software project is (among other things!) measured by the quality of its documentation, but setting up a good documentation workflow isn’t trivial. There are generally two kinds of documentation: conceptual articles, which are written manually (e.g. in markdown) and API documentation which is generated directly from source code. Docfx is a great tool which knows how to generate a single, seamless site from these two documentation types, but reaching doc nirvana is still hard:]]></summary></entry><entry><title type="html">EFCore 3.0 for PostgreSQL - Advanced JSON Support</title><link href="https://www.roji.org/efcore-pg-advanced-json" rel="alternate" type="text/html" title="EFCore 3.0 for PostgreSQL - Advanced JSON Support" /><published>2019-09-26T00:00:00+02:00</published><updated>2019-09-26T00:00:00+02:00</updated><id>https://www.roji.org/efcore-postgres-json-support</id><content type="html" xml:base="https://www.roji.org/efcore-pg-advanced-json"><![CDATA[<h1 id="json-and-databases">JSON and Databases</h1>

<p>Most relational databases have had some sort of native support for JSON for quite a while now; PostgreSQL introduced its first JSON support in version 9.2, back in 2012, and the more optimized <code class="language-plaintext highlighter-rouge">jsonb</code> type in 2014. JSON types have in part been the relational database response to the NoSQL movement, with its pervasive, schema-less JSON documents: look, we can do it too! But the marriage of a traditional relational schema with non-relational documents has proven very powerful indeed; complex data no longer have to be represented via sprawling, relational models involving endless joins, and islands of fluid, schema-less content within a stricter relational model brought some very welcome flexibility.</p>

<p>Database JSON support usually means that some operations on JSON data can be performed in the database; after all, simply storing and loading JSON documents is quite useless in itself. For JSON to really shine, we need to be able to ask for all JSON documents satisfying some condition - and to do it efficiently. At the most basic level, PostgreSQL supports the following:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">some_table</span> <span class="k">WHERE</span> <span class="n">customer</span><span class="o">-&gt;&gt;</span><span class="s1">'name'</span> <span class="o">==</span> <span class="s1">'Joe'</span><span class="p">;</span>
</code></pre></div></div>

<p>Assuming the <code class="language-plaintext highlighter-rouge">some_table</code> table has a JSON column named <code class="language-plaintext highlighter-rouge">customer</code>, this query will make PostgreSQL examine each row’s document, and return those rows where the <code class="language-plaintext highlighter-rouge">name</code> key is equal to “Joe”. <a href="https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING">Proper indexing</a> can make this perform very fast, and PostgreSQL has <a href="https://www.postgresql.org/docs/current/functions-json.html">a plethora of other operators and functions</a> that can be used to construct JSON queries.</p>
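<p>For example, a GIN index over the <code class="language-plaintext highlighter-rouge">jsonb</code> column can make containment queries fast; here is a sketch, with illustrative index and table names:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Index the entire document; the default jsonb_ops operator class
-- supports containment (@&gt;) and key-existence queries
CREATE INDEX ix_some_table_customer ON some_table USING GIN (customer);

-- This containment query can now use the index
SELECT * FROM some_table WHERE customer @&gt; '{ "name": "Joe" }';
</code></pre></div></div>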

<p>Now, the syntax above is entirely PostgreSQL-specific: other databases have other ways of expressing such queries. SQL/JSON standardization is underway, and PostgreSQL 12 <a href="https://paquier.xyz/postgresql-2/postgres-12-jsonpath/">will support jsonpath queries</a>, which should finally provide a cross-database way to describe JSON queries. Unfortunately, the non-standardized nature of JSON support has meant that ORMs have often stayed away from it, and developers have been forced to drop down to raw SQL if they wanted to access JSON goodness.</p>
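<p>As a sketch of what this looks like, the query above could be written with PostgreSQL 12's jsonpath support as follows (the <code class="language-plaintext highlighter-rouge">@@</code> operator checks a jsonpath predicate against a document):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT * FROM some_table WHERE customer @@ '$.name == "Joe"';
</code></pre></div></div>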

<p>No more! Release 3.0.0 of the Npgsql Entity Framework Core provider for PostgreSQL brings some exciting new JSON support, leveraging a unique feature of C#’s LINQ to express database JSON queries in a strongly-typed and natural way. The rest of this post will present the key new features; <a href="http://www.npgsql.org/efcore/mapping/json.html">consult the documentation for a more complete description</a>.</p>

<h1 id="strongly-typed-access-via-pocos">Strongly-typed access via POCOs</h1>

<p>Without further ado, you can now define an EF Core entity as follows:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">SomeEntity</span>   <span class="c1">// Maps to a database table</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="p">[</span><span class="nf">Column</span><span class="p">(</span><span class="n">TypeName</span> <span class="p">=</span> <span class="s">"jsonb"</span><span class="p">)]</span>
    <span class="k">public</span> <span class="n">Customer</span> <span class="n">Customer</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">class</span> <span class="nc">Customer</span>    <span class="c1">// Maps to a JSON column in the table</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">Name</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Age</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="n">Order</span><span class="p">[]</span> <span class="n">Orders</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">class</span> <span class="nc">Order</span>       <span class="c1">// Part of the JSON column</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">decimal</span> <span class="n">Price</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">ShippingAddress</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Our <code class="language-plaintext highlighter-rouge">SomeEntity</code> type - which maps to a database table - contains an arbitrary user type (or POCO, plain-old-CLR-object), which is mapped to a PostgreSQL <code class="language-plaintext highlighter-rouge">jsonb</code> column via the <code class="language-plaintext highlighter-rouge">[Column]</code> data annotation attribute. This is really <em>all</em> you have to do, and everything will work as expected: Npgsql will use the new <a href="https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-apis/">System.Text.Json</a> to serialize and deserialize your instances to JSON data. Note also that our POCO, <code class="language-plaintext highlighter-rouge">Customer</code>, contains an array of another POCO, <code class="language-plaintext highlighter-rouge">Order</code>; this will also just work as expected, with the array of orders appearing inside the customer’s JSON document.</p>

<p>That’s it, couldn’t be simpler. No need for additional <code class="language-plaintext highlighter-rouge">customer</code> and <code class="language-plaintext highlighter-rouge">order</code> tables, with joins all around. But what about querying, as promised above? No problem:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">joes</span> <span class="p">=</span> <span class="n">context</span><span class="p">.</span><span class="n">CustomerEntries</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">e</span> <span class="p">=&gt;</span> <span class="n">e</span><span class="p">.</span><span class="n">Customer</span><span class="p">.</span><span class="n">Name</span> <span class="p">==</span> <span class="s">"Joe"</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">ToList</span><span class="p">();</span>
</code></pre></div></div>

<p>This will produce the PostgreSQL-specific JSON syntax we saw above. Once again: we’re using natural C# and LINQ to express an SQL query over a JSON column in our database.</p>
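<p>For illustration, the generated SQL looks roughly like the following - the exact table alias, identifier quoting and parameterization may differ:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT s."Id", s."Customer"
FROM "CustomerEntries" AS s
WHERE s."Customer"-&gt;&gt;'Name' = 'Joe'
</code></pre></div></div>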

<h1 id="weakly-typed-access-via-jsondocument">Weakly-typed access via JsonDocument</h1>

<p>Mapping POCOs is great when your JSON documents have a stable schema, but JSON is frequently used precisely when things are fluid: a document in one row could have a certain key which another document might not. A strongly-typed POCO is inappropriate for mapping in these circumstances, but never fear - there’s a solution for that as well. System.Text.Json also comes with a Document Object Model (DOM) for accessing JSON documents: you use types such as <a href="https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsondocument"><code class="language-plaintext highlighter-rouge">JsonDocument</code></a> and <a href="https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonelement"><code class="language-plaintext highlighter-rouge">JsonElement</code></a> for weakly-typed access. These can also be mapped:</p>

<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">SomeEntity</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="n">JsonDocument</span> <span class="n">Customer</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>

<span class="kt">var</span> <span class="n">joes</span> <span class="p">=</span> <span class="n">context</span><span class="p">.</span><span class="n">CustomerEntries</span>
    <span class="p">.</span><span class="nf">Where</span><span class="p">(</span><span class="n">e</span> <span class="p">=&gt;</span> <span class="n">e</span><span class="p">.</span><span class="n">Customer</span><span class="p">.</span><span class="nf">GetProperty</span><span class="p">(</span><span class="s">"Name"</span><span class="p">).</span><span class="nf">GetString</span><span class="p">()</span> <span class="p">==</span> <span class="s">"Joe"</span><span class="p">)</span>
    <span class="p">.</span><span class="nf">ToList</span><span class="p">();</span>
</code></pre></div></div>

<p>This will produce the same SQL as above.</p>

<h1 id="closing-words">Closing Words</h1>

<p>This hopefully gave a good overview of this new JSON feature, which should make PostgreSQL JSON operations accessible to EF Core users - <a href="http://www.npgsql.org/efcore/mapping/json.html">the full documentation is available here</a>. Based on feedback, the plan is also to look into supporting JSON in other database providers, such as SQL Server or SQLite; standardized SQL/JSON may provide an opportunity for generic, cross-database support.</p>

<p>Please send positive and negative feedback via Twitter (<a href="https://twitter.com/shayrojansky">@shayrojansky</a>) or by opening an issue <a href="https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/">on the provider repo</a>. And have fun!</p>