Avoiding SQL Server plan cache pollution due to unparameterized Contains queries generated by EF Core
In this post I will describe an approach to solving an issue with EF Core and queries using Contains
I have previously blogged about:
The SQL Server plan cache is a limited resource, and if SQL statements are not parameterized, you can end up with so many plans in the query cache, that on a busy system the database engine will spend all it's time evicting plans from the cache, instead of delivering query results!
Previous solution
I also propose a solution to the issue in my previous blog post, using some LINQ tricks
The advantage of the previous approach is that it works on all supported editions of SQL Server (and indeed with any other provider), but is is also restricted by the fact that SQL Server does not support more than 2100 parameters per call.
And despite bucketizing, the previous approach will generate a number of query plans.
Alternate solution
Below, I will demonstrate an alternative approach, using the STRING_SPLIT
table valued function (TVF) which was introduced in SQL Server 2016. So this solution is for SQL Server 2016 and later (including Azure SQL Database).
I will take advantage of the ability to map to a SQL Server TVF introduced in EF Core 5, so no raw SQL statements using the FromSql method are required.
Sample code
Let's see how to put the parts together.
First add a class to hold the return value from the STRING_SPLIT function.
public class StringSplitResult
{
public string Value { get; set; }
}
Then extend your DbContext with the following code.
public partial class NorthwindContext
{
partial void OnModelCreatingPartial(ModelBuilder modelBuilder)
{
modelBuilder.Entity<StringSplitResult>().HasNoKey();
}
[DbFunction(IsBuiltIn = true, Name = "STRING_SPLIT")]
private IQueryable<StringSplitResult> Split(string source, string separator)
=> FromExpression(() => Split(source, separator));
public IQueryable<string> AsSplit(string[] source)
=> Split(string.Join(",", source), ",").Select(s => s.Value);
}
First, a keyless entity is added to the DbContext.
Then the mapping for the built-in STRING_SPLIT
TVF is added, using the EF Core DbFunction
attribute.
Finally, a convenience method is added to support passing an array of strings.
You can also add a similar convenience method for an array of numbers.
public IQueryable<long> AsSplit(long[] source)
=> Split(string.Join(",", source.Select(x => Convert.ToString(x))), ",").Select(s => Convert.ToInt64(s.Value));
Let's test the code and inspect the generated SQL!
using (var db = new NorthwindContext())
{
var list = new[] { "ALFKI", "BERGS", "VAFFE" };
var result = db.Orders
.Where(s => db.AsSplit(list).Contains( s.CustomerId))
.Select(o => new { o.OrderDate, o.CustomerId })
.ToList();
foreach (var item in result)
{
Console.WriteLine($"{item.CustomerId} : {item.OrderDate}");
}
}
This generates the following, properly parameterized SQL:
DECLARE @__source_1 nvarchar(4000) = N'ALFKI,BERGS,VAFFE';
DECLARE @__p_2 nvarchar(4000) = N',';
SELECT [o].[OrderDate], [o].[CustomerID] AS [CustomerId]
FROM [Orders] AS [o]
WHERE EXISTS (
SELECT 1
FROM [STRING_SPLIT](@__source_1, @__p_2) AS [s]
WHERE ([s].[Value] = [o].[CustomerID]) OR ([s].[Value] IS NULL AND [o].[CustomerID] IS NULL))
And as we can see from the query plan, SQL Server is able to use an existing index on CustomerID:
Thanks to stevendarby and smitpatel for inspiration to this solution!
Happy coding!