How to get COUNT DISTINCT in translated SQL with EF Core
I would like EF Core to translate .Select(x=>x.property).Distinct().Count()
into something like
SELECT COUNT(DISTINCT property)
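Let's take an example. Suppose I have a database table with PersonID (long), VisitStart (datetime2) and VisitEnd (datetime2).
If I want to get the number of distinct days on which a particular person visited, I could write SQL like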
SELECT COUNT(DISTINCT CONVERT(date, VisitStart)) FROM myTable GROUP BY PersonID
However, using EF Core and this
MyTable
    .GroupBy(x => x.PersonID)
    .Select(x => new
    {
        Count = x.Select(y => y.VisitStart.Date).Distinct().Count()
    })
gives the correct results, but it is translated into this SQL
SELECT [x].[PersonID], [x].[VisitStart], [x].[VisitEnd]
FROM [myTable] as [x]
ORDER BY [x].[PersonID]
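There is no GROUP BY anywhere, and no DISTINCT or COUNT either, so the grouping must be done in memory. That is not ideal when operating on a table from which millions of records might have to be pulled from the DB.
So does anyone know how to get EF Core to translate .Select(...).Distinct().Count()
into SELECT COUNT(DISTINCT ...)?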
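Update (EF Core 5.x):
Starting with version 5.0, the expression Select(expr).Distinct().Count()
is recognized by EF Core and translated to the corresponding SQL COUNT(DISTINCT expr),
so the original LINQ query can be used without modification.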
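Original (EF Core 2.x). Because of the query pipeline rewrite, this solution does not work with EF Core 3.x:
EF (6 and Core) has historically not supported this standard SQL construct, most likely because there is no standard LINQ method for it and because mapping Select(expr).Distinct().Count()
to it is technically difficult.
The good thing is that EF Core is extensible: many of its internal services can be replaced with custom derived implementations that override the required behaviors. It is not easy and requires a lot of plumbing code, but it is doable.
So the idea is to add and use simple custom CountDistinct
methods like this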
public static int CountDistinct<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> keySelector)
    => source.Select(keySelector).Distinct().Count();
public static int CountDistinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
    => source.Select(keySelector).Distinct().Count();
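and have EF Core somehow translate them to SQL. EF Core does provide an easy way to define (and even custom translate) database scalar functions, but unfortunately that cannot be used for aggregate functions, which have a separate processing pipeline. So we need to dig deep into the EF Core infrastructure.
The full code for the EF Core 2.x pipeline is provided at the end. I am not sure it is worth the effort, because EF Core 3.0 will use a completely rewritten query processing pipeline. But it was interesting, and I am fairly sure it can be updated for the new (hopefully simpler) pipeline.
Anyway, all you need is to copy/paste the code into a new code file in the project and add the following to the context's OnConfiguring
override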
optionsBuilder.UseCustomExtensions();
This plugs the functionality into the EF Core infrastructure. A query like this
var result = db.MyTable
    .GroupBy(x => x.PersonID, x => new { VisitStartDate = x.VisitStart.Date })
    .Select(g => new
    {
        Count = g.CountDistinct(x => x.VisitStartDate)
    }).ToList();
will then be translated to the desired
SELECT COUNT(DISTINCT(CONVERT(date, [x].[VisitStart]))) AS [Count]
FROM [MyTable] AS [x]
GROUP BY [x].[PersonID]
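Note that the expression needed by the aggregate method is preselected in the GroupBy element selector. This is a current EF Core limitation/requirement for all aggregate methods, not only ours.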
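To make the intended result concrete, here is a minimal in-memory sketch of the same query shape using LINQ to Objects. It only illustrates what the grouped CountDistinct call computes; it does not involve EF Core or SQL translation at all. The Visit class, the DistinctVisitDays helper and the sample rows are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the table described in the question.
public sealed class Visit
{
    public Visit(long personId, DateTime visitStart, DateTime visitEnd)
    {
        PersonID = personId;
        VisitStart = visitStart;
        VisitEnd = visitEnd;
    }

    public long PersonID { get; }
    public DateTime VisitStart { get; }
    public DateTime VisitEnd { get; }
}

public static class CountDistinctDemo
{
    // Same shape as the IEnumerable overload shown in the answer.
    public static int CountDistinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
        => source.Select(keySelector).Distinct().Count();

    // Groups by person, preselects the date part in the element selector,
    // then counts the distinct dates inside each group.
    public static Dictionary<long, int> DistinctVisitDays(IEnumerable<Visit> visits)
        => visits
            .GroupBy(x => x.PersonID, x => new { VisitStartDate = x.VisitStart.Date })
            .ToDictionary(g => g.Key, g => g.CountDistinct(x => x.VisitStartDate));

    public static void Main()
    {
        var visits = new[]
        {
            new Visit(1, new DateTime(2020, 1, 1, 9, 0, 0), new DateTime(2020, 1, 1, 10, 0, 0)),
            new Visit(1, new DateTime(2020, 1, 1, 15, 0, 0), new DateTime(2020, 1, 1, 16, 0, 0)),
            new Visit(1, new DateTime(2020, 1, 2, 9, 0, 0), new DateTime(2020, 1, 2, 10, 0, 0)),
            new Visit(2, new DateTime(2020, 1, 5, 9, 0, 0), new DateTime(2020, 1, 5, 10, 0, 0)),
        };

        // Person 1 has three visits on two distinct days; person 2 has one visit.
        foreach (var pair in DistinctVisitDays(visits))
            Console.WriteLine($"PersonID {pair.Key}: {pair.Value} distinct day(s)");
    }
}
```

Run in memory, this is exactly the work the question wants the database to do instead of the client.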
Finally, the full code that does the magic:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Internal;
using Microsoft.EntityFrameworkCore.Metadata;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.Expressions;
using Microsoft.EntityFrameworkCore.Query.ExpressionVisitors;
using Microsoft.EntityFrameworkCore.Query.ExpressionVisitors.Internal;
using Microsoft.EntityFrameworkCore.Query.Internal;
using Remotion.Linq;
using Remotion.Linq.Clauses;
using Remotion.Linq.Clauses.ResultOperators;
using Remotion.Linq.Clauses.StreamedData;
using Remotion.Linq.Parsing.Structure.IntermediateModel;

namespace Microsoft.EntityFrameworkCore
{
    public static partial class CustomExtensions
    {
        public static int CountDistinct<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> keySelector)
            => source.Select(keySelector).Distinct().Count();

        public static int CountDistinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
            => source.Select(keySelector).Distinct().Count();

        public static DbContextOptionsBuilder UseCustomExtensions(this DbContextOptionsBuilder optionsBuilder)
            => optionsBuilder
                .ReplaceService<INodeTypeProviderFactory, CustomNodeTypeProviderFactory>()
                .ReplaceService<IRelationalResultOperatorHandler, CustomRelationalResultOperatorHandler>();
    }
}

namespace Remotion.Linq.Parsing.Structure.IntermediateModel
{
    public sealed class CountDistinctExpressionNode : ResultOperatorExpressionNodeBase
    {
        public CountDistinctExpressionNode(MethodCallExpressionParseInfo parseInfo, LambdaExpression optionalSelector)
            : base(parseInfo, null, optionalSelector) { }

        public static IEnumerable<MethodInfo> GetSupportedMethods()
            => typeof(CustomExtensions).GetTypeInfo().GetDeclaredMethods("CountDistinct");

        public override Expression Resolve(ParameterExpression inputParameter, Expression expressionToBeResolved, ClauseGenerationContext clauseGenerationContext)
            => throw CreateResolveNotSupportedException();

        protected override ResultOperatorBase CreateResultOperator(ClauseGenerationContext clauseGenerationContext)
            => new CountDistinctResultOperator();
    }
}

namespace Remotion.Linq.Clauses.ResultOperators
{
    public sealed class CountDistinctResultOperator : ValueFromSequenceResultOperatorBase
    {
        public override ResultOperatorBase Clone(CloneContext cloneContext) => new CountDistinctResultOperator();
        public override StreamedValue ExecuteInMemory<T>(StreamedSequence input) => throw new NotSupportedException();
        public override IStreamedDataInfo GetOutputDataInfo(IStreamedDataInfo inputInfo) => new StreamedScalarValueInfo(typeof(int));
        public override string ToString() => "CountDistinct()";
        public override void TransformExpressions(Func<Expression, Expression> transformation) { }
    }
}

namespace Microsoft.EntityFrameworkCore.Query.Internal
{
    public class CustomNodeTypeProviderFactory : DefaultMethodInfoBasedNodeTypeRegistryFactory
    {
        public CustomNodeTypeProviderFactory()
            => RegisterMethods(CountDistinctExpressionNode.GetSupportedMethods(), typeof(CountDistinctExpressionNode));
    }

    public class CustomRelationalResultOperatorHandler : RelationalResultOperatorHandler
    {
        private static readonly ISet<Type> AggregateResultOperators = (ISet<Type>)
            typeof(RequiresMaterializationExpressionVisitor).GetField("_aggregateResultOperators", BindingFlags.NonPublic | BindingFlags.Static)
            .GetValue(null);

        static CustomRelationalResultOperatorHandler()
            => AggregateResultOperators.Add(typeof(CountDistinctResultOperator));

        public CustomRelationalResultOperatorHandler(IModel model, ISqlTranslatingExpressionVisitorFactory sqlTranslatingExpressionVisitorFactory, ISelectExpressionFactory selectExpressionFactory, IResultOperatorHandler resultOperatorHandler)
            : base(model, sqlTranslatingExpressionVisitorFactory, selectExpressionFactory, resultOperatorHandler)
        { }

        public override Expression HandleResultOperator(EntityQueryModelVisitor entityQueryModelVisitor, ResultOperatorBase resultOperator, QueryModel queryModel)
            => resultOperator is CountDistinctResultOperator ?
                HandleCountDistinct(entityQueryModelVisitor, resultOperator, queryModel) :
                base.HandleResultOperator(entityQueryModelVisitor, resultOperator, queryModel);

        private Expression HandleCountDistinct(EntityQueryModelVisitor entityQueryModelVisitor, ResultOperatorBase resultOperator, QueryModel queryModel)
        {
            var queryModelVisitor = (RelationalQueryModelVisitor)entityQueryModelVisitor;
            var selectExpression = queryModelVisitor.TryGetQuery(queryModel.MainFromClause);
            var inputType = queryModel.SelectClause.Selector.Type;
            if (CanEvalOnServer(queryModelVisitor)
                && selectExpression != null
                && selectExpression.Projection.Count == 1)
            {
                PrepareSelectExpressionForAggregate(selectExpression, queryModel);
                var expression = selectExpression.Projection[0];
                var subExpression = new SqlFunctionExpression(
                    "DISTINCT", inputType, new[] { expression.UnwrapAliasExpression() });
                selectExpression.SetProjectionExpression(new SqlFunctionExpression(
                    "COUNT", typeof(int), new[] { subExpression }));
                return new ResultTransformingExpressionVisitor<int>(
                    queryModelVisitor.QueryCompilationContext, false)
                    .Visit(queryModelVisitor.Expression);
            }
            else
            {
                queryModelVisitor.RequiresClientResultOperator = true;
                var typeArgs = new[] { inputType };
                var distinctCall = Expression.Call(
                    typeof(Enumerable), "Distinct", typeArgs,
                    queryModelVisitor.Expression);
                return Expression.Call(
                    typeof(Enumerable), "Count", typeArgs,
                    distinctCall);
            }
        }

        private static bool CanEvalOnServer(RelationalQueryModelVisitor queryModelVisitor) =>
            !queryModelVisitor.RequiresClientEval && !queryModelVisitor.RequiresClientSelectMany &&
            !queryModelVisitor.RequiresClientJoin && !queryModelVisitor.RequiresClientFilter &&
            !queryModelVisitor.RequiresClientOrderBy && !queryModelVisitor.RequiresClientResultOperator &&
            !queryModelVisitor.RequiresStreamingGroupResultOperator;
    }
}
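I would like to share my idea for solving my own count-distinct problem.
Ultimately, another way to get a distinct count inside a grouping is to nest the GroupBy calls (assuming your data can be aggregated this way).
Here is an example that I used, and it seems to work.
Apologies for the cryptic acronyms; I use them to keep my JSON as small as possible.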
var myData = _context.ActivityItems
    .GroupBy(a => new { ndt = EF.Property<DateTime>(a, "dt").Date, ntn = a.tn })
    .Select(g => new
    {
        g.Key.ndt,
        g.Key.ntn,
        dpv = g.Sum(o => o.pv),
        dlv = g.Sum(o => o.lv),
        cnt = g.Count(),
    })
    .GroupBy(a => new { ntn = a.ntn })
    .Select(g => new
    {
        g.Key.ntn,
        sd = g.Min(o => o.ndt),
        ld = g.Max(o => o.ndt),
        pSum = g.Sum(o => o.dpv),
        pMin = g.Min(o => o.dpv),
        pMax = g.Max(o => o.dpv),
        pAvg = g.Average(o => o.dpv),
        lSum = g.Sum(o => o.dlv),
        lMin = g.Min(o => o.dlv),
        lMax = g.Max(o => o.dlv),
        lAvg = g.Average(o => o.dlv),
        n10s = g.Sum(o => o.cnt),
        ndays = g.Count()
    });
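Why ndays ends up being a distinct count may not be obvious from the query alone: the inner GroupBy produces exactly one row per (day, tn) pair, so counting rows per tn in the outer GroupBy counts distinct days. The sketch below checks that reasoning in memory with LINQ to Objects. The tuple shape, the method names and the sample data are hypothetical, and it says nothing about how EF Core translates the real query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedGroupingDemo
{
    // Two-level grouping: the inner groups collapse each (day, tn) pair to one row,
    // so the outer Count() per tn is the number of distinct days.
    public static Dictionary<string, int> DistinctDaysNested(IEnumerable<(string tn, DateTime dt)> items)
        => items
            .GroupBy(a => new { ndt = a.dt.Date, ntn = a.tn })
            .Select(g => new { g.Key.ndt, g.Key.ntn, cnt = g.Count() })
            .GroupBy(a => a.ntn)
            .ToDictionary(g => g.Key, g => g.Count());

    // Direct formulation, used here only as a reference to compare against.
    public static Dictionary<string, int> DistinctDaysDirect(IEnumerable<(string tn, DateTime dt)> items)
        => items
            .GroupBy(a => a.tn)
            .ToDictionary(g => g.Key, g => g.Select(a => a.dt.Date).Distinct().Count());

    public static void Main()
    {
        var items = new (string tn, DateTime dt)[]
        {
            ("a", new DateTime(2020, 1, 1, 9, 0, 0)),
            ("a", new DateTime(2020, 1, 1, 17, 0, 0)),
            ("a", new DateTime(2020, 1, 2, 10, 0, 0)),
            ("b", new DateTime(2020, 1, 1, 8, 0, 0)),
        };

        var nested = DistinctDaysNested(items);
        var direct = DistinctDaysDirect(items);

        // Both formulations should agree for every tn.
        foreach (var key in direct.Keys)
            Console.WriteLine($"{key}: nested={nested[key]}, direct={direct[key]}");
    }
}
```

The same intermediate rows also carry the per-day sums, which is why the approach fits when the other aggregates can be computed per day first and then rolled up.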