SQL - Avoid full scans when joining archive tables

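I'm running into performance problems because a report triggers full scans on some of our larger tables. I've narrowed it down to this part of the query, but I can't figure out how to avoid the scans without changing the results.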

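To explain: we have a data archiving system that copies data from the live tables to the archive tables every day. Data isn't deleted from the live tables until some time has passed. This leads to a state where the live and archive tables both contain the same rows, but the data in those rows may not match.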

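That rules out a UNION query (which would eliminate the full scans). The requirement is that the report shows live data, so I can't just query the archive table either.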

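Any ideas? Here is the query. The primary key of both tables is DetailIdent, but I have an index on OrderIdent because it is the foreign key back to the parent table. As you can see, we take the live table's value if one exists and fall back to the archive data otherwise.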

SELECT COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) AS OrderIdent,
       COALESCE(RegOD.Quantity, ArcOD.Quantity) AS Quantity,
       COALESCE(RegOD.LoadQuan, ArcOD.LoadQuan) AS LoadQuan,
       COALESCE(RegOD.ShipQuan, ArcOD.ShipQuan) AS ShipQuan,
       COALESCE(RegOD.RcvdQuan, ArcOD.RcvdQuan) AS RcvdQuan,
       COALESCE(RegOD.UOM, ArcOD.UOM) AS UOM,
       COALESCE(RegOD.SkidType, ArcOD.SkidType) AS SkidType,
       COALESCE(RegOD.Product, ArcOD.Product) AS Product,
       COALESCE(RegOD.PkgCode, ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD
    ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010

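The filter predicate COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010 is what hurts performance. Because both columns are wrapped in a function, the engine most likely cannot seek on the OrderIdent indexes, so it performs the full scans first and only filters the rows afterwards.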

Option 1 - Rephrase the COALESCE() function

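Rephrase the COALESCE() predicate and let the engine do its job. With any luck, the engine will be smart enough to find the optimization. In this case the query can take the following form: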

SELECT
  COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
  COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
  COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
  COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
  COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
  COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
  COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
  COALESCE(RegOD.Product,ArcOD.Product) AS Product,
  COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD 
FULL JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010 or ArcOD.OrderIdent = 717010
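
One caveat worth checking before relying on this rewrite: the OR form is not strictly identical to the original predicate. COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010 only looks at the archive value when the live value is NULL, whereas the OR form also matches a detail row whose live OrderIdent points at a different order while its archive copy still says 717010. If OrderIdent can differ between the two copies of a row, a stricter variant might look like the sketch below. It has not been tested against the poster's schema, and whether the optimizer turns it into index seeks is not guaranteed.

```sql
SELECT
  COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) AS OrderIdent,
  COALESCE(RegOD.Quantity, ArcOD.Quantity) AS Quantity,
  COALESCE(RegOD.LoadQuan, ArcOD.LoadQuan) AS LoadQuan,
  COALESCE(RegOD.ShipQuan, ArcOD.ShipQuan) AS ShipQuan,
  COALESCE(RegOD.RcvdQuan, ArcOD.RcvdQuan) AS RcvdQuan,
  COALESCE(RegOD.UOM, ArcOD.UOM) AS UOM,
  COALESCE(RegOD.SkidType, ArcOD.SkidType) AS SkidType,
  COALESCE(RegOD.Product, ArcOD.Product) AS Product,
  COALESCE(RegOD.PkgCode, ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD
FULL JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
-- Logically the same as COALESCE(RegOD.OrderIdent, ArcOD.OrderIdent) = 717010:
-- use the archive value only when the live value is missing.
WHERE RegOD.OrderIdent = 717010
   OR (RegOD.OrderIdent IS NULL AND ArcOD.OrderIdent = 717010);
```

If OrderIdent never changes for a given DetailIdent, the simpler OR form above returns the same rows.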

Option 2 - Combine a left join with a right anti-join instead of using a full join

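If the engine does not optimize option #1 above, you can still try combining a left join with a right anti-join instead of writing a full join (the two are equivalent). It is certainly more verbose, but in this case it makes clear what the engine is supposed to do. The query could look like this: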

SELECT -- left join here
  COALESCE(RegOD.OrderIdent,ArcOD.OrderIdent) AS OrderIdent,
  COALESCE(RegOD.Quantity,ArcOD.Quantity) AS Quantity,
  COALESCE(RegOD.LoadQuan,ArcOD.LoadQuan) AS LoadQuan,
  COALESCE(RegOD.ShipQuan,ArcOD.ShipQuan) AS ShipQuan,
  COALESCE(RegOD.RcvdQuan,ArcOD.RcvdQuan) AS RcvdQuan,
  COALESCE(RegOD.UOM,ArcOD.UOM) AS UOM,
  COALESCE(RegOD.SkidType,ArcOD.SkidType) AS SkidType,
  COALESCE(RegOD.Product,ArcOD.Product) AS Product,
  COALESCE(RegOD.PkgCode,ArcOD.PkgCode) AS PkgCode
FROM OrderDetail RegOD 
LEFT JOIN dbo.ArcOrderDtl ArcOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE RegOD.OrderIdent = 717010
UNION ALL
SELECT -- right anti-join here
  -- columns are qualified with ArcOD because both tables have them
  ArcOD.OrderIdent,
  ArcOD.Quantity,
  ArcOD.LoadQuan,
  ArcOD.ShipQuan,
  ArcOD.RcvdQuan,
  ArcOD.UOM,
  ArcOD.SkidType,
  ArcOD.Product,
  ArcOD.PkgCode
FROM dbo.ArcOrderDtl ArcOD
LEFT JOIN OrderDetail RegOD ON ArcOD.DetailIdent = RegOD.DetailIdent
WHERE ArcOD.OrderIdent = 717010 AND RegOD.DetailIdent IS NULL
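
The anti-join half can also be written with NOT EXISTS, which some readers find easier to follow. The sketch below goes one step further and drops the left join from the first half. That step is an assumption, not part of the answer above: it is only valid if a live row should be reported exactly as stored, NULL columns included. If a NULL live column must still fall back to the archive value, keep the LEFT JOIN and the COALESCE() calls in the first half.

```sql
SELECT -- live rows for the order, taken as-is (assumption: no per-column fallback)
  RegOD.OrderIdent, RegOD.Quantity, RegOD.LoadQuan, RegOD.ShipQuan, RegOD.RcvdQuan,
  RegOD.UOM, RegOD.SkidType, RegOD.Product, RegOD.PkgCode
FROM OrderDetail RegOD
WHERE RegOD.OrderIdent = 717010
UNION ALL
SELECT -- archive rows that have no live counterpart
  ArcOD.OrderIdent, ArcOD.Quantity, ArcOD.LoadQuan, ArcOD.ShipQuan, ArcOD.RcvdQuan,
  ArcOD.UOM, ArcOD.SkidType, ArcOD.Product, ArcOD.PkgCode
FROM dbo.ArcOrderDtl ArcOD
WHERE ArcOD.OrderIdent = 717010
  AND NOT EXISTS (
    SELECT 1
    FROM OrderDetail RegOD
    WHERE RegOD.DetailIdent = ArcOD.DetailIdent  -- primary key lookup on the live table
  );
```

Each branch filters on a single table's OrderIdent, so the existing OrderIdent indexes can be used for both.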

You want all rows for one OrderIdent, but a row (identified by DetailIdent) can be in OrderDetail, in ArcOrderDtl, or in both. Where an OrderDetail row exists, you want it to take precedence.

So one idea is to select all the rows and rank them so that OrderDetail ranks better than ArcOrderDtl, then use TOP WITH TIES to keep all the best-ranked rows and discard the rest.

SELECT TOP(1) WITH TIES
  OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType, Product, PkgCode
FROM
(
  SELECT 
    DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
    Product, PkgCode, 1 AS priority
  FROM OrderDetail
  WHERE OrderIdent = 717010
  UNION ALL
  SELECT 
    DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
    Product, PkgCode, 2 AS priority
  FROM dbo.ArcOrderDtl
  WHERE OrderIdent = 717010
) unioned
ORDER BY RANK() OVER (PARTITION BY DetailIdent ORDER BY priority);
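
If the TOP WITH TIES idiom feels too implicit, the same ranking idea can be spelled out with ROW_NUMBER() and an explicit filter. This is an alternative sketch, not part of the answer above; the CTE names unioned and ranked and the column alias rn are made up for illustration.

```sql
WITH unioned AS (
  SELECT
    DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
    Product, PkgCode, 1 AS priority          -- live rows win
  FROM OrderDetail
  WHERE OrderIdent = 717010
  UNION ALL
  SELECT
    DetailIdent, OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType,
    Product, PkgCode, 2 AS priority          -- archive rows are the fallback
  FROM dbo.ArcOrderDtl
  WHERE OrderIdent = 717010
),
ranked AS (
  SELECT
    unioned.*,
    ROW_NUMBER() OVER (PARTITION BY DetailIdent ORDER BY priority) AS rn
  FROM unioned
)
SELECT
  OrderIdent, Quantity, LoadQuan, ShipQuan, RcvdQuan, UOM, SkidType, Product, PkgCode
FROM ranked
WHERE rn = 1;  -- keep the best-ranked copy of each DetailIdent
```

Each DetailIdent appears at most once per table, so ROW_NUMBER() and RANK() give the same result here.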

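I'm assuming the tables share a primary key on (OrderIdent, DetailIdent), or at least a unique index on those fields. If so, first work out all the keys that are in the archive table but not in the live table, then fetch the keys we are interested in from both tables.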

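You will effectively go over the tables twice, but the indexes (and caching) should make that fast enough, and the operations are very simple.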

SELECT DetailIdent
  INTO #archiveRows
  FROM ArcOrderDtl ArcOD
 WHERE OrderIdent = 717010
 
 EXCEPT 

SELECT DetailIdent
  FROM OrderDetail RegOD
 WHERE OrderIdent = 717010
 
CREATE UNIQUE CLUSTERED INDEX uq0_archiveRows ON #archiveRows (DetailIdent) WITH (FILLFACTOR = 100)

SELECT -- live
  OrderIdent,
  Quantity,
  LoadQuan,
  ShipQuan,
  RcvdQuan,
  UOM,
  SkidType,
  Product,
  PkgCode
FROM OrderDetail RegOD 
WHERE RegOD.OrderIdent = 717010

UNION ALL

SELECT -- archive
  OrderIdent,
  Quantity,
  LoadQuan,
  ShipQuan,
  RcvdQuan,
  UOM,
  SkidType,
  Product,
  PkgCode
FROM dbo.ArcOrderDtl ArcOD
JOIN #archiveRows t
  ON t.DetailIdent = ArcOD.DetailIdent
WHERE ArcOD.OrderIdent = 717010

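PS: If for some reason you can't use temp tables, I suppose you could put this in a CTE instead; I'm guessing the actual number of rows returned is small, so that should work fine too. (I mostly tend to 'promote' temp tables because they are easy to read, they can be indexed, and the optimizer will create statistics on them and use those in the next step!)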
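
For reference, a CTE version of the same idea might look like the sketch below. The CTE name archiveRows simply mirrors the temp table name and is otherwise arbitrary. Unlike the temp table, a CTE cannot be indexed and gets no statistics of its own, so this is only a reasonable substitute when the row counts per order are small.

```sql
WITH archiveRows AS (
  -- keys present in the archive but not in the live table for this order
  SELECT DetailIdent
  FROM dbo.ArcOrderDtl
  WHERE OrderIdent = 717010
  EXCEPT
  SELECT DetailIdent
  FROM OrderDetail
  WHERE OrderIdent = 717010
)
SELECT -- live
  RegOD.OrderIdent, RegOD.Quantity, RegOD.LoadQuan, RegOD.ShipQuan, RegOD.RcvdQuan,
  RegOD.UOM, RegOD.SkidType, RegOD.Product, RegOD.PkgCode
FROM OrderDetail RegOD
WHERE RegOD.OrderIdent = 717010

UNION ALL

SELECT -- archive
  ArcOD.OrderIdent, ArcOD.Quantity, ArcOD.LoadQuan, ArcOD.ShipQuan, ArcOD.RcvdQuan,
  ArcOD.UOM, ArcOD.SkidType, ArcOD.Product, ArcOD.PkgCode
FROM dbo.ArcOrderDtl ArcOD
JOIN archiveRows t
  ON t.DetailIdent = ArcOD.DetailIdent
WHERE ArcOD.OrderIdent = 717010;
```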