SQL Server stored procedure taking a LONG time to run after minor changes

Question

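I have a stored procedure that takes information from two tables I created and generates a summary table, which is then used by several views.

It used to take 60-90 seconds to run. I had two calls to different cost functions and a third call for cost * quantity. I removed all three and replaced them with a single new function that is almost identical to one of the old cost functions.

I kept working on this while writing the post, so things have improved. I got the speed up, but it is still not as fast as before, and I'm not sure why. The original function: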

ALTER FUNCTION [dbo].[fn_getFactoryStdCost]
    (@PartID int)
RETURNS decimal(20, 4)
AS
BEGIN
    DECLARE @pureID int = 0

    SET @pureID = (SELECT TOP(1) PURE_COST_ID 
                   FROM visuser.PART_COST 
                   WHERE EN_PART_ID = @partID 
                   ORDER BY EN_REV_MASTER_ID DESC, IC_WAREHOUSE_ID DESC)

    RETURN (SELECT TOP(1) (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N) 
            FROM visuser.PURE_COST 
            WHERE PURE_COST_ID = @pureID 
            ORDER BY (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N) DESC) 
END

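It was replaced with the version below. I added WITH INLINE = OFF after the first time it hung, to rule out inlining as the cause. The function works fine on its own.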

ALTER FUNCTION [dbo].[fn_getFactoryStdCost] 
    (@PartID int)
RETURNS decimal(20,4)
WITH INLINE = OFF
AS
BEGIN
    DECLARE @pureID int = 0

    SET @pureID = (SELECT TOP(1) PURE_COST_ID 
                   FROM visuser.PART_COST 
                   WHERE EN_PART_ID = @partID 
                   ORDER BY EN_REV_MASTER_ID DESC, IC_WAREHOUSE_ID DESC)

    RETURN (SELECT TOP(1) (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N + TOT_RUN_VALUE_N + TOT_FIXED_OVERHEAD_N)
            FROM visuser.PURE_COST
            WHERE PURE_COST_ID = @pureID
            ORDER BY (TOT_MATERIAL_N + TOT_MATERIAL_OVERHEAD_N) DESC)
END

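The other changes I made were adding [Qty] > 0 AND to the [Part Count] line, and replacing the string-based Commodity ID entries with integers (more appropriate), since COMMODITY_ID is a reference to COMMODITY_CODE, which is the string.

I expected it to run faster, not indefinitely. The procedure now takes forever to run. I'm at 38 minutes and counting. I also tried copying just the code out of the procedure and running it on its own, and that takes forever too, so the problem is in the code itself.

The AllPartsList table has 1.04M rows, and so does the bomBreakdown table. The bomBreakdown table is much more complex and takes 40-60 seconds to generate. The bomSummary table will have 4,100 rows. The AllPartsList table has proper indexes; bomBreakdown does not.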

ALTER PROCEDURE [dbo].[createBOMSummary]
AS

    DECLARE @processTime int=0, @begin datetime, @end datetime

    SET @begin = SYSDATETIME()

    IF OBJECT_ID(N'dbo.bomSummary', N'U') IS NOT NULL
        DROP TABLE bomSummary

    SELECT 
        DISTINCT ap.[SourcePartID] AS [Assembly Part ID],
        p.[PART_X] AS [Assembly Part #],
        p.[DESCR_X] AS [Assembly Part Description],

        (SELECT COUNT(DISTINCT [Component Part #]) FROM [bomBreakdown] WHERE [Qty] > 0 AND [Component Part ID] IS NOT NULL AND SourcePartID = ap.SourcePartID GROUP BY [SourcePartID]) AS [Part Count],
        (SELECT SUM([Qty]) FROM [bomBreakdown] WHERE [Component Part ID] IS NOT NULL AND SourcePartID = ap.[SourcePartID] GROUP BY [SourcePartID]) AS [Total # of Parts],
        ([dbo].[fn_getFactoryStdCost](ap.[SourcePartID])) AS [Factory Std Cost],

        COALESCE(
            (SELECT COUNT(DISTINCT ComponentPartID) 
              FROM AllPartsList apl
                LEFT JOIN visuser.EN_PART p1
                  ON p1.[EN_Part_ID] = apl.[ComponentPartID]
              WHERE 
                apl.ComponentPartID IS NOT NULL AND 
                apl.SourcePartID = ap.SourcePartID  AND
                p1.Commodity_ID IN (15, 84, 85, 87, 81, 92) -- Commodity Codes: 009, 072, 073, 075, 079, 082
              GROUP BY SourcePartID
            ), 0) AS [# of Docs], --0sec

        COALESCE(
        (SELECT COUNT(DISTINCT ComponentPartID) 
        FROM AllPartsList apl
            LEFT JOIN visuser.EN_PART p1
                ON p1.[EN_Part_ID] = apl.[ComponentPartID]
        WHERE 
            apl.ComponentPartID IS NOT NULL AND 
            apl.SourcePartID = ap.SourcePartID  AND
            p1.Commodity_ID IN (28)  -- Commodity Code 034
        GROUP BY SourcePartID
        ), 0) AS [# of Software], --0sec
    
        COALESCE(
        (SELECT COUNT(*) 
        FROM visuser.[PART_COST] 
        WHERE [STD_PO_Cost_N] > 0 AND 
            EN_PART_ID IN 
            (SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
        ), 0) AS [# of Std Cost Items], --0sec

        COALESCE(
        (SELECT COUNT(DISTINCT ComponentPartID) 
        FROM AllPartsList apl
            LEFT JOIN visuser.EN_PART p1
                ON p1.[EN_Part_ID] = apl.[ComponentPartID]
        WHERE 
            apl.ComponentPartID IS NOT NULL AND 
            apl.SourcePartID = ap.SourcePartID  AND
            p1.Commodity_ID IN (11)  -- Commodity Code: 002
        GROUP BY SourcePartID), 0
        ) AS [# of HR Devices] ,--0sec

        COALESCE(
        (SELECT COUNT(DISTINCT ComponentPartID) 
        FROM AllPartsList apl
            LEFT JOIN visuser.EN_PART p1
                ON p1.[EN_Part_ID] = apl.[ComponentPartID]
        WHERE 
            apl.ComponentPartID IS NOT NULL AND 
            apl.SourcePartID = ap.SourcePartID  AND
            p1.Commodity_ID IN (5)  -- Commodity Code: 007
        GROUP BY SourcePartID), 0
        ) AS [# of 3rd Party Devices], --0sec
        
        COALESCE(
        (SELECT COUNT(DISTINCT ComponentPartID) 
        FROM AllPartsList apl
            LEFT JOIN visuser.EN_PART p1
                ON p1.[EN_Part_ID] = apl.[ComponentPartID]
        WHERE 
            apl.ComponentPartID IS NOT NULL AND 
            apl.SourcePartID = ap.SourcePartID  AND
            p1.Commodity_ID IN (13) AND  -- Commodity Code: 005
            p1.MAKE_BUY_C = 'B'
        GROUP BY SourcePartID
        ), 0) AS [# of Robots], --0sec
        
        COALESCE(
        (SELECT COUNT(*) 
        FROM visuser.[PART_COST] c
            LEFT JOIN visuser.[EN_PART] p
            ON p.[EN_PART_ID] = c.[EN_PART_ID]
        WHERE 
            c.[STD_PO_Cost_N] > 0 AND 
            p.[MAKE_BUY_C] = 'B' AND
            c.[EN_PART_ID] IN 
               (SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
        ), 0) AS [# of Buy Parts], --0sec
        
        COALESCE(
        (SELECT COUNT(*) 
        FROM visuser.[PART_COST] c
            LEFT JOIN visuser.[EN_PART] p
            ON p.[EN_PART_ID] = c.[EN_PART_ID]
        WHERE 
            c.[STD_PO_Cost_N] > 0 AND 
            p.[MAKE_BUY_C] = 'M' AND
            c.[EN_PART_ID] IN 
            (SELECT DISTINCT ComponentPartID FROM AllPartsList WHERE ComponentPartID IS NOT NULL AND SourcePartID = ap.SourcePartID)
        ), 0) AS [# of Make Parts]  

    INTO bomSummary
    FROM AllPartsList ap
      LEFT JOIN visuser.EN_PART p
        ON p.[EN_Part_ID] = ap.[SourcePartID]
    ORDER BY [PART_X]

    SET @end = SYSDATETIME()
    SET @processTime = DATEDIFF(s, @begin, @end)

    PRINT @end
    PRINT CHAR(13)+CHAR(10)
    PRINT 'bomSummary Processing Time: ' + CONVERT(varchar, @processTime)

GO

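Here's what the bomBreakdown table looks like:

And the AllPartsList table:

With the function line commented out, processing 2 records takes 1m 20s. Here is part of the execution plan. It looks like each of my COALESCE expressions adds 4-6 seconds of processing time.

If I remove all the COALESCE expressions, processing all 4981 records takes 2m 50s. Here is its execution plan:

The execution plan suggested several additional indexes, so I added them. Now 1 record takes 0 seconds, 2 take 5 seconds, 10 take 1 second, 100 take 2 seconds, 1000 take 28 seconds, and all 4981 take 4m 17s. The extra indexes definitely helped: I no longer see percentages over 1000%, but a few are still over 100%, which makes me think more optimization is possible. I'm just not sure where. The execution plan is huge, so here are just a few shots:

Not sure what is going on with the 2-record case. It's not the 90 seconds it used to be, but at least it finishes now.

What I find odd is that it reports (1000 rows affected) and then (1 row affected). I don't know what that 1 row is or where it comes from. And I would still like to know why such small changes made such a big difference.

I am using:

SQL Server 2019 (v15.0.2070.41)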
SSMS v18.5

Here is what I ended up with after revising it based on allmhuran's suggestions:

SELECT
    DISTINCT ap.[SourcePartID] AS [Assembly Part ID],
    p.[PART_X] AS [Assembly Part #],
    p.[DESCR_X] AS [Assembly Part Description],
    oa2.[Part Count],
    oa2.[Total # of Parts],
    ([dbo].[fn_getFactoryStdCost](ap.[SourcePartID])) AS [Factory Std Cost],
    oa2.[# of Docs],
    oa2.[# of Software],
    'Logic Pending' AS [# of Std Cost Items],
    oa2.[# of HR Devices],
    oa2.[# of 3rd Party Devices],
    oa2.[# of Robots],
    oa2.[# of Buy Parts],
    oa2.[# of Make Parts]
    
  FROM AllPartsList ap
    LEFT JOIN visuser.EN_PART p
      ON p.[EN_Part_ID] = ap.[SourcePartID]
  OUTER APPLY (
        SELECT
            [Part Count]                = COUNT(    DISTINCT IIF( [Qty] = 0, null, [Component Part #])  ),  
            [Total # of Parts]          = SUM([Qty]),
            [# of Docs]                 = COUNT(    DISTINCT IIF( [Commodity Code] IN ('009', '072', '073', '075', '079', '082'), [Component Part #], null) ), -- Commodity Codes: 009, 072, 073, 075, 079, 082  :  Commodity ID: 15, 84, 85, 87, 81, 92
            [# of Software]             = COUNT(    DISTINCT IIF( [Commodity Code] IN ('034'), [Component Part #], null)    ), -- Commodity Code 034  :  Commodity ID: 28
            [# of HR Devices]           = COUNT(    DISTINCT IIF( [Commodity Code] IN ('002'), [Component Part #], null)    ), -- Commodity Code 002  :  Commodity ID: 11
            [# of 3rd Party Devices]    = COUNT(    DISTINCT IIF( [Commodity Code] IN ('007'), [Component Part #], null)    ), -- Commodity Code 007  :  Commodity ID: 5
            [# of Robots]               = COUNT(    DISTINCT IIF( ( [Commodity Code] IN ('005') AND [Make/Buy] = 'B' ), [Component Part #], null)   ), -- Commodity Code 005  :  Commodity ID: 13
            [# of Buy Parts]            = COUNT(    DISTINCT IIF( [Make/Buy] = 'B', [Component Part #], null)   ),
            [# of Make Parts]           = COUNT(    DISTINCT IIF( [Make/Buy] = 'M', [Component Part #], null)   )

          FROM bomBreakdown
          WHERE
            [Component Part ID] IS NOT NULL AND 
            [SourcePartID] = ap.[SourcePartID] AND
            --[SourcePartID] = ap.[AssemblyPartID] AND
            ap.SourcePartID = 964
          GROUP BY [SourcePartID]
    ) oa2

Answer 1

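OK, taking some time to work through this.

Scalar function refactoring

As I mentioned in the comments, scalar functions do bad things to set-based operations. In general, if you have a pattern like this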

create function dbo.scalar_UDF(@i int) returns int as begin
   return @i * 2;
end
go

-- scalar UDFs have to be called with at least a two-part name
select    c = dbo.scalar_UDF(t.c)
from      t;

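then this quietly turns your select into a row-by-row (RBAR, "row by agonizing row") operation behind the scenes.

You can improve performance by sticking with set-based operations. One way is to have the scalar UDF inlined, which basically tells SQL Server that, before generating the query plan, it may rewrite your query as: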

select    c = t.c * 2
from      t;
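
If you want to see whether SQL Server regards a particular scalar UDF as eligible for inlining, the catalog view sys.sql_modules exposes that on SQL Server 2019. A minimal sketch, using the function name from the question:

```sql
-- is_inlineable: 1 if the function body qualifies for scalar UDF inlining
-- inline_type:   1 if inlining is currently turned on for the module
--                (WITH INLINE = OFF sets this to 0)
select   o.name,
         m.is_inlineable,
         m.inline_type
from     sys.sql_modules m
join     sys.objects     o on o.object_id = m.object_id
where    o.name = N'fn_getFactoryStdCost';
```

A function can be inlineable and still have inlining switched off, which is the state the question's WITH INLINE = OFF version would be expected to show.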

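But scalar function inlining is something Microsoft has found hard to get right, and it is still a bit buggy. The other way is to handle it yourself, using an inline table-valued function with cross apply or outer apply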

create function inline_TVF(@i int) returns table as return 
(
   select result = @i * 2
)

select       c = u.result
from         t
outer apply  inline_TVF(t.c) u;
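
Applied to the function in the question, that idea might look roughly like the sketch below. The function name fn_getFactoryStdCost_tvf and the column alias FactoryStdCost are made up for illustration; the table and column names are taken from the question, and none of this has been run against the real schema. Note that the result column takes whatever type the addition produces, so add a cast if you need exactly decimal(20, 4).

```sql
-- Hypothetical inline TVF counterpart of dbo.fn_getFactoryStdCost
-- (name and alias are illustrative, not from the original post).
create function dbo.fn_getFactoryStdCost_tvf (@PartID int)
returns table
as return
(
   select top (1)
          FactoryStdCost = pc.TOT_MATERIAL_N + pc.TOT_MATERIAL_OVERHEAD_N
                         + pc.TOT_RUN_VALUE_N + pc.TOT_FIXED_OVERHEAD_N
   from   visuser.PURE_COST pc
   where  pc.PURE_COST_ID =
          (
             -- same pick as the scalar version: latest revision, then highest warehouse
             select top (1) c.PURE_COST_ID
             from   visuser.PART_COST c
             where  c.EN_PART_ID = @PartID
             order by c.EN_REV_MASTER_ID desc, c.IC_WAREHOUSE_ID desc
          )
   order by pc.TOT_MATERIAL_N + pc.TOT_MATERIAL_OVERHEAD_N desc
);
go

-- Usage: outer apply keeps parts that have no cost row (the column is then null),
-- which mirrors the scalar function returning null.
select      ap.SourcePartID,
            [Factory Std Cost] = f.FactoryStdCost
from        AllPartsList ap
outer apply dbo.fn_getFactoryStdCost_tvf(ap.SourcePartID) f;
```

Because the body is a single select, the optimizer can fold it into the calling query instead of invoking it once per row.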

An actual refactoring by factoring

Part of your existing procedure looks like this:

select      [Part Count] =
            (
               select   count(distinct [Component Part #])
               from     bomBreakdown
               where    Qty > 0
                        and [Component Part ID] is not null
                        and SourcePartID = ap.SourcePartID
               group by SourcePartID
            ),
            [Total # of Parts] =
            (
               select   sum(Qty)
               from     bomBreakdown
               where    [Component Part ID] is not null
                        and SourcePartID = ap.SourcePartID
               group by SourcePartID
            )
            -- , more ...

These two subqueries look very similar. It is this pattern:

select      a = (
               select x1 from y where z
            ),
            b = (
               select x2 from y where almost_z
            )

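What we would really like to do is something like the following. If we could, the query would only have to hit table y once instead of twice. But of course the syntax is invalid: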

select      a = t.x1, 
            b = t.x2
from        (
                select  x1 where z, 
                        x2 where almost_z
                from y
            ) t

Aha, but maybe we can be clever. Going back to your specific case, we might change it to something like this:

select      oa1.[Part Count],
            oa1.[Total # of Parts]
into        bomSummary
from        AllPartsList    ap
left join   visuser.EN_PART p   on p.EN_Part_ID = ap.SourcePartID
outer apply (
                select    [Part Count]       = count
                                               (
                                                  distinct iif
                                                  (
                                                     Qty = 0, null, [Component Part #]
                                                  )
                                               ),
                          [Total # of Parts] = sum(qty)
                from      bomBreakdown
                where     [Component Part ID] is not null
                          and SourcePartID = ap.SourcePartID
                group by  SourcePartID
            ) 
            oa1

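Here, iif(Qty = 0, null, [Component Part #]) turns the column into null when the quantity is zero. count ignores those nulls, and we still get the distinct, just as before. So we have sneakily managed to get a where clause in there: "count the distinct component part # values where the quantity is not zero". (This matches your original Qty > 0 filter as long as Qty is never negative.) Now we can sum the Qty column in the same pass, and the refactoring is done.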
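
To convince yourself that the two forms agree, here is a small self-contained check. The table variable and its rows are made-up toy data standing in for bomBreakdown, with PartNo playing the role of [Component Part #].

```sql
-- Toy data (invented for this example).
declare @bom table (SourcePartID int, PartNo varchar(10), Qty int);

insert @bom (SourcePartID, PartNo, Qty)
values (1, 'A', 2),
       (1, 'A', 3),   -- duplicate part number, counted once
       (1, 'B', 0),   -- zero quantity, must not be counted
       (1, 'C', 1);

-- Two separate subqueries, each reading the table...
select PartCount = (select count(distinct PartNo) from @bom where Qty > 0),
       TotalQty  = (select sum(Qty) from @bom);

-- ...versus a single pass with the filter folded into the aggregate.
select PartCount = count(distinct iif(Qty = 0, null, PartNo)),
       TotalQty  = sum(Qty)
from   @bom;

-- Both statements return PartCount = 2, TotalQty = 6.
```

The second statement also raises the usual "Null value is eliminated by an aggregate" warning, which is exactly the behaviour being relied on.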

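The same kind of refactoring can be done in many places in this stored procedure. It is actually a good exercise for learning to refactor SQL. I'm not going to do all of them; just try to spot the pattern and follow the factoring process, the same way you would in algebra. Because, in many ways, this is algebra!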
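
As one more instance of the pattern, two of the commodity counts from the question could be folded together along these lines. This is a sketch only, untested against the real tables: the derived table of distinct SourcePartID values is my own assumption about how to get one row per assembly, and the column and commodity ID values are copied from the question.

```sql
select      ap.SourcePartID,
            oa.[# of Docs],
            oa.[# of Software]
from        (select distinct SourcePartID from AllPartsList) ap   -- assumed: one row per assembly
outer apply (
               select [# of Docs]     = count(distinct iif(p1.Commodity_ID in (15, 84, 85, 87, 81, 92),
                                                           apl.ComponentPartID, null)),
                      [# of Software] = count(distinct iif(p1.Commodity_ID = 28,
                                                           apl.ComponentPartID, null))
               from   AllPartsList    apl
               join   visuser.EN_PART p1 on p1.EN_Part_ID = apl.ComponentPartID
               where  apl.ComponentPartID is not null
                      and apl.SourcePartID = ap.SourcePartID
               -- no group by: an aggregate over zero rows still returns one row,
               -- and count gives 0, so the coalesce(..., 0) wrappers are not needed
            ) oa;
```

An inner join is used here because the original where clause on p1.Commodity_ID already discarded the unmatched rows that the left join would have kept.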

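Apologies for any typos or syntax errors. I have no query window in which to check this against the actual queries, and my aim here is to show some ideas rather than to actually rewrite the original query.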
