Get successive differences of rows, including both the first and last row, grouped by one or more columns
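I am trying to get the successive differences of rows of data in SQL, including the differences between the first and last rows and 0, where the rows are grouped by multiple columns.
I have two tables that look like this:
Dates                  Values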
+------------+-------+ +------------+-------+------+------+
| Date | Name | | Date | Value | Name | Type |
+------------+-------+ +------------+-------+------+------+
| 2019-10-10 | A | | 2019-10-11 | 10 | A | X |
| 2019-10-11 | A | | 2019-10-12 | 11 | A | X |
| 2019-10-12 | A | | 2019-10-14 | 20 | A | X |
| 2019-10-13 | A | | 2019-10-11 | 10 | A | Y |
| 2019-10-14 | A | | 2019-10-12 | 22 | A | Y |
| 2019-10-15 | A | | 2019-10-14 | 30 | A | Y |
| 2019-10-10 | B | | 2019-10-11 | 10 | B | X |
| 2019-10-11 | B | | 2019-10-12 | 33 | B | X |
| 2019-10-12 | B | | 2019-10-14 | 40 | B | X |
| 2019-10-13 | B | | 2019-10-11 | 10 | B | Y |
| 2019-10-14 | B | | 2019-10-12 | 44 | B | Y |
| 2019-10-15 | B | | 2019-10-15 | 50 | B | Y |
+------------+-------+ +------------+-------+------+------+
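The Dates table contains a range of dates for each name. The Values table holds values of different types for each name. I want to get a set of successive differences for each value, grouped by Name and Type.
The end result I am looking for is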
+------------+-------+------+-------+---------------+------------+
| Date | Name | Type | Value | PreviousValue | Difference |
+------------+-------+------+-------+---------------+------------+
| 2019-10-11 | A | X | 10 | 0 | 10 |
| 2019-10-12 | A | X | 11 | 10 | 1 |
| 2019-10-14 | A | X | 20 | 11 | 9 |
| 2019-10-15 | A | X | 0 | 20 | -20 |
| 2019-10-11 | A | Y | 10 | 0 | 10 |
| 2019-10-12 | A | Y | 22 | 10 | 12 |
| 2019-10-14 | A | Y | 30 | 22 | 8 |
| 2019-10-15 | A | Y | 0 | 30 | -30 |
| 2019-10-11 | B | X | 10 | 0 | 10 |
| 2019-10-12 | B | X | 33 | 10 | 23 |
| 2019-10-14 | B | X | 40 | 33 | 7 |
| 2019-10-15 | B | X | 0 | 40 | -40 |
| 2019-10-11 | B | Y | 10 | 0 | 10 |
| 2019-10-12 | B | Y | 44 | 10 | 34 |
| 2019-10-15 | B | Y | 50 | 44 | 6 |
+------------+-------+------+-------+---------------+------------+
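Note that the B-Y group illustrates an important point: a group may already have a value on the last date, in which case no "extra" row is needed for that group.
The closest I can get right now is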
SELECT
d.[Date],
d.[Name],
v.[Type],
v.[Value],
[PreviousValue] = COALESCE(LAG(v.[Value]) OVER (PARTITION BY d.[Name], v.[Type] ORDER BY d.[Date]), 0),
[Difference] = v.[Value] - COALESCE(LAG(v.[Value]) OVER (PARTITION BY d.[Name], v.[Type] ORDER BY v.[Date]), 0)
FROM
[Dates] d
LEFT JOIN
[Values] v
ON
d.[Date] = v.[Date]
AND d.[Name] = v.[Name]
But this does not produce the difference for the last row.
Just use lag() with its default-value argument:
[PreviousValue] = COALESCE(LAG(v.[Value], 1, 0) OVER (PARTITION BY d.[Name], v.[Type] ORDER BY d.[Date]), 0),
[Difference] = v.[Value] - COALESCE(LAG(v.[Value], 1, 0) OVER (PARTITION BY d.[Name], v.[Type] ORDER BY v.[Date]), 0)
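To make the effect of the third argument concrete, here is a minimal sketch on a small inline row set. The derived table s and its three rows are hypothetical sample data, not part of the question's schema. It assumes SQL Server 2012 or later, where LAG is available. Note that the default only supplies the previous value for the first row of a partition; it does not by itself create a trailing row for the last date.

```sql
-- Hypothetical inline sample; the column names mirror the question's Values table.
SELECT
    s.[Date],
    s.[Value],
    -- The third argument (0) is returned when no previous row exists in the partition.
    [PreviousValue] = LAG(s.[Value], 1, 0) OVER (ORDER BY s.[Date]),
    [Difference]    = s.[Value] - LAG(s.[Value], 1, 0) OVER (ORDER BY s.[Date])
FROM (VALUES
    (CAST('2019-10-11' AS DATE), 10),
    (CAST('2019-10-12' AS DATE), 11),
    (CAST('2019-10-14' AS DATE), 20)
) AS s ([Date], [Value])
ORDER BY s.[Date];
```

The first row gets a PreviousValue of 0, so its Difference equals its own Value; the later rows subtract the value of the row before them.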
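Since some data is missing on both sides, you have to make up for it somehow.
One trick is to create the missing rows through careful joins.
The example below first joins the types to the Dates data. That way, the FULL JOIN to the Values data can also be done on the type.
Then, after adding enough COALESCE or ISNULL calls, calculating the metrics becomes easy.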
CREATE TABLE [Dates](
[Date] DATE NOT NULL,
[Name] VARCHAR(8) NOT NULL,
PRIMARY KEY ([Date], [Name])
);
INSERT INTO [Dates]
([Date], [Name]) VALUES
('2019-10-10','A')
, ('2019-10-11','A')
, ('2019-10-12','A')
, ('2019-10-13','A')
, ('2019-10-14','A')
, ('2019-10-15','A')
, ('2019-10-10','B')
, ('2019-10-11','B')
, ('2019-10-12','B')
, ('2019-10-13','B')
, ('2019-10-15','B')
;
CREATE TABLE [Values](
[Id] INT IDENTITY(1,1) PRIMARY KEY,
[Date] DATE NOT NULL,
[Name] VARCHAR(8) NOT NULL,
[Value] INTEGER NOT NULL,
[Type] VARCHAR(8) NOT NULL
);
INSERT INTO [Values]
([Date], [Value], [Name], [Type]) VALUES
('2019-10-11', 10, 'A', 'X')
, ('2019-10-12', 11, 'A', 'X')
, ('2019-10-14', 20, 'A', 'X')
, ('2019-10-11', 10, 'A', 'Y')
, ('2019-10-12', 22, 'A', 'Y')
, ('2019-10-14', 30, 'A', 'Y')
, ('2019-10-11', 10, 'B', 'X')
, ('2019-10-12', 33, 'B', 'X')
, ('2019-10-14', 40, 'B', 'X')
, ('2019-10-11', 10, 'B', 'Y')
, ('2019-10-12', 44, 'B', 'Y')
, ('2019-10-15', 50, 'B', 'Y')
;
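WITH CTE_DATA AS
(
SELECT
[Name] = COALESCE(d.[Name],v.[Name])
, [Type] = COALESCE(tp.[Type],v.[Type])
, [Date] = COALESCE(d.[Date],v.[Date])
, [Value] = ISNULL(v.[Value], 0)
FROM [Dates] AS d
-- tp: one row per (Name, Type) with the last date that has a value.
-- Joining on Name alone pairs every date of a name with every type of that name.
INNER JOIN
(
SELECT [Name], [Type], MAX([Date]) AS [Date]
FROM [Values]
GROUP BY [Name], [Type]
) AS tp
ON tp.[Name] = d.[Name]
-- FULL JOIN also keeps Values rows whose date is missing from Dates.
FULL JOIN [Values] AS v
ON v.[Date] = d.[Date]
AND v.[Name] = d.[Name]
AND v.[Type] = tp.[Type]
-- Keep rows that carry a real value, plus value-less dates that fall after
-- the group's last value date (these become the trailing 0 rows).
WHERE v.[Type] IS NOT NULL
OR d.[Date] > tp.[Date]
)
SELECT
[Name], [Type], [Date], [Value]
-- LAG returns NULL on the first row of each group; ISNULL turns that into 0.
, [PreviousValue] = ISNULL(LAG([Value]) OVER (PARTITION BY [Name], [Type] ORDER BY [Date]), 0)
, [Difference] = [Value] - ISNULL(LAG([Value]) OVER (PARTITION BY [Name], [Type] ORDER BY [Date]), 0)
FROM CTE_DATA
ORDER BY [Name], [Type], [Date]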
Name | Type | Date | Value | PreviousValue | Difference
:--- | :--- | :------------------ | ----: | ------------: | ---------:
A | X | 11/10/2019 00:00:00 | 10 | 0 | 10
A | X | 12/10/2019 00:00:00 | 11 | 10 | 1
A | X | 14/10/2019 00:00:00 | 20 | 11 | 9
A | X | 15/10/2019 00:00:00 | 0 | 20 | -20
A | Y | 11/10/2019 00:00:00 | 10 | 0 | 10
A | Y | 12/10/2019 00:00:00 | 22 | 10 | 12
A | Y | 14/10/2019 00:00:00 | 30 | 22 | 8
A | Y | 15/10/2019 00:00:00 | 0 | 30 | -30
B | X | 11/10/2019 00:00:00 | 10 | 0 | 10
B | X | 12/10/2019 00:00:00 | 33 | 10 | 23
B | X | 14/10/2019 00:00:00 | 40 | 33 | 7
B | X | 15/10/2019 00:00:00 | 0 | 40 | -40
B | Y | 11/10/2019 00:00:00 | 10 | 0 | 10
B | Y | 12/10/2019 00:00:00 | 44 | 10 | 34
B | Y | 15/10/2019 00:00:00 | 50 | 44 | 6
A test on db<>fiddle here
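As a point of comparison, the same idea of creating the missing rows can also be sketched with UNION ALL instead of a FULL JOIN. This is an alternative, not the method used in the answer above, and the CTE names LastValueDate, LastDate and Padded are made up for illustration. It assumes the [Dates] and [Values] tables created above, and it differs in one respect: it appends a single 0 row on the last date of each name, whereas the FULL JOIN version produces a 0 row for every date after a group's last value.

```sql
WITH LastValueDate AS
(
    -- Last date that has a value, per (Name, Type).
    SELECT [Name], [Type], MAX([Date]) AS [Date]
    FROM [Values]
    GROUP BY [Name], [Type]
),
LastDate AS
(
    -- Last date of the range, per Name.
    SELECT [Name], MAX([Date]) AS [Date]
    FROM [Dates]
    GROUP BY [Name]
),
Padded AS
(
    SELECT [Name], [Type], [Date], [Value]
    FROM [Values]
    UNION ALL
    -- One extra 0 row only for groups whose values stop before the last date.
    SELECT lvd.[Name], lvd.[Type], ld.[Date], 0
    FROM LastValueDate AS lvd
    INNER JOIN LastDate AS ld
        ON ld.[Name] = lvd.[Name]
    WHERE lvd.[Date] < ld.[Date]
)
SELECT
    [Name], [Type], [Date], [Value]
    , [PreviousValue] = LAG([Value], 1, 0) OVER (PARTITION BY [Name], [Type] ORDER BY [Date])
    , [Difference] = [Value] - LAG([Value], 1, 0) OVER (PARTITION BY [Name], [Type] ORDER BY [Date])
FROM Padded
ORDER BY [Name], [Type], [Date];
```

Because [Value] is declared NOT NULL, the default argument of LAG is enough here and no extra ISNULL is needed. The B-Y group gets no extra row, since its last value already falls on the last date for B.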