How to pick the smallest of the consecutive rows?
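Consider the data in the table, sorted by Date desc.
If several consecutive rows have the same Description, I want to keep only the row with the earliest date. For example, rows 2 and 3 are both Unknown, and I only want to keep the one on 9/12/2014.
I've been trying to use a CTE with ROW_NUMBER(), but I can't get it to restrict the numbering to rows that have the same description consecutively.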
;WITH removeConsecutiveRows AS (
    SELECT ph.Description,
           ph.Price,
           ph.Date,
           ROW_NUMBER() OVER (
               PARTITION BY ph.Description
               ORDER BY ph.Date
           ) AS rowNum
    FROM #PriceHistory ph (NOLOCK)
)
SELECT s.Description,
       s.Price,
       s.Date,
       s.rowNum
FROM removeConsecutiveRows s
WHERE s.rowNum = 1
ORDER BY s.Date DESC
So in the end, it should look like this:
Note that this is SQL Server 2008.
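A small Python model (not T-SQL, and not part of the original question) may help show why the attempt above falls short. `PARTITION BY ph.Description` treats every row with the same description as one set, wherever it sits, so `rowNum = 1` leaves exactly one row per description; the question asks for one row per *run* of adjacent rows. The `ROWS` list (the sample data used in the answer below, as (description, price, date) tuples) and both function names are illustrative assumptions.

```python
from itertools import groupby

# Sample data from the answer below, already sorted by date descending.
# ISO date strings compare in the same order as the dates themselves.
ROWS = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def earliest_per_description(rows):
    """Models the attempt: ROW_NUMBER() OVER (PARTITION BY Description ORDER BY Date) = 1.
    Adjacency is ignored, so each description survives only once."""
    earliest = {}
    for row in rows:
        descr, _, dt = row
        if descr not in earliest or dt < earliest[descr][2]:
            earliest[descr] = row
    return sorted(earliest.values(), key=lambda r: r[2], reverse=True)


def earliest_per_run(rows):
    """What the question asks for: the earliest row of each run of
    consecutive rows sharing a description (rows must be date-descending)."""
    # groupby only groups *adjacent* equal keys, which is exactly a "run".
    # Within a date-descending run, the last row has the earliest date.
    return [list(run)[-1] for _, run in groupby(rows, key=lambda r: r[0])]


print(len(earliest_per_description(ROWS)))  # 5
print(len(earliest_per_run(ROWS)))          # 13
```

The row-by-row walk is a procedural shortcut. Comparing each row with the previous one is what `LAG()` would express directly in SQL, but `LAG()` only arrived in SQL Server 2012, so on 2008 a set-based approach built on `ROW_NUMBER()` is needed.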
This looks like a "gaps-and-islands" problem, with a "top-1-per-group" step on top of it once the groups (islands) have been detected.
Here is one way to do it.
Sample data
CREATE TABLE #temptable ( Descr varchar(50), [Price] int, dt date )
INSERT INTO #temptable
VALUES
( 'Active', 799900, N'2019-02-27T00:00:00' ),
( 'Unknown', 629900, N'2014-09-24T00:00:00' ),
( 'Unknown', 629900, N'2014-09-12T00:00:00' ),
( 'Sold', 625900, N'2014-09-08T00:00:00' ),
( 'Unknown', 629900, N'2014-08-10T00:00:00' ),
( 'Active', 629900, N'2014-07-27T00:00:00' ),
( 'Pending', 629900, N'2014-07-25T00:00:00' ),
( 'Pending', 629900, N'2014-07-24T00:00:00' ),
( 'Unknown', 629900, N'2014-07-20T00:00:00' ),
( 'Active', 629900, N'2014-07-16T00:00:00' ),
( 'Active', 629900, N'2014-07-15T00:00:00' ),
( 'Taking Backup Offers', 629900, N'2014-07-11T00:00:00' ),
( 'Active', 629900, N'2014-06-28T00:00:00' ),
( 'Active', 629900, N'2014-06-27T00:00:00' ),
( 'Taking Backup Offers', 629900, N'2014-06-27T00:00:00' ),
( 'Active', 629900, N'2014-06-23T00:00:00' ),
( 'Active', 629900, N'2014-06-11T00:00:00' ),
( 'Active', 629900, N'2014-06-10T00:00:00' ),
( 'Sold', 570000, N'2010-01-22T00:00:00' ),
( 'Sold', 288000, N'2000-09-01T00:00:00' );
Query
WITH
CTE_RN
AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
FROM #temptable
)
,CTE_Groups
AS
(
SELECT
*
,rn1 - rn2 AS Groups
,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
FROM CTE_RN
)
SELECT Descr, Price, dt
FROM CTE_Groups
WHERE rn = 1
ORDER BY dt DESC;
Result
+----------------------+--------+------------+
| Descr | Price | dt |
+----------------------+--------+------------+
| Active | 799900 | 2019-02-27 |
| Unknown | 629900 | 2014-09-12 |
| Sold | 625900 | 2014-09-08 |
| Unknown | 629900 | 2014-08-10 |
| Active | 629900 | 2014-07-27 |
| Pending | 629900 | 2014-07-24 |
| Unknown | 629900 | 2014-07-20 |
| Active | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active | 629900 | 2014-06-27 |
| Active | 629900 | 2014-06-10 |
| Sold | 288000 | 2000-09-01 |
+----------------------+--------+------------+
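To cross-check this result outside SQL Server, here is a rough Python model of the two CTEs (an illustration, not a replacement for the T-SQL). It assumes the input list is already in the order that `ORDER BY dt DESC` produced, with list position standing in for however the engine broke the 2014-06-27 tie; `ROWS` and `islands_top1` are hypothetical names.

```python
from collections import defaultdict

# (Descr, Price, dt) in the same order as the INSERT above.
ROWS = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def islands_top1(rows):
    """Python model of CTE_RN + CTE_Groups + WHERE rn = 1."""
    seen_per_descr = defaultdict(int)
    islands = defaultdict(list)  # (Descr, rn1 - rn2) -> rows of one island
    for rn1, row in enumerate(rows, start=1):  # ROW_NUMBER() OVER (ORDER BY dt DESC)
        descr = row[0]
        seen_per_descr[descr] += 1
        rn2 = seen_per_descr[descr]  # ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC)
        islands[(descr, rn1 - rn2)].append(row)
    # rn = 1 with ORDER BY dt keeps the earliest row of each island.
    picked = [min(island, key=lambda r: r[2]) for island in islands.values()]
    return sorted(picked, key=lambda r: r[2], reverse=True)  # final ORDER BY dt DESC


for descr, price, dt in islands_top1(ROWS):
    print(f"{descr:<22}{price:>8}  {dt}")
```

It should list the same 13 rows as the Result table, although the two 2014-06-27 rows may appear in either order, because Python's stable sort keeps them in island order while SQL Server is free to return them either way.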
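Note that because two rows have the same date, 2014-06-27, the server may return them in the order shown in your expected result, or in the order shown here. You most likely have an ID column, so use it to break the tie and make the ordering deterministic.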
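As a rough illustration of that advice, here is a Python sketch (not T-SQL) that adds a hypothetical unique `id` column and uses it as a tie-breaker after `dt`. In the query itself, the corresponding change would presumably be to append the key to every ORDER BY involved: the three `ROW_NUMBER()` windows and the final sort. The names `ROWS` and `islands_top1_stable` and the `id` values are illustrative only.

```python
import random
from collections import defaultdict

# (id, Descr, Price, dt). The `id` column is HYPOTHETICAL: the sample table has no
# key, so the row's position in the INSERT list is used as a stand-in.
ROWS = [
    (1, "Active", 799900, "2019-02-27"),
    (2, "Unknown", 629900, "2014-09-24"),
    (3, "Unknown", 629900, "2014-09-12"),
    (4, "Sold", 625900, "2014-09-08"),
    (5, "Unknown", 629900, "2014-08-10"),
    (6, "Active", 629900, "2014-07-27"),
    (7, "Pending", 629900, "2014-07-25"),
    (8, "Pending", 629900, "2014-07-24"),
    (9, "Unknown", 629900, "2014-07-20"),
    (10, "Active", 629900, "2014-07-16"),
    (11, "Active", 629900, "2014-07-15"),
    (12, "Taking Backup Offers", 629900, "2014-07-11"),
    (13, "Active", 629900, "2014-06-28"),
    (14, "Active", 629900, "2014-06-27"),
    (15, "Taking Backup Offers", 629900, "2014-06-27"),
    (16, "Active", 629900, "2014-06-23"),
    (17, "Active", 629900, "2014-06-11"),
    (18, "Active", 629900, "2014-06-10"),
    (19, "Sold", 570000, "2010-01-22"),
    (20, "Sold", 288000, "2000-09-01"),
]


def islands_top1_stable(rows):
    """Gaps-and-islands + top-1-per-island with a unique tie-breaker."""
    # Total order dt DESC, id ASC: sort by id first, then stably by dt descending.
    ordered = sorted(rows, key=lambda r: r[0])
    ordered = sorted(ordered, key=lambda r: r[3], reverse=True)
    seen_per_descr = defaultdict(int)
    islands = defaultdict(list)  # (Descr, rn1 - rn2) -> rows of one island
    for rn1, row in enumerate(ordered, start=1):
        descr = row[1]
        seen_per_descr[descr] += 1
        islands[(descr, rn1 - seen_per_descr[descr])].append(row)
    # Earliest row of each island; equal dates are broken by the smaller id.
    picked = [min(island, key=lambda r: (r[3], r[0])) for island in islands.values()]
    # Final order dt DESC, id ASC (negating id makes reverse=True sort id ascending).
    return sorted(picked, key=lambda r: (r[3], -r[0]), reverse=True)


baseline = islands_top1_stable(ROWS)
rng = random.Random(42)
shuffles = [rng.sample(ROWS, len(ROWS)) for _ in range(5)]
# The result no longer depends on the order in which rows arrive.
print(all(islands_top1_stable(s) == baseline for s in shuffles))  # True
```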
To see how it works, run the intermediate query and inspect its results (the columns rn1, rn2, Groups, rn).
WITH
CTE_RN
AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
FROM #temptable
)
,CTE_Groups
AS
(
SELECT
*
,rn1 - rn2 AS Groups
,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
FROM CTE_RN
)
SELECT *
FROM CTE_Groups
ORDER BY dt DESC;
Result
+----------------------+--------+------------+-----+-----+--------+----+
| Descr | Price | dt | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active | 799900 | 2019-02-27 | 1 | 1 | 0 | 1 |
| Unknown | 629900 | 2014-09-24 | 2 | 1 | 1 | 2 |
| Unknown | 629900 | 2014-09-12 | 3 | 2 | 1 | 1 |
| Sold | 625900 | 2014-09-08 | 4 | 1 | 3 | 1 |
| Unknown | 629900 | 2014-08-10 | 5 | 3 | 2 | 1 |
| Active | 629900 | 2014-07-27 | 6 | 2 | 4 | 1 |
| Pending | 629900 | 2014-07-25 | 7 | 1 | 6 | 2 |
| Pending | 629900 | 2014-07-24 | 8 | 2 | 6 | 1 |
| Unknown | 629900 | 2014-07-20 | 9 | 4 | 5 | 1 |
| Active | 629900 | 2014-07-16 | 10 | 3 | 7 | 2 |
| Active | 629900 | 2014-07-15 | 11 | 4 | 7 | 1 |
| Taking Backup Offers | 629900 | 2014-07-11 | 12 | 1 | 11 | 1 |
| Active | 629900 | 2014-06-28 | 13 | 5 | 8 | 2 |
| Active | 629900 | 2014-06-27 | 14 | 6 | 8 | 1 |
| Taking Backup Offers | 629900 | 2014-06-27 | 15 | 2 | 13 | 1 |
| Active | 629900 | 2014-06-23 | 16 | 7 | 9 | 3 |
| Active | 629900 | 2014-06-11 | 17 | 8 | 9 | 2 |
| Active | 629900 | 2014-06-10 | 18 | 9 | 9 | 1 |
| Sold | 570000 | 2010-01-22 | 19 | 2 | 17 | 2 |
| Sold | 288000 | 2000-09-01 | 20 | 3 | 17 | 1 |
+----------------------+--------+------------+-----+-----+--------+----+
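Why does `rn1 - rn2` identify an island? Within a run of adjacent rows with the same `Descr`, both `rn1` and `rn2` go up by 1 per row, so their difference stays constant. When rows with another description interrupt the run, `rn1` keeps counting while `rn2` does not, so the next run of that description gets a larger difference. The same difference can occur for different descriptions (for the sequence A, B, A, B, the second A and the first B both get 1), which is why the query partitions by `Descr, rn1 - rn2` rather than by `Groups` alone. The Python sketch below (hypothetical function `intermediate`, same assumed row order as the table) recomputes the columns so they can be compared with the table.

```python
from collections import defaultdict

# (Descr, Price, dt) in the order shown in the table above.
ROWS = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def intermediate(rows):
    """Recomputes rn1, rn2, Groups and rn for rows given in dt DESC order."""
    seen_per_descr = defaultdict(int)
    out = []
    for rn1, (descr, price, dt) in enumerate(rows, start=1):
        seen_per_descr[descr] += 1
        rn2 = seen_per_descr[descr]
        out.append({"Descr": descr, "Price": price, "dt": dt,
                    "rn1": rn1, "rn2": rn2, "Groups": rn1 - rn2})
    # rn = ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt)
    islands = defaultdict(list)
    for row in out:
        islands[(row["Descr"], row["Groups"])].append(row)
    for island in islands.values():
        for rn, row in enumerate(sorted(island, key=lambda r: r["dt"]), start=1):
            row["rn"] = rn
    return out


for r in intermediate(ROWS):
    print(r["Descr"], r["dt"], r["rn1"], r["rn2"], r["Groups"], r["rn"])
```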
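A word of caution
Adding ORDER BY dt DESC, rn1 ASC to the main query does not guarantee that it produces the result you expect. The rn1 values 14 and 15 can themselves be swapped: ROW_NUMBER() OVER (ORDER BY dt DESC) sees two rows with the same date (2014-06-27) and may number them either way. When dates are not unique, you need an additional unique column to make the ordering stable and predictable. Your sample data has no such column, but tables usually have a unique primary key, so use it.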
So, for your sample data, it is perfectly valid for the query to produce this result:
Intermediate
+----------------------+--------+------------+-----+-----+--------+----+
| Descr | Price | dt | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active | 799900 | 2019-02-27 | 1 | 1 | 0 | 1 |
| Unknown | 629900 | 2014-09-24 | 2 | 1 | 1 | 2 |
| Unknown | 629900 | 2014-09-12 | 3 | 2 | 1 | 1 |
| Sold | 625900 | 2014-09-08 | 4 | 1 | 3 | 1 |
| Unknown | 629900 | 2014-08-10 | 5 | 3 | 2 | 1 |
| Active | 629900 | 2014-07-27 | 6 | 2 | 4 | 1 |
| Pending | 629900 | 2014-07-25 | 7 | 1 | 6 | 2 |
| Pending | 629900 | 2014-07-24 | 8 | 2 | 6 | 1 |
| Unknown | 629900 | 2014-07-20 | 9 | 4 | 5 | 1 |
| Active | 629900 | 2014-07-16 | 10 | 3 | 7 | 2 |
| Active | 629900 | 2014-07-15 | 11 | 4 | 7 | 1 |
| Taking Backup Offers | 629900 | 2014-07-11 | 12 | 1 | 11 | 1 |
| Active | 629900 | 2014-06-28 | 13 | 5 | 8 | 1 |
| Taking Backup Offers | 629900 | 2014-06-27 | 14 | 2 | 12 | 1 |
| Active | 629900 | 2014-06-27 | 15 | 6 | 9 | 4 |
| Active | 629900 | 2014-06-23 | 16 | 7 | 9 | 3 |
| Active | 629900 | 2014-06-11 | 17 | 8 | 9 | 2 |
| Active | 629900 | 2014-06-10 | 18 | 9 | 9 | 1 |
| Sold | 570000 | 2010-01-22 | 19 | 2 | 17 | 2 |
| Sold | 288000 | 2000-09-01 | 20 | 3 | 17 | 1 |
+----------------------+--------+------------+-----+-----+--------+----+
Final
+----------------------+--------+------------+
| Descr | Price | dt |
+----------------------+--------+------------+
| Active | 799900 | 2019-02-27 |
| Unknown | 629900 | 2014-09-12 |
| Sold | 625900 | 2014-09-08 |
| Unknown | 629900 | 2014-08-10 |
| Active | 629900 | 2014-07-27 |
| Pending | 629900 | 2014-07-24 |
| Unknown | 629900 | 2014-07-20 |
| Active | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Active | 629900 | 2014-06-28 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active | 629900 | 2014-06-10 |
| Sold | 288000 | 2000-09-01 |
+----------------------+--------+------------+
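As you can see, this result differs from the first one: because two rows have the same date, the engine is free to put them in either order.
In this result, Active appears with a different date, 2014-06-28, because Active 2014-06-27 happened to be placed below Taking Backup Offers 2014-06-27. That separates it from Active 2014-06-28, which now forms an island of its own, and attaches it to the following Active island (2014-06-23 down to 2014-06-10), whose earliest date is 2014-06-10.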
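The effect can be reproduced with a small Python model of the query (hypothetical `islands_top1`, the same logic as the T-SQL, with list position standing in for the engine's arbitrary tie order). Feed it the sample rows once as listed and once with the two 2014-06-27 rows swapped, then compare which rows survive.

```python
from collections import defaultdict

# (Descr, Price, dt) in the order of the first intermediate result.
ROWS = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def islands_top1(rows):
    """rn1 - rn2 islands per Descr, keeping the earliest row of each island."""
    seen = defaultdict(int)
    islands = defaultdict(list)
    for rn1, row in enumerate(rows, start=1):
        seen[row[0]] += 1
        islands[(row[0], rn1 - seen[row[0]])].append(row)
    picked = [min(island, key=lambda r: r[2]) for island in islands.values()]
    return sorted(picked, key=lambda r: r[2], reverse=True)


as_listed = islands_top1(ROWS)

swapped = ROWS[:]
swapped[13], swapped[14] = swapped[14], swapped[13]  # rn1 14 and 15 trade places
tie_swapped = islands_top1(swapped)

print(sorted(set(as_listed) - set(tie_swapped)))  # [('Active', 629900, '2014-06-27')]
print(sorted(set(tie_swapped) - set(as_listed)))  # [('Active', 629900, '2014-06-28')]
```

Both runs return 13 rows; only the surviving Active date around the tie changes, matching the difference between the two Final results.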