How to pick the smallest of the consecutive rows?

Consider the data in a table, sorted by Date descending.

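If there are multiple consecutive rows with the same description, I want to keep only the row with the earliest date. For example, rows 2 and 3 are both Unknown, and I only want to keep the one on 9/12/2014.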

I have been trying to combine a CTE with ROW_NUMBER(), but I can't restrict it to runs of consecutive rows that share the same description.

;WITH removeConsecutiveRows AS (
  SELECT ph.Description,
       ph.Price,
       ph.Date,
       ROW_NUMBER() OVER (
          PARTITION BY ph.Description
          ORDER BY ph.Date
       ) AS rowNum 
  FROM #PriceHistory ph (NOLOCK)
)
SELECT s.Description,
       s.Price,
       s.Date,
       s.rowNum
FROM removeConsecutiveRows s
WHERE s.rowNum = 1
ORDER BY s.Date DESC

So in the end, it should look like this:

Note that this is SQL Server 2008.
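The following Python sketch (not T-SQL) shows why the attempt above cannot work. It models `ROW_NUMBER() OVER (PARTITION BY Description ORDER BY Date)` plus `WHERE rowNum = 1` as "earliest date per description" and uses the first six rows of the sample data from the answer. A partition covers the whole table, so rows that are not adjacent end up in the same partition. As a result, each description survives only once, instead of once per run.

```python
from collections import defaultdict

# First six rows of the sample data, newest first: (Description, Date).
rows = [
    ("Active",  "2019-02-27"),
    ("Unknown", "2014-09-24"),
    ("Unknown", "2014-09-12"),
    ("Sold",    "2014-09-08"),
    ("Unknown", "2014-08-10"),
    ("Active",  "2014-07-27"),
]


def first_per_description(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY Description ORDER BY Date) ... WHERE rowNum = 1."""
    partitions = defaultdict(list)
    for descr, dt in rows:
        partitions[descr].append(dt)
    # rowNum = 1 is the earliest date of the whole partition, adjacency is ignored.
    picked = [(descr, min(dates)) for descr, dates in partitions.items()]
    # ORDER BY Date DESC (ISO date strings sort correctly as text).
    return sorted(picked, key=lambda r: r[1], reverse=True)


# The question wants five rows here (one per run), but only three come back.
print(first_per_description(rows))
```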

This looks like a "gaps-and-islands" problem, with a "top-1-per-group" step on top once the groups (islands) have been detected.
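The core trick can be shown on a toy sequence. Number every row overall (rn1) and also number it within its own value (rn2). The difference rn1 - rn2 then stays constant inside a run of equal values and changes whenever another value interrupts the run. The Python sketch below uses hypothetical values "A" and "B" only to illustrate this labeling.

```python
from collections import defaultdict


def island_labels(descrs):
    """Label each position with rn1 - rn2.

    rn1 is the overall 1-based position; rn2 is the 1-based position among
    rows with the same value. (value, rn1 - rn2) identifies one island.
    """
    counters = defaultdict(int)
    labels = []
    for rn1, d in enumerate(descrs, start=1):
        counters[d] += 1
        labels.append((d, rn1 - counters[d]))
    return labels


seq = ["A", "A", "B", "A", "A", "A"]  # hypothetical descriptions, newest first
labels = island_labels(seq)
print(labels)             # [('A', 0), ('A', 0), ('B', 2), ('A', 1), ('A', 1), ('A', 1)]
print(len(set(labels)))   # 3 islands
```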

Here is one way to do it.

Sample data

CREATE TABLE #temptable ( Descr varchar(50), [Price] int, dt date )
INSERT INTO #temptable
VALUES
( 'Active', 799900, N'2019-02-27T00:00:00' ), 
( 'Unknown', 629900, N'2014-09-24T00:00:00' ), 
( 'Unknown', 629900, N'2014-09-12T00:00:00' ), 
( 'Sold', 625900, N'2014-09-08T00:00:00' ), 
( 'Unknown', 629900, N'2014-08-10T00:00:00' ), 
( 'Active', 629900, N'2014-07-27T00:00:00' ), 
( 'Pending', 629900, N'2014-07-25T00:00:00' ), 
( 'Pending', 629900, N'2014-07-24T00:00:00' ), 
( 'Unknown', 629900, N'2014-07-20T00:00:00' ), 
( 'Active', 629900, N'2014-07-16T00:00:00' ), 
( 'Active', 629900, N'2014-07-15T00:00:00' ), 
( 'Taking Backup Offers', 629900, N'2014-07-11T00:00:00' ), 
( 'Active', 629900, N'2014-06-28T00:00:00' ), 
( 'Active', 629900, N'2014-06-27T00:00:00' ), 
( 'Taking Backup Offers', 629900, N'2014-06-27T00:00:00' ), 
( 'Active', 629900, N'2014-06-23T00:00:00' ), 
( 'Active', 629900, N'2014-06-11T00:00:00' ), 
( 'Active', 629900, N'2014-06-10T00:00:00' ), 
( 'Sold', 570000, N'2010-01-22T00:00:00' ), 
( 'Sold', 288000, N'2000-09-01T00:00:00' );

Query

WITH
CTE_RN
AS
(
    SELECT
        * 
        ,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
    FROM #temptable
)
,CTE_Groups
AS
(
    SELECT
        *
        ,rn1 - rn2 AS Groups
        ,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
    FROM CTE_RN
)
SELECT Descr, Price, dt
FROM CTE_Groups
WHERE rn = 1
ORDER BY dt DESC;
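To try the logic without a SQL Server instance, the following Python sketch reproduces the same two steps on the sample data. It computes Groups = rn1 - rn2, then keeps the row with the earliest dt per (Descr, Groups), which is the `rn = 1` filter. One assumption: for the tied date 2014-06-27, the list puts "Active" before "Taking Backup Offers", which is one of the two orders the engine may choose. This is an illustration of the arithmetic, not of how SQL Server executes the query.

```python
from collections import defaultdict

# Sample data as (Descr, Price, dt), newest first (the rn1 order).
# Assumption: on the tied date 2014-06-27, "Active" precedes "Taking Backup Offers".
ROWS = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def collapse_consecutive(rows):
    """Keep the earliest row of each run of consecutive equal Descr values.

    `rows` must already be in rn1 order (dt DESC, ties in list order).
    """
    rn2 = defaultdict(int)   # running ROW_NUMBER() per Descr
    best = {}                # (Descr, Groups) -> (rn1, row) with the earliest dt so far
    for rn1, row in enumerate(rows, start=1):
        descr, _price, dt = row
        rn2[descr] += 1
        key = (descr, rn1 - rn2[descr])      # Groups = rn1 - rn2
        if key not in best or dt < best[key][1][2]:
            best[key] = (rn1, row)           # this row currently has rn = 1
    # Final ORDER BY dt DESC: rn1 already encodes that order.
    return [row for _rn1, row in sorted(best.values())]


for descr, price, dt in collapse_consecutive(ROWS):
    print(f"{descr:<22}{price:>8}  {dt}")
```

It prints the same 13 rows as the result table below. The only difference is that the two 2014-06-27 rows come out in list order, with Active first. Either order is a valid answer to `ORDER BY dt DESC`.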

Result

+----------------------+--------+------------+
|        Descr         | Price  |     dt     |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+

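Note that because two rows share the same date, 2014-06-27, the server may return them in the order shown in your expected result, or in the order shown here. Most likely you have an ID column, so use it to break the tie.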
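The following Python sketch illustrates that advice. The `id` values are invented, since the sample data has no such column. In the T-SQL query, the equivalent would presumably be appending the ID to each window's `ORDER BY`, for example `ORDER BY dt DESC, ID`. Once the ordering is total, the input order no longer affects which rows survive.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows (id, Descr, dt): the tail of the sample data plus an
# invented unique id. The two 2014-06-27 rows are the tie.
ROWS = [
    (13, "Active", "2014-06-28"),
    (14, "Active", "2014-06-27"),
    (15, "Taking Backup Offers", "2014-06-27"),
    (16, "Active", "2014-06-23"),
]


def order_rows(rows):
    """ORDER BY dt DESC, id ASC: the id makes the order total."""
    return sorted(rows, key=lambda r: (-date.fromisoformat(r[2]).toordinal(), r[0]))


def collapse_consecutive(rows):
    """Earliest row of each run of equal Descr, using Groups = rn1 - rn2."""
    rn2 = defaultdict(int)
    best = {}
    for rn1, row in enumerate(order_rows(rows), start=1):
        _id, descr, dt = row
        rn2[descr] += 1
        key = (descr, rn1 - rn2[descr])
        if key not in best or dt < best[key][1][2]:
            best[key] = (rn1, row)
    return [row for _rn1, row in sorted(best.values())]


# Whatever order the rows arrive in, the result is the same.
print(collapse_consecutive(ROWS))
print(collapse_consecutive(list(reversed(ROWS))))
```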


To understand how it works, run the intermediate query and examine its results (columns rn1, rn2, Groups, rn).

WITH
CTE_RN
AS
(
    SELECT
        * 
        ,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
    FROM #temptable
)
,CTE_Groups
AS
(
    SELECT
        *
        ,rn1 - rn2 AS Groups
        ,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
    FROM CTE_RN
)
SELECT *
FROM CTE_Groups
ORDER BY dt DESC;
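The following Python sketch recomputes the four helper columns of `CTE_Groups` for the sample data, so you can follow the numbers in the table below step by step. Row order is assumed to match that table: dt descending, with "Active" before "Taking Backup Offers" on 2014-06-27. Ties inside one island would be broken by list order, which SQL Server does not guarantee.

```python
from collections import defaultdict

# Sample data as (Descr, dt) in rn1 order (dt DESC, tie as in the table below).
ROWS = [
    ("Active", "2019-02-27"), ("Unknown", "2014-09-24"), ("Unknown", "2014-09-12"),
    ("Sold", "2014-09-08"), ("Unknown", "2014-08-10"), ("Active", "2014-07-27"),
    ("Pending", "2014-07-25"), ("Pending", "2014-07-24"), ("Unknown", "2014-07-20"),
    ("Active", "2014-07-16"), ("Active", "2014-07-15"), ("Taking Backup Offers", "2014-07-11"),
    ("Active", "2014-06-28"), ("Active", "2014-06-27"), ("Taking Backup Offers", "2014-06-27"),
    ("Active", "2014-06-23"), ("Active", "2014-06-11"), ("Active", "2014-06-10"),
    ("Sold", "2010-01-22"), ("Sold", "2000-09-01"),
]


def intermediate(rows):
    """Rebuild the CTE_Groups columns as tuples (Descr, dt, rn1, rn2, Groups, rn)."""
    rn2_counter = defaultdict(int)
    tagged = []
    for rn1, (descr, dt) in enumerate(rows, start=1):
        rn2_counter[descr] += 1              # ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC)
        rn2 = rn2_counter[descr]
        tagged.append([descr, dt, rn1, rn2, rn1 - rn2])
    # rn = ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt)
    islands = defaultdict(list)
    for row in tagged:
        islands[(row[0], row[4])].append(row)
    for members in islands.values():
        for rn, row in enumerate(sorted(members, key=lambda m: m[1]), start=1):
            row.append(rn)
    return [tuple(row) for row in tagged]


for row in intermediate(ROWS):
    print(row)
```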

Result

+----------------------+--------+------------+-----+-----+--------+----+
|        Descr         | Price  |     dt     | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  2 |
| Active               | 629900 | 2014-06-27 |  14 |   6 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  15 |   2 |     13 |  1 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+

A word of caution

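Adding ORDER BY dt DESC, rn1 ASC to the main query does not guarantee that it will produce the result you expect. The rn1 values 14 and 15 may be swapped, because their dates (2014-06-27) are the same. If dates are not unique, you need an extra unique column to make the ordering stable and predictable. Your sample data has no such column, but tables usually have a unique primary key, so you should use that.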

So, for your sample data, it is perfectly normal for the query to produce this result:

Intermediate

+----------------------+--------+------------+-----+-----+--------+----+
|        Descr         | Price  |     dt     | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  14 |   2 |     12 |  1 |
| Active               | 629900 | 2014-06-27 |  15 |   6 |      9 |  4 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+

Final

+----------------------+--------+------------+
|        Descr         | Price  |     dt     |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Active               | 629900 | 2014-06-28 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+

As you can see, this result differs from the first one: because two rows share the same date, the engine is free to put them in either order.

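This result contains Active with a different date, 2014-06-28, because Active 2014-06-27 happened to be placed below Taking Backup Offers 2014-06-27. As a result, Active 2014-06-28 forms an island of its own, and Active 2014-06-27 joins the island that ends on 2014-06-10.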
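The effect can be reproduced in isolation with the following Python sketch. It applies the same rn1 - rn2 arithmetic to the tail of the sample data (from 2014-06-28 down) in both permitted orders of the tie. It does not model what the engine actually does; it only shows that both orders are consistent with `ORDER BY dt DESC` and yet lead to different islands.

```python
from collections import defaultdict


def collapse_consecutive(rows):
    """Earliest dt of each run of equal Descr (Groups = rn1 - rn2); rows in rn1 order."""
    rn2 = defaultdict(int)
    best = {}
    for rn1, (descr, dt) in enumerate(rows, start=1):
        rn2[descr] += 1
        key = (descr, rn1 - rn2[descr])
        if key not in best or dt < best[key][1][1]:
            best[key] = (rn1, (descr, dt))
    return [row for _rn1, row in sorted(best.values())]


# Tail of the sample data, newest first, with the tie in either order.
active_first = [
    ("Active", "2014-06-28"),
    ("Active", "2014-06-27"),
    ("Taking Backup Offers", "2014-06-27"),
    ("Active", "2014-06-23"),
    ("Active", "2014-06-11"),
    ("Active", "2014-06-10"),
]
tbo_first = [active_first[0], active_first[2], active_first[1], *active_first[3:]]

print(collapse_consecutive(active_first))  # Active 2014-06-27 survives
print(collapse_consecutive(tbo_first))     # Active 2014-06-28 survives
```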