SQL Server row_number() over partition by, but ignore repeating categorical values

Question

I am trying to track the different paths customers take through multiple categories. A simplified view of my table looks like this:

Table: customer_category

CustomerID   |  Category  |  Date
11111        |  A         |  2016-01-01
11111        |  B         |  2016-02-01
11111        |  C         |  2016-03-01
22222        |  A         |  2016-01-01
22222        |  A         |  2016-02-01
22222        |  A         |  2016-03-01
22222        |  C         |  2016-04-01
33333        |  A         |  2016-01-01
33333        |  B         |  2016-02-01
33333        |  C         |  2016-03-01
33333        |  C         |  2016-04-01

I can find the absolute path with this query:

with cat_order as (
    select CustomerID
          ,Category
          ,row_number() over (partition by CustomerID order by Date) as rnk
    from customer_category
),[pivot] as (  -- PIVOT is a reserved keyword in T-SQL, so the CTE name is bracketed
    select CustomerID
      ,max(case when rnk = 1 then Category else null end) as category_1
      ,max(case when rnk = 2 then Category else null end) as category_2
      ,max(case when rnk = 3 then Category else null end) as category_3
      ,max(case when rnk = 4 then Category else null end) as category_4
    from cat_order
    group by CustomerID
)
select category_1, category_2, category_3, category_4, count(*) as count
from [pivot]
group by category_1, category_2, category_3, category_4;

This gives me the following:

category_1  |  category_2  |  category_3  |  category_4  |  count
A           |  B           |  C           |              |  1
A           |  A           |  A           |  C           |  1
A           |  B           |  C           |  C           |  1

However, what I want is to ignore repeated categories, so that I would see:

category_1  |  category_2  |  category_3  |  category_4  |  count
A           |  B           |  C           |              |  2
A           |  C           |              |              |  1

In my mind, I think I need to:

1. omit rows where Category = lag(Category)
2. rank over the partition...
3. pivot using case statements
4. aggregate the results

This feels overly complicated. Is there a simpler way?

Answer 1

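As far as I know, there is no simpler way (given your data and the desired output). To get the result you want, you basically need to perform the four steps you outlined (or some variation of them). You can, however, "simplify" it by writing it in a way that does not need CTEs. For example: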

SELECT category_1 = P.[1],
       category_2 = P.[2],
       category_3 = P.[3],
       category_4 = P.[4],
       [Count] = COUNT(*)
FROM
(
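    -- Running total of the change flags: every row in a run of repeats shares one rnk
    SELECT CustomerID,
           Category,
           rnk = SUM(checkprev) OVER (PARTITION BY CustomerID ORDER BY [Date])
    FROM 
    (
        -- checkprev is 0 when the row repeats the previous row's category, otherwise 1
        SELECT *,
               checkprev = CASE WHEN LAG(Category) OVER (PARTITION BY CustomerID ORDER BY [Date]) = Category
                                THEN 0 ELSE 1 END
        FROM customer_category
    ) T
) AS T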
PIVOT
(
    MAX(Category) FOR rnk IN ([1], [2], [3], [4])
) AS P
GROUP BY P.[1], P.[2], P.[3], P.[4];
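
To see why this collapses the repeats, it can help to look at the intermediate values before the pivot. The following is only a sketch for a scratch database: the column types are my assumption (the question does not state them), the alias flagged is just a name I picked, and LAG requires SQL Server 2012 or later.

```sql
-- Assumed column types; the question does not say what they are.
CREATE TABLE customer_category
(
    CustomerID int     NOT NULL,
    Category   char(1) NOT NULL,
    [Date]     date    NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO customer_category (CustomerID, Category, [Date])
VALUES (11111, 'A', '2016-01-01'),
       (11111, 'B', '2016-02-01'),
       (11111, 'C', '2016-03-01'),
       (22222, 'A', '2016-01-01'),
       (22222, 'A', '2016-02-01'),
       (22222, 'A', '2016-03-01'),
       (22222, 'C', '2016-04-01'),
       (33333, 'A', '2016-01-01'),
       (33333, 'B', '2016-02-01'),
       (33333, 'C', '2016-03-01'),
       (33333, 'C', '2016-04-01');

-- Intermediate result: the change flag and the running rank for every row.
SELECT CustomerID,
       [Date],
       Category,
       checkprev,
       rnk = SUM(checkprev) OVER (PARTITION BY CustomerID ORDER BY [Date])
FROM
(
    -- The first row per customer has no previous row, so LAG returns NULL,
    -- the comparison is not true, and the CASE falls through to 1.
    SELECT *,
           checkprev = CASE WHEN LAG(Category) OVER (PARTITION BY CustomerID ORDER BY [Date]) = Category
                            THEN 0 ELSE 1 END
    FROM customer_category
) AS flagged
ORDER BY CustomerID, [Date];
```

For customer 22222 the three A rows all end up with rnk 1 and the C row gets rnk 2, so MAX(Category) in the PIVOT yields A, C for that customer. For customer 33333 the two C rows share rnk 3, which gives A, B, C.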

sql-server

window-functions