How to use window functions to get the cumulative count of two columns based on the value from one column?

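I am trying to get, for each of two player columns, a cumulative count of the previous records in which that player's ID appears in either column. As an example, I am trying to output the table below, where the last two columns are computed from the first three: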

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      1 |                      1 |
|   3 |     1 |     3 |                      2 |                      0 |
|   4 |     2 |     1 |                      2 |                      3 |
+-----+-------+-------+------------------------+------------------------+

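I am trying to get away from subqueries and have been exploring window functions. However, in this case a simple COUNT partitioned on p1_id only counts the records where the player appears in that same column: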

SELECT 
    id_,
    p1_id,
    p2_id,
    COUNT(p1_id) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    COUNT(p2_id) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;

Result:

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      0 |                      0 |
|   3 |     1 |     3 |                      1 |                      0 |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+

I then thought I was being clever by combining SUM and CASE to get around this:

SELECT 
    id_,
    p1_id,
    p2_id,
    SUM(CASE WHEN (p1_id = p1_id OR p1_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    SUM(CASE WHEN (p2_id = p1_id OR p2_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;

Alas, it did not work:

+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 | NULL                   | NULL                   |
|   2 |     2 |     1 | NULL                   | NULL                   |
|   3 |     1 |     3 | 1                      | NULL                   |
|   4 |     2 |     1 | 1                      | 1                      |
+-----+-------+-------+------------------------+------------------------+

Can anyone put me out of my misery?

Here is the SQL to create and populate the table:

CREATE TABLE `example_table` (
  `id_` int NOT NULL AUTO_INCREMENT,
  `p1_id` int DEFAULT NULL,
  `p2_id` int DEFAULT NULL,
  PRIMARY KEY (`id_`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `example_table` VALUES (1,1,2),(2,2,1),(3,1,3),(4,2,1);

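I am offering you two options. I do not think the first one will work for you, because MySQL has no PIVOT operator, but I wrote the second one using something more generic.

Both use CTEs, which are technically subqueries, but perhaps you will prefer them because they are easier to decipher.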

Option 1:

WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    
    UNION ALL
    
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I
    ,P1
    ,P2
FROM (
    SELECT I
        ,COUNT_PID
        ,COL
    FROM CTE_CUMSUM
    )
PIVOT(SUM(COUNT_PID) FOR COL IN (
            'p1'
            ,'p2'
            )) AS p(I, P1, P2)

Option 2:

WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    
    UNION ALL
    
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I AS id_
    ,MAX(CASE WHEN COL='p1' THEN pid ELSE NULL END) AS p1_id
    ,MAX(CASE WHEN COL='p2' THEN pid ELSE NULL END) AS p2_id
    ,SUM(CASE WHEN COL='p1' THEN COUNT_PID ELSE 0 END) AS p1_id_cumulative_count
    ,SUM(CASE WHEN COL='p2' THEN COUNT_PID ELSE 0 END) AS p2_id_cumulative_count
FROM CTE_CUMSUM 
GROUP BY 1 
ORDER BY 1
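
To see what the window function contributes in Option 2, it can help to look at the long-form rows before they are folded back into one row per id_. The following is only a sketch against the question's example_table; it reuses the CTE_LONGFORM definition from above and adds an ORDER BY purely for readability:

```sql
WITH CTE_LONGFORM AS (
    -- One row per (record, player): p1 and p2 are stacked into a single pid column.
    SELECT id_ AS i, p1_id AS pid, 'p1' AS col FROM `example_table`
    UNION ALL
    SELECT id_ AS i, p2_id AS pid, 'p2' AS col FROM `example_table`
)
SELECT i
    ,pid
    ,col
    -- Counts the earlier long-form rows for the same player, whichever column they came from.
    ,COUNT(pid) OVER (
        PARTITION BY pid ORDER BY i ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS count_pid
FROM CTE_LONGFORM
ORDER BY pid, i;
-- With the sample data, player 1 (i = 1, 2, 3, 4) gets counts 0, 1, 2, 3,
-- player 2 (i = 1, 2, 4) gets 0, 1, 2, and player 3 (i = 3) gets 0.
```

Because both columns now share one partition per player, the count for player 1 at id_ 4 is 3, which is the value the question expects in p2_id_cumulative_count for that row.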

I translated the first one with Rasgo, which suggested PIVOT because I am currently working in Snowflake. If I had a MySQL instance, it might have suggested the second.

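You cannot use window functions directly, because PARTITION BY cannot create the window frame you need: each window partitions on a single column, so a player's appearances in the other column end up in different partitions.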
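
This is also why the SUM and CASE attempt in the question behaves like the plain COUNT. As a small illustration (the alias row_local_test is just a name I made up), here is that CASE evaluated on its own against the question's example_table:

```sql
-- The CASE from the question's second attempt, without the window.
-- Both operands come from the same row, so p1_id = p1_id is true
-- whenever p1_id is not NULL, and the test never looks at another row.
SELECT
    id_,
    p1_id,
    p2_id,
    CASE WHEN (p1_id = p1_id OR p1_id = p2_id) THEN 1 ELSE 0 END AS row_local_test
FROM
    example_table
ORDER BY id_;
-- row_local_test is 1 on every row of the sample data.
```

Since the expression is 1 on every row, the windowed SUM just counts the earlier rows in the same PARTITION BY p1_id group. It also returns NULL instead of 0 where the frame is empty, because SUM over no rows is NULL while COUNT over no rows is 0.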

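One solution is to move the required values into a single column (1 row x 2 columns becomes 2 rows x 1 column), then use a window function to count the values, and finally use conditional aggregation or some other technique to convert the 2 rows x 1 column back into 1 row x 2 columns. Like this: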

with cte1 as (
    select t.id_, x.pid, x.col
    from example_table as t, lateral (
        select p1_id, 1 union all
        select p2_id, 2
    ) as x(pid, col)
), cte2 as (
    select *, count(*) over (
        partition by pid
        order by id_ rows between unbounded preceding and 1 preceding
    ) as rcount
    from cte1
)
select id_
     , min(case when col = 1 then pid end) as p1_id
     , min(case when col = 2 then pid end) as p2_id
     , min(case when col = 1 then rcount end) as p1_rcount
     , min(case when col = 2 then rcount end) as p2_rcount
from cte2
group by id_

Demo on db<>fiddle

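Actually, if I understand the problem correctly, you do not need window functions or subqueries.

You can get the cumulative count as follows:

  • Apply a LEFT SELF JOIN on example_table with a > condition on id_, so that each row is matched with all of the rows that precede it
  • Apply a CASE to detect when each player is found in either of the two columns of a preceding row
  • Apply the SUM aggregate function

Here it is: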

SELECT t1.id_,
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM      example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_
GROUP BY t1.id_
ORDER BY t1.id_;    

Find the corresponding SQL Fiddle here.
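
If the output should have the same shape as the table in the question, the same self join can also carry the two player columns through. This is only a sketch of that variant; the only changes are the two extra columns, the output aliases copied from the question, and the wider GROUP BY:

```sql
SELECT t1.id_,
       t1.p1_id,
       t1.p2_id,
       -- For the first row there is no earlier row, so t2 is all NULL and the CASE falls through to 0.
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_id_cumulative_count,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_id_cumulative_count
FROM      example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_
-- The extra columns are listed here so the query does not rely on id_ being the primary key.
GROUP BY t1.id_, t1.p1_id, t1.p2_id
ORDER BY t1.id_;
```

On the four sample rows this gives the counts (0, 0), (1, 1), (2, 0) and (2, 3), matching the desired table. Keep in mind that the join pairs every row with all of its predecessors, so the work grows roughly quadratically with the number of rows.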