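How to use window functions to get the cumulative count of two columns based on the value from one column?
I'm trying to get, for each of two player columns, a cumulative count of the previous records in which that player's ID appears in either column. As an example, I'm trying to produce the table below, where the last two columns are computed from the first three: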
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      1 |                      1 |
|   3 |     1 |     3 |                      2 |                      0 |
|   4 |     2 |     1 |                      2 |                      3 |
+-----+-------+-------+------------------------+------------------------+
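I'm trying to get away from subqueries and have been exploring window functions. However, in this case a simple COUNT partitioned on p1_id only counts the records where the player appears in that same column: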
SELECT
    id_,
    p1_id,
    p2_id,
    COUNT(p1_id) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    COUNT(p2_id) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;
Result:
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                      0 |                      0 |
|   2 |     2 |     1 |                      0 |                      0 |
|   3 |     1 |     3 |                      1 |                      0 |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+
Then I thought I was being clever by combining SUM and CASE to solve this:
SELECT
    id_,
    p1_id,
    p2_id,
    SUM(CASE WHEN (p1_id = p1_id OR p1_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p1_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p1_id_cumulative_count,
    SUM(CASE WHEN (p2_id = p1_id OR p2_id = p2_id) THEN 1 ELSE 0 END) OVER (PARTITION BY p2_id ORDER BY id_ ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS p2_id_cumulative_count
FROM
    test.example_table
ORDER BY id_;
Alas, it didn't work:
+-----+-------+-------+------------------------+------------------------+
| id_ | p1_id | p2_id | p1_id_cumulative_count | p2_id_cumulative_count |
+-----+-------+-------+------------------------+------------------------+
|   1 |     1 |     2 |                   NULL |                   NULL |
|   2 |     2 |     1 |                   NULL |                   NULL |
|   3 |     1 |     3 |                      1 |                   NULL |
|   4 |     2 |     1 |                      1 |                      1 |
+-----+-------+-------+------------------------+------------------------+
Can anyone put me out of my misery?
Here is the SQL to create and populate the table:
CREATE TABLE `example_table` (
  `id_` int NOT NULL AUTO_INCREMENT,
  `p1_id` int DEFAULT NULL,
  `p2_id` int DEFAULT NULL,
  PRIMARY KEY (`id_`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `example_table` VALUES (1,1,2),(2,2,1),(3,1,3),(4,2,1);
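I'm offering you two options. I don't think the first one will work for you, because MySQL does not support PIVOT, but I wrote the second one using more generic constructs.
Both use CTEs, which are technically subqueries, but you may like them better because they are easier to decipher.
Option 1: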
WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    UNION ALL
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I
    ,P1
    ,P2
FROM (
    SELECT I
        ,COUNT_PID
        ,COL
    FROM CTE_CUMSUM
    )
PIVOT(SUM(COUNT_PID) FOR COL IN (
            'p1'
            ,'p2'
            )) AS p(I, P1, P2)
Option 2:
WITH CTE_LONGFORM
AS (
    SELECT id_ AS i
        ,p1_id AS pid
        ,'p1' AS col
    FROM `example_table`
    UNION ALL
    SELECT id_ AS i
        ,p2_id AS pid
        ,'p2' AS col
    FROM `example_table`
    )
    ,CTE_CUMSUM
AS (
    SELECT *
        ,COUNT(PID) OVER (
            PARTITION BY PID ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) AS COUNT_PID
    FROM CTE_LONGFORM
    )
SELECT I AS id_
    ,MAX(CASE WHEN COL='p1' THEN pid ELSE NULL END) AS p1_id
    ,MAX(CASE WHEN COL='p2' THEN pid ELSE NULL END) AS p2_id
    ,SUM(CASE WHEN COL='p1' THEN COUNT_PID ELSE 0 END) AS p1_id_cumulative_count
    ,SUM(CASE WHEN COL='p2' THEN COUNT_PID ELSE 0 END) AS p2_id_cumulative_count
FROM CTE_CUMSUM
GROUP BY 1
ORDER BY 1
I used Rasgo to translate the first one, and it suggested PIVOT because I'm currently on Snowflake. If I had a MySQL instance, it would probably have suggested the second.
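To sanity-check Option 2 without a MySQL instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite standing in for MySQL is my assumption (it needs SQLite 3.25 or later for window functions), and the lower-cased identifiers and GROUP BY i are only cosmetic changes.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

# Option 2: unpivot both player columns, count earlier rows per player,
# then fold the two long rows per id_ back into one wide row.
OPTION_2 = """
WITH cte_longform AS (
    SELECT id_ AS i, p1_id AS pid, 'p1' AS col FROM example_table
    UNION ALL
    SELECT id_ AS i, p2_id AS pid, 'p2' AS col FROM example_table
),
cte_cumsum AS (
    SELECT *,
           COUNT(pid) OVER (
               PARTITION BY pid ORDER BY i
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS count_pid
    FROM cte_longform
)
SELECT i AS id_,
       MAX(CASE WHEN col = 'p1' THEN pid END) AS p1_id,
       MAX(CASE WHEN col = 'p2' THEN pid END) AS p2_id,
       SUM(CASE WHEN col = 'p1' THEN count_pid ELSE 0 END) AS p1_id_cumulative_count,
       SUM(CASE WHEN col = 'p2' THEN count_pid ELSE 0 END) AS p2_id_cumulative_count
FROM cte_cumsum
GROUP BY i
ORDER BY i
"""

rows = conn.execute(OPTION_2).fetchall()
for row in rows:
    print(row)
```

On the sample data the four rows should match the table the question asks for, ending in (4, 2, 1, 2, 3).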
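You can't use window functions directly, because PARTITION BY cannot produce the window frame you need.
One solution is to move the values into a single column (1 row x 2 columns becomes 2 rows x 1 column), then use a window function to compute the counts, and finally use conditional aggregation or some other trick to turn the 2 rows x 1 column back into 1 row x 2 columns. Like this: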
-- LATERAL derived tables need MySQL 8.0.14 or later
with cte1 as (
    select t.id_, x.pid, x.col
    from example_table as t, lateral (
        select p1_id, 1 union all
        select p2_id, 2
    ) as x(pid, col)
), cte2 as (
    select *, count(*) over (
        partition by pid
        order by id_ rows between unbounded preceding and 1 preceding
    ) as rcount
    from cte1
)
select id_
     , min(case when col = 1 then pid end) as p1_id
     , min(case when col = 2 then pid end) as p2_id
     , min(case when col = 1 then rcount end) as p1_rcount
     , min(case when col = 2 then rcount end) as p2_rcount
from cte2
group by id_
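The intermediate 2 rows x 1 column shape may be easier to follow with the data in front of you. The sketch below prints the long form together with its running count, before the final aggregation. It uses Python's sqlite3 module and a UNION ALL in place of LATERAL, because SQLite has no LATERAL; both are substitutions made for illustration and not part of the query above.

```python
import sqlite3

# Assumption: SQLite (3.25+) as a stand-in engine; UNION ALL replaces LATERAL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

# Each source row becomes two rows: one per player column (col = 1 or 2).
# rcount is the number of earlier long rows that mention the same player.
long_rows = conn.execute(
    """
    WITH long_form AS (
        SELECT id_, p1_id AS pid, 1 AS col FROM example_table
        UNION ALL
        SELECT id_, p2_id AS pid, 2 AS col FROM example_table
    )
    SELECT id_, col, pid,
           COUNT(*) OVER (
               PARTITION BY pid ORDER BY id_
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS rcount
    FROM long_form
    ORDER BY id_, col
    """
).fetchall()

for id_, col, pid, rcount in long_rows:
    print(f"id_={id_} col={col} pid={pid} rcount={rcount}")
```

Because every appearance of a player now lives in the same pid column, a single PARTITION BY pid sees appearances from both original columns, which is exactly what partitioning on p1_id or p2_id alone could not do.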
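Actually, if I've understood the problem correctly, you need neither window functions nor subqueries.
You can get the cumulative counts by:
- applying a LEFT self-join on example_table with a > condition on id_, so that each row is matched with the rows before it
- applying a CASE expression to detect when each player can be found in either of the two columns
- applying the SUM aggregate function
Here it is: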
SELECT t1.id_,
       SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
       SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
FROM example_table t1
LEFT JOIN example_table t2
       ON t1.id_ > t2.id_
GROUP BY t1.id_
ORDER BY t1.id_;
You can find the corresponding SQL Fiddle here.
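A quick way to check the self-join against the expected table is to run it on the sample data. This sketch uses Python's sqlite3 module as a stand-in for MySQL (my assumption, not what the fiddle used) and also counts how many row pairs the join condition produces.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the query text itself is unchanged.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id_ INTEGER PRIMARY KEY, p1_id INTEGER, p2_id INTEGER)"
)
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 2, 1)],
)

# Every row t1 is paired with all earlier rows t2; the CASE flags the pairs
# where t1's player appears in either column of t2, and SUM adds them up.
rows = conn.execute(
    """
    SELECT t1.id_,
           SUM(CASE WHEN t1.p1_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p1_cumsum,
           SUM(CASE WHEN t1.p2_id IN (t2.p1_id, t2.p2_id) THEN 1 ELSE 0 END) AS p2_cumsum
    FROM example_table t1
    LEFT JOIN example_table t2
           ON t1.id_ > t2.id_
    GROUP BY t1.id_
    ORDER BY t1.id_
    """
).fetchall()

# Number of (row, earlier row) pairs the join condition matches.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM example_table t1 JOIN example_table t2 ON t1.id_ > t2.id_"
).fetchone()[0]

for row in rows:
    print(row)
print("matched pairs:", pair_count)
```

The join compares every row with every earlier row, so the number of pairs grows as n(n-1)/2 (6 pairs for these 4 rows); on large tables the window-function versions may scale better. The two approaches also differ if a single row ever has the same player in both columns: the join counts that row once, while the unpivoted versions count it twice.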