Running sum excluding rows with duplicate column value
An example table:
video | encoding | video time spent | encoding bytes | encoding bytes running sum | video time spent running sum (expected) | video time spent running sum (actual) |
---|---|---|---|---|---|---|
A | 1 | 1 | 500 | 500 | 1 | 1 |
A | 2 | 1 | 400 | 900 | 1 | 2 |
B | 3 | 2 | 300 | 1200 | 3 | 5 |
B | 4 | 2 | 200 | 1400 | 3 | 8 |
B | 5 | 2 | 100 | 1500 | 3 | 11 |
B | 6 | 2 | 100 | 1600 | 3 | 14 |
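- The video time spent column holds how long a video was watched; which encoding was watched does not matter.
- The video time spent running sum is what I want to obtain. It should count time spent at the video level only, ignoring encodings.
I want to select as many encoding bytes as possible while keeping the video time spent running sum < X.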
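As a reference for the intended semantics, the expected column can be sketched in plain Python. This is only an illustration under assumptions: the names `Row`, `running_sums`, and `select_within_budget` are hypothetical, rows are walked in `encoding_bytes` descending order, and a video's time is counted only the first time that video appears.

```python
from typing import NamedTuple


class Row(NamedTuple):
    video: str
    encoding: int
    video_time_spent: int
    encoding_bytes: int


# Sample data from the example table.
ROWS = [
    Row("a", 1, 1, 500),
    Row("a", 2, 1, 400),
    Row("b", 3, 2, 300),
    Row("b", 4, 2, 200),
    Row("b", 5, 2, 100),
    Row("b", 6, 2, 100),
]


def running_sums(rows):
    """Return (row, bytes_running_sum, time_running_sum) in encoding_bytes DESC order."""
    seen_videos = set()
    bytes_sum = 0
    time_sum = 0
    result = []
    # sorted() is stable, so rows with equal encoding_bytes keep their input order.
    for row in sorted(rows, key=lambda r: -r.encoding_bytes):
        bytes_sum += row.encoding_bytes
        if row.video not in seen_videos:
            # Count the video-level time once, on the first encoding seen.
            seen_videos.add(row.video)
            time_sum += row.video_time_spent
        result.append((row, bytes_sum, time_sum))
    return result


def select_within_budget(rows, x):
    """Keep the rows whose video-level running time stays below x."""
    return [row for row, _, time_sum in running_sums(rows) if time_sum < x]
```

For example, `select_within_budget(ROWS, 3)` keeps only the two encodings of video `a`, because the running time reaches 3 as soon as the first encoding of video `b` is included.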
My current query:
SELECT *
FROM (
SELECT
...,
SUM(encoding_bytes) OVER(ORDER BY encoding_bytes desc) AS encoding_bytes_running_sum,
SUM(video_time_spent) OVER (ORDER BY encoding_bytes desc) AS video_time_spent_running_sum
...
)
WHERE video_time_spent_running_sum < X
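But video_time_spent_running_sum is not smart enough to skip the other encodings of the same video. What is the best way to do this?
The number of encodings per video is not constant.
Script to create the table: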
SELECT
*,
SUM(encoding_bytes) OVER(
ORDER BY
encoding_bytes DESC
) AS encoding_bytes_running_sum,
SUM(video_time_spent) OVER (
ORDER BY
encoding_bytes DESC ROWS UNBOUNDED PRECEDING
) AS video_time_spent_running_sum
FROM (
VALUES
('a', 1, 1, 500),
('a', 2, 1, 400),
('b', 3, 2, 300),
('b', 4, 2, 200),
('b', 5, 2, 100),
('b', 6, 2, 100)
) AS t (video, encoding, video_time_spent, encoding_bytes)
One approach is the following (I'm sure it can be simplified): use the ROW_NUMBER function so that only the first row of each video is counted.
WITH cte AS (
SELECT
*
, SUM(encoding_bytes) OVER (ORDER BY encoding_bytes DESC) AS encoding_bytes_running_sum
--, SUM(video_time_spent) OVER (ORDER BY encoding_bytes DESC ROWS UNBOUNDED PRECEDING) AS video_time_spent_running_sum
-- rn = 1 marks the first row of each video; only that row contributes its time below
, ROW_NUMBER() OVER (PARTITION BY video ORDER BY [encoding]) AS rn
FROM (
VALUES
('a', 1, 1, 500),
('a', 2, 1, 400),
('b', 3, 2, 300),
('b', 4, 2, 200),
('b', 5, 2, 100),
('b', 6, 2, 100)
) AS t (video, [encoding], video_time_spent, encoding_bytes)
)
SELECT video, [encoding], video_time_spent, encoding_bytes, encoding_bytes_running_sum
-- [encoding] breaks ties within a video so the ROWS frame sees the rn = 1 row first
, SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (ORDER BY video, [encoding] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS video_time_spent_running_sum
FROM cte
ORDER BY video, [encoding];
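The same idea can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions, not the query above verbatim: SQLite (3.25 or later, for window functions) stands in for the asker's database, the T-SQL brackets around `encoding` are dropped, the CTE names `flagged` and `summed` and the function `rows_within_budget` are made up for illustration, and both running sums are ordered by `encoding_bytes DESC` (as in the question) rather than by `video`.

```python
import sqlite3

QUERY = """
WITH t (video, encoding, video_time_spent, encoding_bytes) AS (
    VALUES
        ('a', 1, 1, 500),
        ('a', 2, 1, 400),
        ('b', 3, 2, 300),
        ('b', 4, 2, 200),
        ('b', 5, 2, 100),
        ('b', 6, 2, 100)
),
flagged AS (
    SELECT *,
           -- rn = 1 marks the row of each video that comes first in byte order
           ROW_NUMBER() OVER (
               PARTITION BY video ORDER BY encoding_bytes DESC, encoding
           ) AS rn
    FROM t
),
summed AS (
    SELECT video, encoding, encoding_bytes,
           SUM(encoding_bytes) OVER (
               ORDER BY encoding_bytes DESC, encoding
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS encoding_bytes_running_sum,
           -- only the rn = 1 row of each video adds its time to the sum
           SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (
               ORDER BY encoding_bytes DESC, encoding
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS video_time_spent_running_sum
    FROM flagged
)
SELECT video, encoding, encoding_bytes_running_sum, video_time_spent_running_sum
FROM summed
WHERE video_time_spent_running_sum < ?
ORDER BY encoding_bytes DESC, encoding
"""


def rows_within_budget(x):
    """Run the query on an in-memory database, keeping rows with time sum < x."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY, (x,)).fetchall()
    finally:
        conn.close()
```

Adding `encoding` as a tiebreaker to every `ORDER BY` keeps the `ROWS` frames deterministic when two rows have the same `encoding_bytes`, and ordering `ROW_NUMBER` the same way as the running sum ensures the flagged row is the first one of its video that the running sum reaches.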
This returns:
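video | encoding | video_time_spent | encoding_bytes | encoding_bytes_running_sum | video_time_spent_running_sum |
---|---|---|---|---|---|
a | 1 | 1 | 500 | 500 | 1 |
a | 2 | 1 | 400 | 900 | 1 |
b | 3 | 2 | 300 | 1200 | 3 |
b | 4 | 2 | 200 | 1400 | 3 |
b | 5 | 2 | 100 | 1600 | 3 |
b | 6 | 2 | 100 | 1600 | 3 |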