Running sum excluding rows with duplicate column value

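An example table:

video | encoding | video_time_spent | encoding_bytes | encoding_bytes_running_sum | video_time_spent_running_sum (expected) | video_time_spent_running_sum (actual)
----- | -------- | ---------------- | -------------- | -------------------------- | --------------------------------------- | -------------------------------------
A | 1 | 1 | 500 | 500 | 1 | 1
A | 2 | 1 | 400 | 900 | 1 | 2
B | 3 | 2 | 300 | 1200 | 3 | 5
B | 4 | 2 | 200 | 1400 | 3 | 8
B | 5 | 2 | 100 | 1500 | 3 | 11
B | 6 | 2 | 100 | 1600 | 3 | 14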

I want to select as many encoding bytes as possible while keeping the total video time spent below X.

My current query:

SELECT *
FROM (
   SELECT 
      ...,
      SUM(encoding_bytes) OVER(ORDER BY encoding_bytes desc) AS encoding_bytes_running_sum, 
      SUM(video_time_spent) OVER (ORDER BY encoding_bytes desc) AS video_time_spent_running_sum
      ...
) AS running
WHERE video_time_spent_running_sum < X

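But video_time_spent_running_sum isn't smart enough to skip the other encodings of the same video: each video's time should be counted only once. What is the best way to do this?

The number of encodings per video is not constant.

Script to create the table: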
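To pin down what the "expected" column means, here is a plain-Python reference sketch. It is not part of the original question. It walks the rows in encoding_bytes DESC order, breaking ties by encoding, and adds a video's time only the first time that video appears. It assumes video_time_spent is the same on every row of a given video, as in the example. The names `ROWS` and `dedup_time_running_sum` are illustrative only.

```python
# Example rows: (video, encoding, video_time_spent, encoding_bytes)
ROWS = [
    ("a", 1, 1, 500),
    ("a", 2, 1, 400),
    ("b", 3, 2, 300),
    ("b", 4, 2, 200),
    ("b", 5, 2, 100),
    ("b", 6, 2, 100),
]


def dedup_time_running_sum(rows):
    """Running sum of video_time_spent that counts each video only once.

    Rows are visited in encoding_bytes DESC order (ties broken by encoding),
    the same order the question's window functions use. Assumes
    video_time_spent is identical on every row of a given video.
    """
    seen = set()
    total = 0
    out = []
    for video, encoding, time_spent, _bytes in sorted(rows, key=lambda r: (-r[3], r[1])):
        if video not in seen:  # first encoding of this video: add its time once
            seen.add(video)
            total += time_spent
        out.append((video, encoding, total))
    return out


for video, encoding, running in dedup_time_running_sum(ROWS):
    print(video, encoding, running)
```

On the example rows this yields 1, 1, 3, 3, 3, 3, which matches the expected column.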

SELECT
    *,
    SUM(encoding_bytes) OVER(
        ORDER BY
            encoding_bytes DESC
    ) AS encoding_bytes_running_sum,
    SUM(video_time_spent) OVER (
        ORDER BY
            encoding_bytes DESC ROWS UNBOUNDED PRECEDING
    ) AS video_time_spent_running_sum
FROM (
    VALUES
        ('a', 1, 1, 500),
        ('a', 2, 1, 400),
        ('b', 3, 2, 300),
        ('b', 4, 2, 200),
        ('b', 5, 2, 100),
        ('b', 6, 2, 100)
) AS t (video, encoding, video_time_spent, encoding_bytes)

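One way to do it is shown below (I'm sure it can be simplified). Use the ROW_NUMBER function so that only the first row of each video is counted in the running sum.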

WITH cte AS (
    SELECT
        *
        , SUM(encoding_bytes) OVER (ORDER BY encoding_bytes DESC) AS encoding_bytes_running_sum
        --, SUM(video_time_spent) OVER (ORDER BY encoding_bytes DESC ROWS UNBOUNDED PRECEDING) AS video_time_spent_running_sum
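        -- rn = 1 marks the first encoding of each video (ordering by video inside a video partition is redundant)
        , ROW_NUMBER() OVER (PARTITION BY video ORDER BY [encoding]) AS rn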
    FROM (
        VALUES
            ('a', 1, 1, 500),
            ('a', 2, 1, 400),
            ('b', 3, 2, 300),
            ('b', 4, 2, 200),
            ('b', 5, 2, 100),
            ('b', 6, 2, 100)
    ) AS t (video, [encoding], video_time_spent, encoding_bytes)
)
SELECT video, [encoding], video_time_spent, encoding_bytes, encoding_bytes_running_sum
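    -- only each video's first row adds its time; [encoding] breaks ties within a video so the order is deterministic
    , SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (ORDER BY video, [encoding] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS video_time_spent_running_sum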
FROM cte;

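This returns:

video | encoding | video_time_spent | encoding_bytes | encoding_bytes_running_sum | video_time_spent_running_sum
----- | -------- | ---------------- | -------------- | -------------------------- | ----------------------------
a | 1 | 1 | 500 | 500 |
a | 2 | 1 | 400 | 900 |
b | 3 | 2 | 300 | 1200 |
b | 4 | 2 | 200 | 1400 |
b | 5 | 2 | 100 | 1600 |
b | 6 | 2 | 100 | 1600 |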
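The query above orders the time running sum by video, whereas the question orders everything by encoding_bytes DESC. It also does not yet apply the `< X` filter. The sketch below is one possible variant and is not the answerer's code. It numbers rows per video in encoding_bytes DESC order and uses that same order, with encoding as a tie-breaker, for both running sums. It then filters on the budget, passed in as a bound parameter.

The SQL is run through Python's `sqlite3` module only so that it can be executed end to end, which assumes SQLite 3.25 or later for window functions. The VALUES list moves into a CTE because SQLite does not accept the `AS t (...)` column-alias form. The window expressions themselves should carry over to SQL Server. The function name `rows_within_budget` is hypothetical.

```python
import sqlite3

# Hypothetical corrected variant; SQLite stands in for SQL Server here.
QUERY = """
WITH t(video, encoding, video_time_spent, encoding_bytes) AS (
    VALUES ('a', 1, 1, 500), ('a', 2, 1, 400),
           ('b', 3, 2, 300), ('b', 4, 2, 200),
           ('b', 5, 2, 100), ('b', 6, 2, 100)
),
numbered AS (
    SELECT *,
           -- rn = 1 marks each video's first row in running-sum order
           ROW_NUMBER() OVER (PARTITION BY video
                              ORDER BY encoding_bytes DESC, encoding) AS rn
    FROM t
),
sums AS (
    SELECT video, encoding, encoding_bytes,
           SUM(encoding_bytes) OVER (
               ORDER BY encoding_bytes DESC, encoding
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS encoding_bytes_running_sum,
           -- only a video's first row adds its time to the running total
           SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (
               ORDER BY encoding_bytes DESC, encoding
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS video_time_spent_running_sum
    FROM numbered
)
SELECT video, encoding, encoding_bytes_running_sum, video_time_spent_running_sum
FROM sums
WHERE video_time_spent_running_sum < ?   -- the budget X from the question
ORDER BY encoding_bytes DESC, encoding
"""


def rows_within_budget(x):
    """Return (video, encoding, bytes_sum, time_sum) rows whose de-duplicated
    running time stays below x."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY, (x,)).fetchall()
    finally:
        conn.close()


for row in rows_within_budget(3):
    print(row)
```

With a budget of 3, only video a fits (running time 1), so only its two encodings come back. Video b's first row raises the de-duplicated total to 3, which is not below 3. With a large budget, the time column comes out as 1, 1, 3, 3, 3, 3, matching the expected column in the question.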