hive top K sum() records per group by key

For a table TBL with columns A, B, C, I want to group by A, B and, for each A, keep only the top K values of B ranked by sum(C).

Without the limit, this is:

select A, B, sum(C) from TBL group by A, B

With the values

A | B | C
--+---+----
a | 1 | 10
a | 2 | 20
a | 1 | 5
a | 3 | 12
b | 3 | 100
b | 2 | 90
b | 1 | 120
c | 5 | 10

and a limit of 2, the result would be

A | B | sum(C)
--+---+-------
a | 1 | 15
a | 2 | 20
b | 1 | 120
b | 3 | 100
c | 5 | 10

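You can do this with windowing functions: aggregate per (A, B) first, then number the aggregated rows within each A by descending sum and keep the first K.

Query: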

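SELECT a, b, c
FROM (
  SELECT *
    -- number the aggregated rows within each a, largest sum first
    , ROW_NUMBER() OVER (PARTITION BY a ORDER BY c DESC) AS rn
  FROM (
    -- one row per (A, B) with the summed C
    SELECT A   AS a
      , B      AS b
      , SUM(C) AS c
    FROM TBL
    GROUP BY A, B ) x ) y
WHERE rn <= 2  -- K = 2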

Output:

a       b       c
a       2       20
a       1       15
b       1       120
b       3       100
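c       5       10

Alternatively, the window function can be applied in the same query as the GROUP BY, ordering by the aggregate directly:

select      A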
           ,B
           ,sum_C

from       (select      A
                       ,B
                       ,sum(C) as sum_C

                       ,row_number () over
                        (
                            partition by    A
                            order by        sum(C) desc
                        ) as rn

            from        TBL 

            group by    A
                       ,B
            ) t

where       rn <= 2

+---+---+-------+
| a | b | sum_c |
+---+---+-------+
| a | 2 |    20 |
| a | 1 |    15 |
| b | 1 |   120 |
| b | 3 |   100 |
| c | 5 |    10 |
+---+---+-------+
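
The nested query above can be tried on the question's sample data without a Hive cluster. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. It assumes a SQLite build of 3.25 or newer (needed for window functions), and SQLite is not Hive, so treat it as a check of the query logic only. The function name top_k_per_group and the final ORDER BY (added to make the row order deterministic) are illustrative additions and not part of the original answers.

```python
import sqlite3

# Sample rows from the question: (A, B, C).
ROWS = [
    ("a", 1, 10), ("a", 2, 20), ("a", 1, 5), ("a", 3, 12),
    ("b", 3, 100), ("b", 2, 90), ("b", 1, 120),
    ("c", 5, 10),
]

# Same shape as the nested query: aggregate per (A, B), number the
# aggregated rows within each A by descending sum, keep the first k.
# The trailing ORDER BY is an addition so the result order is stable.
QUERY = """
SELECT a, b, sum_c
FROM (
  SELECT a, b, sum_c,
         ROW_NUMBER() OVER (PARTITION BY a ORDER BY sum_c DESC) AS rn
  FROM (
    SELECT A AS a, B AS b, SUM(C) AS sum_c
    FROM TBL
    GROUP BY A, B
  ) x
) y
WHERE rn <= ?
ORDER BY a, sum_c DESC
"""


def top_k_per_group(rows, k):
    """Load rows into an in-memory SQLite table and return the top k per A."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TBL (A TEXT, B INTEGER, C INTEGER)")
        conn.executemany("INSERT INTO TBL VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, (k,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_k_per_group(ROWS, 2):
        print(row)
```

With k = 2 this returns the same five rows as the outputs shown above. Note that ROW_NUMBER() breaks ties arbitrarily; if two B values in the same A have equal sums and both should be kept, RANK() or DENSE_RANK() may be a better fit.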