Loop through Hive or SQL rows by assigning values in descending order

Question

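For a Hive or SQL table with 5 rows, how can I split a value of 12 and distribute it across the rows in descending order? For example, in the table below the total column sums to 12 and its values are assigned in descending order.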

column_1    column_2    total
   a           b          3
   c           d          3
   e           f          2
   g           h          2
   i           j          2

Answer 1

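You can parameterize this query using hivevar variables. I tested it with a few different values of total (12, 11, and 16) and it seems to work correctly. Please optimize and debug it yourself; I am only providing the idea: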

with vars as(--calculate min_value, max_value, and how many rows should receive max_value (max_rows)
select
       ceil(total/num_rows)                 as max_value, 
       floor(total/num_rows)                as min_value,
       total-floor(total/num_rows)*num_rows as max_rows
from
(select 5 num_rows, 12 total)s --your variables, parametrize using hivevar variables
),

your_table as (--use your table instead of this
select stack(5,
   'a', 'b',
   'c', 'd',
   'e', 'f',
   'g', 'h',
   'i', 'j'
) as (column_1,column_2)
)-- this is your_table, suppose column_1 determines the order of rows

select column_1,column_2, case when rn<=max_rows then max_value else min_value end as total
       --, rn, min_value, max_value, max_rows --debug values
from
(
select t.*, row_number() over(order by column_1) rn,  
       v.min_value, 
       v.max_value, 
       v.max_rows
 from your_table t
      cross join vars v
)s;
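
The hivevar parameterization mentioned above could look roughly like the sketch below. It assumes the script is started with the two variables on the command line; the file name split_total.hql is a placeholder, and the variable names total and num_rows are simply taken from the literals in the query. Only the vars CTE changes, and the rest of the query stays as it is:

```sql
-- Sketch: the vars CTE driven by hivevar variables instead of literals.
-- Assumes the variables are supplied when the script is launched, e.g.
--   hive --hivevar total=12 --hivevar num_rows=5 -f split_total.hql
-- (split_total.hql is a hypothetical file name).
with vars as (
select
       ceil(total/num_rows)                 as max_value,
       floor(total/num_rows)                as min_value,
       total-floor(total/num_rows)*num_rows as max_rows
from
(select ${hivevar:num_rows} num_rows, ${hivevar:total} total)s -- substituted as text before the query runs
)
select * from vars; -- inspect the computed values before plugging the CTE into the full query
```

Hive replaces ${hivevar:...} textually, so the values must be valid numeric literals. The query also relies on Hive's `/` operator returning a double; in an engine where integer division truncates, the operands would need a cast to a floating-point type first.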

Result:

column_1    column_2    total   
a   b   3   
c   d   3   
e   f   2   
g   h   2   
i   j   2

total=11 returns:

column_1    column_2    total   
a   b   3   
c   d   2   
e   f   2   
g   h   2   
i   j   2

total=16 returns:

column_1    column_2    total   
a   b   4   
c   d   3   
e   f   3   
g   h   3   
i   j   3
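
The three results above follow from the same arithmetic used in the vars CTE. As a quick check, the sketch below evaluates those expressions for the three tested totals with the row count fixed at 5; the inline stack of totals is only there for illustration:

```sql
-- Sketch: the vars arithmetic for the three tested totals, with 5 rows.
select total,
       ceil(total/5)          as max_value, -- value given to the first max_rows rows
       floor(total/5)         as min_value, -- value given to the remaining rows
       total-floor(total/5)*5 as max_rows   -- remainder of total divided by 5
from
(select stack(3, 12, 11, 16) as (total))t;
-- total=12: max_value=3, min_value=2, max_rows=2  (3+3+2+2+2 = 12)
-- total=11: max_value=3, min_value=2, max_rows=1  (3+2+2+2+2 = 11)
-- total=16: max_value=4, min_value=3, max_rows=1  (4+3+3+3+3 = 16)
```

When total is an exact multiple of the row count, max_rows is 0 and max_value equals min_value, so every row receives the same value.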

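Of course, it may still contain bugs and needs careful testing before being used as a nuclear reactor control component. It was not tested with a different number of rows in the initial table. But it definitely works for the initial conditions in your question.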

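It can also be optimized by computing the number of rows inside the table query with count(*) over() as num_rows, so that only one parameter needs to be supplied: total (12 in your example). The logic that computes max_value, min_value, and max_rows can then be moved into the same query over your_table, so you can do the same thing without the cross join and without the vars subquery.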
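
A minimal sketch of that single-query variant is shown below. It has not been run against a real cluster. It assumes the script is launched with --hivevar total=12 and that column_1 determines the row order, as in the original query; the column name total_to_split is introduced here only to avoid clashing with the output column total:

```sql
-- Sketch: same distribution without the vars CTE or the cross join.
-- Assumes: hive --hivevar total=12 -f <your script>
with your_table as (--use your table instead of this
select stack(5,
   'a', 'b',
   'c', 'd',
   'e', 'f',
   'g', 'h',
   'i', 'j'
) as (column_1,column_2)
)

select column_1, column_2,
       case when rn <= total_to_split-floor(total_to_split/num_rows)*num_rows -- rn<=max_rows
            then ceil(total_to_split/num_rows)                                -- max_value
            else floor(total_to_split/num_rows)                               -- min_value
       end as total
from
(
select t.*,
       row_number() over(order by column_1) as rn,
       count(*) over()                      as num_rows,      -- row count taken from the table itself
       ${hivevar:total}                     as total_to_split -- the only remaining parameter
  from your_table t
)s;
```

Because num_rows now comes from the table, the query no longer has to be edited when the number of rows changes.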

