BQ - getting a field from an array of structs without join

I have a table with the following columns:

items ARRAY<STRUCT<label STRING, counter INTEGER>>
explore BOOLEAN

For each record, I'd like to select the label with the highest counter, and then count explore for each unique label. Ideally, I'd like to run something like:

SELECT FIRST_VALUE(items.label) OVER (ORDER BY items.counter DESC) as label,
       COUNT(explore) as explore
FROM my_table
GROUP BY 1

If this is the data in my table:

explore       items
   1      [['A',1],['B',3]]
   1      [['B',1]]
   0      [['C',2],['D',1]]

Then I'd like to get:

label  explore
 'B'      2
 'C'      1

Using your sample data, consider the approach below.

with data as (
    select 1 as explore, [STRUCT( 'A' as label, 1 as counter), STRUCT( 'B' as label, 3 as counter) ] as items,
    union all select 1 as explore, [STRUCT( 'B' as label, 1 as counter)] as items,
    union all select 0 as explore, [STRUCT( 'C' as label, 2 as counter), STRUCT( 'D' as label, 1 as counter) ] as items
),

add_row_num as (
SELECT 
        explore,
        items,
        row_number() over (order by explore desc) as row_number
FROM data
),

get_highest_label as (
select 
    explore,
    row_number,
    label,
    counter,
    first_value(label) over (partition by row_number order by counter desc) as highest_label_per_row 
from add_row_num, unnest(items)
),

--  (REMOVE DUPLICATE)
remove_dups as (
  SELECT
      *,
      ROW_NUMBER()
          OVER (PARTITION BY row_number) as new_row_number
  FROM get_highest_label
)

select 
    highest_label_per_row,
    count(highest_label_per_row) as explore,
from remove_dups 

where new_row_number = 1
group by highest_label_per_row


Output (row order is not guaranteed without an ORDER BY):

highest_label_per_row  explore
 B                        2
 C                        1

Consider the approach below:

select ( select label from t.items
    order by counter desc limit 1
  ) label, 
  count(*) explore
from your_table t
group by label           

If applied to the sample data in your question:

with your_table as (
    select 1 explore, [struct('A' as label, 1 as counter), struct('B' as label, 3 as counter) ] items union all 
    select 1, [struct('B', 1)] union all 
    select 0, [struct('C', 2), struct('D', 1) ] 
)
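For convenience, here is a sketch of the CTE and the query joined into one statement that can be pasted into the BigQuery console as-is. Nothing new is introduced; `your_table` is just the inline sample data from the question.

```sql
-- Inline sample data from the question
with your_table as (
    select 1 explore, [struct('A' as label, 1 as counter), struct('B' as label, 3 as counter)] items union all
    select 1, [struct('B', 1)] union all
    select 0, [struct('C', 2), struct('D', 1)]
)
select (
    -- correlated subquery over the row's own array: keep the label with the highest counter
    select label from t.items
    order by counter desc limit 1
  ) label,
  -- number of rows that share this top label
  count(*) explore
from your_table t
group by label
```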

the output is (row order is not guaranteed without an ORDER BY):

label  explore
 B        2
 C        1
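Two edge cases are worth keeping in mind with the subquery approach: when two structs in the same array tie on `counter`, `order by counter desc limit 1` may pick either one, and when `items` is empty or NULL the subquery returns NULL, so those rows end up in a NULL group. The sketch below is one possible way to handle both, assuming you want ties broken alphabetically by label and empty rows dropped. The sample data here is hypothetical and was changed from the question to trigger both cases.

```sql
-- Hypothetical data: row 1 has a tie on counter, row 3 has an empty array
with your_table as (
    select 1 explore, [struct('A' as label, 3 as counter), struct('B' as label, 3 as counter)] items union all
    select 1, [struct('B', 1)] union all
    select 0, array<struct<label string, counter int64>>[]
)
select (
    select label from t.items
    -- secondary sort key makes the pick deterministic when counters tie
    order by counter desc, label limit 1
  ) label,
  count(*) explore
from your_table t
-- array_length is 0 for an empty array and NULL for a NULL array,
-- so this filter drops both kinds of rows
where array_length(t.items) > 0
group by label
```

With this hypothetical data the query should return A with 1 and B with 1. If you would rather keep the empty rows, drop the `where` clause and they will be counted under a NULL label.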