How to Combine Arrays in Snowsql GroupBy and Only Keep Distinct Values?

I'm trying to run a query that lists all the distinct markets each user has submitted for my dataset.

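The values in the markets column are already arrays. When I run the query below, I get an array of arrays, and some markets can appear more than once. This happens because DISTINCT compares whole arrays, not the values inside them.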

For example, if I group ['New York'] and ['New York', 'Chicago'], I want ['New York', 'Chicago'] as the result, but I currently get [['New York'], ['New York', 'Chicago']]. Any help is appreciated.

SELECT 
  s.submitter_id,
  ARRAY_AGG(DISTINCT s.markets)
FROM 
  analytics.submissions AS s
GROUP BY 1
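To see why this happens, here is a rough in-memory JavaScript model of `ARRAY_AGG(DISTINCT s.markets) ... GROUP BY 1`. It is only an illustration under assumptions, not Snowflake itself. The rows and the function name `aggDistinctArrays` are hypothetical. The model compares each array as a whole, keyed by its JSON text, so two different arrays both survive the DISTINCT step.

```javascript
// Hypothetical in-memory rows standing in for analytics.submissions.
const rows = [
  { submitter_id: 1, markets: ["New York"] },
  { submitter_id: 1, markets: ["New York", "Chicago"] },
];

// Models ARRAY_AGG(DISTINCT markets) grouped by submitter_id:
// DISTINCT compares whole arrays, so only identical arrays collapse.
function aggDistinctArrays(rows) {
  const groups = new Map();
  for (const { submitter_id, markets } of rows) {
    if (!groups.has(submitter_id)) groups.set(submitter_id, new Map());
    // Key each array by its JSON text so equal arrays map to one entry.
    groups.get(submitter_id).set(JSON.stringify(markets), markets);
  }
  const result = new Map();
  for (const [id, seen] of groups) result.set(id, [...seen.values()]);
  return result;
}

console.log(JSON.stringify(aggDistinctArrays(rows).get(1)));
// [["New York"],["New York","Chicago"]]
```

The values inside the arrays never get compared, which is why 'New York' appears twice in the result.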

A simple approach is to flatten the arrays first:

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT 
  a.submitter_id,
  ARRAY_AGG(DISTINCT a.market)
FROM (
    SELECT s.submitter_id
        ,f.value AS market
    FROM data AS s,
    LATERAL FLATTEN(input => s.markets) f
) AS a
GROUP BY 1;
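The two steps of this query can be modelled in JavaScript as a sketch. `flatten` stands in for `LATERAL FLATTEN`, producing one row per array element. `groupDistinct` stands in for `GROUP BY` with `ARRAY_AGG(DISTINCT ...)`. The function names and rows are hypothetical, and the rows are similar to the `data` CTE after `SPLIT`. One caveat: the JavaScript `Set` keeps first-seen order, but Snowflake's `ARRAY_AGG` does not guarantee element order unless you add `WITHIN GROUP (ORDER BY ...)`.

```javascript
// Hypothetical rows similar to the `data` CTE above (already split on ';').
const data = [
  { submitter_id: 1, markets: ["new york"] },
  { submitter_id: 1, markets: ["new york", "chicago"] },
];

// Models LATERAL FLATTEN: emit one { submitter_id, market } row per element.
function flatten(rows) {
  return rows.flatMap(({ submitter_id, markets }) =>
    markets.map((market) => ({ submitter_id, market }))
  );
}

// Models GROUP BY submitter_id with ARRAY_AGG(DISTINCT market).
function groupDistinct(flatRows) {
  const groups = new Map();
  for (const { submitter_id, market } of flatRows) {
    if (!groups.has(submitter_id)) groups.set(submitter_id, new Set());
    groups.get(submitter_id).add(market); // a Set stores each scalar once
  }
  return new Map([...groups].map(([id, set]) => [id, [...set]]));
}

const result = groupDistinct(flatten(data));
console.log(JSON.stringify(result.get(1))); // ["new york","chicago"]
```

Once the arrays are flattened, DISTINCT operates on individual scalar values rather than on whole arrays.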

A variant using a JavaScript UDF:

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT 
  submitter_id,
  array_flat_distinct(ARRAY_AGG(distinct markets))
from data
group by 1;

where the UDF is defined as:

create or replace function array_flat_distinct("a" array)
returns array
language javascript
as
$$
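    // Concatenate the inner arrays (the initial value [] avoids a TypeError when
    // the input array is empty), then use a Set to keep each value only once.
    return [...new Set(a.reduce((b,c)=>[...b,...c], []))]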
$$
;
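The UDF body is plain JavaScript, so you can check its logic outside Snowflake. The sketch below wraps the same expression in a local function. The name `arrayFlatDistinct` is a hypothetical local helper, not part of Snowflake. The input is shaped like what `ARRAY_AGG(DISTINCT markets)` returns for one submitter. This only exercises the JavaScript. It does not test how Snowflake converts ARRAY/VARIANT values at the UDF boundary.

```javascript
// Same expression as the UDF body, wrapped in a plain function for local testing.
function arrayFlatDistinct(a) {
  // reduce concatenates the inner arrays, starting from [] so an empty input is safe;
  // the Set then drops repeated values while keeping first-seen order.
  return [...new Set(a.reduce((b, c) => [...b, ...c], []))];
}

// Input shaped like ARRAY_AGG(DISTINCT markets) for submitter 1.
console.log(JSON.stringify(arrayFlatDistinct([["new york"], ["new york", "chicago"]])));
// ["new york","chicago"]
```

Without the `[]` initial value, `reduce` throws a TypeError on an empty array. That can happen when every `markets` value in a group is NULL, because `ARRAY_AGG` then returns an empty array.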

This can be done easily with the ARRAY_UNION_AGG aggregate function, which does exactly what the question asks for:

Returns an ARRAY that contains the union of the distinct values from the input ARRAYs in a column.

Sample data:

CREATE TABLE submissions AS
          SELECT 1 AS submitter_id, ['New York' , 'Chicago'] AS markets
UNION ALL SELECT 1 AS submitter_id, ['New York']             AS markets;

Query:

SELECT submitter_id, ARRAY_UNION_AGG(markets) AS markets
FROM submissions
GROUP BY submitter_id;

Output: