How to Combine Arrays in Snowsql GroupBy and Only Keep Distinct Values?
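I'm trying to run a query that lists all the distinct markets a user has submitted for my dataset.
The values in the "markets" column are already arrays. When I run the query below, I get an array of arrays, and some markets can be listed more than once, because the DISTINCT clause compares whole arrays rather than the values inside them.
For example, if I group ['New York'] and ['New York', 'Chicago'], my goal is to get ['New York', 'Chicago'] as the result, but I currently get [['New York'],['New York', 'Chicago']]. Any help is appreciated.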
SELECT
s.submitter_id,
ARRAY_AGG(DISTINCT s.markets)
FROM
analytics.submissions AS s
GROUP BY 1
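The difference between de-duplicating whole arrays and de-duplicating their elements can be modeled in plain JavaScript. This is only an illustration of the behavior described in the question, not Snowflake code; the rows sample and both helper names are made up for the example.

```javascript
// Plain JavaScript model of the problem; this is not Snowflake code.
// `rows` is hypothetical sample data shaped like the table in the question.
const rows = [
  { submitter_id: 1, markets: ['New York'] },
  { submitter_id: 1, markets: ['New York', 'Chicago'] },
  { submitter_id: 1, markets: ['New York'] },
];

// Distinct over whole arrays: each array is compared as a single value,
// which is the level at which ARRAY_AGG(DISTINCT markets) de-duplicates.
function distinctArrays(rows) {
  const seen = new Map();
  for (const r of rows) {
    seen.set(JSON.stringify(r.markets), r.markets);
  }
  return [...seen.values()];
}

// Distinct over elements: look inside every array first.
function distinctElements(rows) {
  return [...new Set(rows.flatMap((r) => r.markets))];
}

const arrayLevel = distinctArrays(rows);
const elementLevel = distinctElements(rows);

console.log(JSON.stringify(arrayLevel));   // [["New York"],["New York","Chicago"]]
console.log(JSON.stringify(elementLevel)); // ["New York","Chicago"]
```

Only the exact duplicate array is dropped in the first case, so 'New York' still appears twice across the nested arrays.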
A simple approach is to flatten the arrays first:
WITH data AS (
SELECT submitter_id, split(markets,';') AS markets
FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT
a.submitter_id,
ARRAY_AGG(DISTINCT a.market)
FROM (
SELECT s.submitter_id
,f.value AS market
FROM data AS s,
LATERAL FLATTEN(input => s.markets) f
) AS a
GROUP BY 1;
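The two steps in that query (flatten to one row per market, then group and de-duplicate) can be sketched outside the database as follows. This is a minimal JavaScript model under stated assumptions: the sample data and the function name distinctMarketsBySubmitter are hypothetical, and a second submitter is added only to show the grouping.

```javascript
// Hypothetical sample data: semicolon-separated markets, as in the answer's CTE.
const data = [
  { submitter_id: 1, markets: 'new york' },
  { submitter_id: 1, markets: 'new york;chicago' },
  { submitter_id: 2, markets: 'boston' },
];

function distinctMarketsBySubmitter(data) {
  // Step 1 (models split + LATERAL FLATTEN): one (submitter_id, market) pair per element.
  const flat = data.flatMap((row) =>
    row.markets.split(';').map((market) => ({ submitter_id: row.submitter_id, market }))
  );

  // Step 2 (models GROUP BY + ARRAY_AGG(DISTINCT ...)): a Set per submitter drops repeats.
  const groups = new Map();
  for (const { submitter_id, market } of flat) {
    if (!groups.has(submitter_id)) groups.set(submitter_id, new Set());
    groups.get(submitter_id).add(market);
  }

  return [...groups].map(([submitter_id, markets]) => ({
    submitter_id,
    markets: [...markets],
  }));
}

const result = distinctMarketsBySubmitter(data);
console.log(JSON.stringify(result));
// [{"submitter_id":1,"markets":["new york","chicago"]},{"submitter_id":2,"markets":["boston"]}]
```

The JavaScript Set keeps insertion order; the SQL query makes no such promise about the order of elements in the aggregated array.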
A variant using a JavaScript UDF:
WITH data AS (
SELECT submitter_id, split(markets,';') AS markets
FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT
submitter_id,
array_flat_distinct(ARRAY_AGG(distinct markets))
from data
group by 1;
where the UDF is defined as:
create or replace function array_flat_distinct("a" array)
returns array
language javascript
as
$$
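// Flatten the array of arrays, then de-duplicate with a Set.
// The [] initial value keeps reduce from throwing on an empty input array.
return [...new Set(a.reduce((b,c)=>[...b,...c], []))]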
$$
;
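The body of that UDF can be tried locally before creating it. The sketch below wraps the same flatten-then-Set logic in a plain function; the name arrayFlatDistinct is hypothetical and exists only for this local check. Passing [] as the initial value to reduce keeps an empty input from throwing a TypeError.

```javascript
// Local stand-in for the UDF body: flatten one level, then de-duplicate.
function arrayFlatDistinct(a) {
  // reduce concatenates the inner arrays; Set removes repeated values.
  return [...new Set(a.reduce((b, c) => [...b, ...c], []))];
}

// Input shaped like ARRAY_AGG(DISTINCT markets): an array of arrays.
console.log(JSON.stringify(arrayFlatDistinct([['new york'], ['new york', 'chicago']])));
// ["new york","chicago"]

// An empty input yields an empty array instead of an error.
console.log(JSON.stringify(arrayFlatDistinct([])));
// []
```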
This can be done easily with the ARRAY_UNION_AGG aggregate function, which does exactly what the question asks for:
Returns an ARRAY that contains the union of the distinct values from the input ARRAYs in a column.
Sample data:
CREATE TABLE submissions AS
SELECT 1 AS submitter_id, ['New York' , 'Chicago'] AS markets
UNION ALL SELECT 1 AS submitter_id, ['New York'] AS markets;
Query:
SELECT submitter_id, ARRAY_UNION_AGG(markets) AS markets
FROM submissions
GROUP BY submitter_id;
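Output: a single row for submitter_id 1 whose markets array contains 'New York' and 'Chicago' once each.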