Given two arrays in Snowflake, find summation of minimum elements
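Given two arrays such as a = [10, 20, 30] and b = [9, 21, 32], how can I construct an array of the element-wise minimum or maximum, matched by index, in Snowflake? That is, the desired minimum output is [9, 20, 30] and the desired maximum output is [10, 21, 32].
I looked through Snowflake's array functions but did not find one that does this.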
Use a numbers table with [] to access the elements, and ARRAY_AGG to build the new array:
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
), numbers AS (
SELECT ROW_NUMBER() OVER(ORDER BY seq4())-1 AS IND
FROM TABLE(GENERATOR(ROWCOUNT => 10001))
)
SELECT a,b
,ARRAY_AGG(LEAST(a[ind], b[ind])) WITHIN GROUP(ORDER BY n.ind) AS min_array
,ARRAY_AGG(GREATEST(a[ind], b[ind])) WITHIN GROUP(ORDER BY n.ind) AS max_array
FROM cte
JOIN numbers n
ON n.ind < GREATEST(ARRAY_SIZE(a), ARRAY_SIZE(b))
GROUP BY a,b;
Output: min_array is [ 9, 20, 30 ] and max_array is [ 10, 21, 32 ].
If your arrays are always the same size (and reusing Lukasz's sample data CTE):
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
)
SELECT a,b
,ARRAY_AGG(LEAST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(a)) n
GROUP BY 1,2;
This gives:
A | B | MIN_ARRAY | MAX_ARRAY |
---|---|---|---|
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ] |
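Since the title asks for a summation of the minimum elements, here is a minimal sketch of that variant, assuming equal-size arrays of numeric values as in the query above. The `::NUMBER` casts and the `min_sum` alias are my additions, not part of the answers: array elements come back as VARIANT, so the casts make the numeric intent explicit before summing.

```sql
-- Sketch: sum of the element-wise minimums (assumes numeric elements).
WITH cte AS (
    SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
)
SELECT a, b
    -- cast each VARIANT element to NUMBER, take the smaller one, then add them up
    ,SUM(LEAST(a[n.index]::NUMBER, b[n.index]::NUMBER)) AS min_sum
FROM cte
    ,TABLE(FLATTEN(a)) n
GROUP BY 1, 2;
```

For the sample row this should come to 9 + 20 + 30 = 59.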
If your lists are of uneven length:
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
union all
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32, 45) AS b
)
SELECT a,b
,ARRAY_AGG(LEAST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(iff(array_size(a)>=array_size(b), a, b))) n
GROUP BY 1,2;
A | B | MIN_ARRAY | MAX_ARRAY |
---|---|---|---|
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ] |
[ 10, 20, 30 ] | [ 9, 21, 32, 45 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ] |
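This picks the longer array to flatten. However, indexing past the end of the shorter array yields NULL, LEAST/GREATEST return NULL when any argument is NULL, and ARRAY_AGG drops NULLs, so you don't even need the size comparison unless you want to NVL/COALESCE the missing values to safe defaults. The NULL behaviour of LEAST is easy to confirm: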
SELECT 1 as a, null as b, least(a,b);
This gives:
A | B | LEAST(A,B) |
---|---|---|
1 | null | null |
With NVL supplying safe defaults, it looks like this:
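WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
union all
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32, 45) AS b
)
-- 10000 and 0 are sentinel defaults; they assume every value lies between 0 and 10000
SELECT a,b
,ARRAY_AGG(LEAST(nvl(a[n.index],10000), nvl(b[n.index],10000))) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(nvl(a[n.index],0), nvl(b[n.index],0))) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(iff(array_size(a)>=array_size(b), a, b))) n
GROUP BY 1,2;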
A | B | MIN_ARRAY | MAX_ARRAY |
---|---|---|---|
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ] |
[ 10, 20, 30 ] | [ 9, 21, 32, 45 ] | [ 9, 20, 30, 45 ] | [ 10, 21, 32, 45 ] |
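To illustrate the earlier point that the size comparison is optional when missing positions are simply dropped, here is a sketch that flattens `a` unconditionally, even though `b` is longer in the second row. It reuses the uneven-length sample data; the only change is that the IFF is removed. The assumption is that positions present in only one array should be omitted from the result rather than filled in.

```sql
WITH cte AS (
    SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
    UNION ALL
    SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32, 45) AS b
)
SELECT a, b
    -- positions beyond the end of a are never visited; they would have
    -- produced NULL and been dropped by ARRAY_AGG anyway
    ,ARRAY_AGG(LEAST(a[n.index], b[n.index])) WITHIN GROUP (ORDER BY n.index) AS min_array
    ,ARRAY_AGG(GREATEST(a[n.index], b[n.index])) WITHIN GROUP (ORDER BY n.index) AS max_array
FROM cte
    ,TABLE(FLATTEN(a)) n
GROUP BY 1, 2;
```

Both rows should come back with [ 9, 20, 30 ] and [ 10, 21, 32 ], matching the IFF version without NVL. One caveat, as far as I know: if the flattened array is empty, FLATTEN produces no rows for it and that input row disappears from the result, so the IFF form may still be preferable when empty arrays can occur.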