Snowflake - return different (not similar) values between two arrays

Question

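After looking through the Snowflake documentation, I found a function called array_intersection(array_1, array_2) that returns the values common to two arrays, but I need the opposite: the values that are not present in both arrays.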

Example 1:

Suppose my table contains the following two arrays:

array_1 = ['a', 'b', 'c', 'd', 'e']
array_2 = ['a', 'f', 'c', 'g', 'e']

My query:

select
  array_intersection(array_1, array_2)
from myTable

Current output:

['a', 'c', 'e']

But I want the output to be:

['f', 'g']

Example 2:

Suppose my table contains the following two arrays:

array_1 = ['u', 'v', 'w', 'x', 'y']
array_2 = ['u', 'v', 'i', 'x', 'k']

My query:

select
  array_intersection(array_1, array_2)
from myTable

Current output:

['u', 'v', 'x']

But I want the output to be:

['w', 'y', 'i', 'k']

How can I do this in Snowflake? Any suggestions?

Answer 1

with myTable as (
select array_construct('a', 'b', 'c', 'd', 'e') as a1
    ,array_construct('a', 'f', 'c', 'g', 'e') as a2
)
select a1, a2, array_intersection(a1, a2)
from myTable;

This shows that we are working with the same data.

with myTable as (
    SELECT array_construct('a', 'b', 'c', 'd', 'e') as a1
    ,array_construct('a', 'f', 'c', 'g', 'e') as a2
), seq_myTable as (
    SELECT seq8() as seq
    ,t.*
  from myTable t
), expanded_a1 as (
   select a.seq
    ,f.value as val
  from seq_myTable a, 
    lateral flatten(input => a.a1) f
), expanded_a2 as (
   select a.seq
    ,f.value as val
  from seq_myTable a, 
    lateral flatten(input => a.a2) f
)
select coalesce(a.seq,b.seq) as seq, array_agg(coalesce(a.val,b.val)) as vals
from expanded_a1 a
full outer join expanded_a2 b 
    on a.seq = b.seq and a.val = b.val
where (a.seq is null OR b.seq is null)
group by 1;

This gives the answer, but the values are not sorted. To sort them you need:

with myTable as (
    SELECT array_construct('a', 'b', 'c', 'd', 'e') as a1
    ,array_construct('a', 'f', 'c', 'g', 'e') as a2
), seq_myTable as (
    SELECT seq8() as seq
    ,t.*
  from myTable t
), expanded_a1 as (
   select a.seq
    ,f.value as val
  from seq_myTable a, 
    lateral flatten(input => a.a1) f
), expanded_a2 as (
   select a.seq
    ,f.value as val
  from seq_myTable a, 
    lateral flatten(input => a.a2) f
)
select array_agg(val) WITHIN GROUP ( order by val) as vals 
from (
  select coalesce(a.seq,b.seq) as seq, coalesce(a.val,b.val) as val
  from expanded_a1 a
  full outer join expanded_a2 b 
      on a.seq = b.seq and a.val = b.val
  where (a.seq is null OR b.seq is null)
)
group by seq;

This gives the output [ "b", "d", "f", "g" ].

Answer 2

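The set operation in this question is what mathematicians call the disjunctive union (also known as the symmetric difference).
In this case the SQL hammer may not be the best tool for the ARRAY screw.

It is one thing to get an isolated query working on a bare-bones case; it is another to build a maintainable real-life query that also carries other complexities.

JavaScript handles set computations well and is a good fit for this task: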

CREATE OR REPLACE FUNCTION ARRAY_DISJUNCTIVE_UNION(A1 ARRAY, A2 ARRAY)
RETURNS ARRAY LANGUAGE JAVASCRIPT AS
'return [...A1.filter(e => !A2.includes(e)),...A2.filter(e => !A1.includes(e))]';
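As a rough illustration of what the function body computes, here is a minimal sketch in plain JavaScript that can be run outside Snowflake (for example with Node.js). The name arrayDisjunctiveUnion is hypothetical, and replacing Array.includes with Set lookups is my own assumption rather than part of the answer above; it should give the same result while avoiding a repeated scan of the other array.

```javascript
// Plain-JavaScript sketch of the UDF body, runnable outside Snowflake.
// arrayDisjunctiveUnion is a hypothetical name used only for this example.
function arrayDisjunctiveUnion(a1, a2) {
  // Assumption: Sets are used for membership checks instead of Array.includes.
  const s1 = new Set(a1);
  const s2 = new Set(a2);
  return [
    ...a1.filter(e => !s2.has(e)), // elements found only in a1
    ...a2.filter(e => !s1.has(e)), // elements found only in a2
  ];
}

// The two examples from the question.
console.log(arrayDisjunctiveUnion(['a', 'b', 'c', 'd', 'e'], ['a', 'f', 'c', 'g', 'e']));
// [ 'b', 'd', 'f', 'g' ]
console.log(arrayDisjunctiveUnion(['u', 'v', 'w', 'x', 'y'], ['u', 'v', 'i', 'x', 'k']));
// [ 'w', 'y', 'i', 'k' ]
```

The result keeps the order of a1's leftover elements followed by a2's, which is also what the one-line UDF body returns.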

Answer 3

I'm not sure you need a sequence as in the other answers. This works quite cleanly:

with myTable as (
select array_construct('a', 'b', 'c', 'd', 'e') as a1
    ,array_construct('a', 'f', 'c', 'g', 'e') as a2
)
SELECT array_agg(coalesce(a1.value,a2.value)) WITHIN GROUP (ORDER BY coalesce(a1.value,a2.value)) as newarray
FROM (
  SELECT *
  FROM myTable,
  lateral flatten(input => a1) a1
  ) a1
FULL OUTER JOIN (
  SELECT *
  FROM myTable,
  lateral flatten(input => a2) a2
  ) a2
ON a1.value::varchar = a2.value::varchar
WHERE a1.value IS NULL
   OR a2.value IS NULL
;

Answer 4

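You can use lateral flatten to turn the arrays into result sets, compare the elements with SQL set operations, and then turn the result back into an array.

CTEs (WITH clauses) let you do all of this in a single statement.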

create or replace table arrays as (select split('1,2,3',',') a, split('4,9,1,3',',') b); //some test data

with a_elements as
(select value from arrays,   
lateral flatten(input => arrays.a) f ) // array to result set
, b_elements as 
(select value from arrays,   
lateral flatten(input => arrays.b) f )
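select array_agg(value) // reassemble the array
from (
  (select value from a_elements minus select value from b_elements) // only in a
  union all
  (select value from b_elements minus select value from a_elements) // only in b
);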
