Conditional Insert Based on Fuzzy Logic in Snowflake

Question

I have a table like the following:

Company Name	ID
Facebook	32
Google	33
Apple	44

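So if I get a new record with a company name such as "Facebook Inc" or "Facebook Company", it should be ignored; otherwise it should be inserted. What should the logical condition be?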

insert into Table a where ? (fuzzy logic)

Answer 1

For the logic described in the question, a simple way to solve this is with a merge:

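Apply the "fuzzy logic" to look for a match. In this case it is a regular expression that compares the first word of each string: regexp_substr(a.company, '^[^ ]+') = regexp_substr(b.company, '^[^ ]+')
If there is a match, do nothing (and false).
If there is no match, insert: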

merge into companies a
using (
    select 'Facebook Inc' company, 10 id
) as b on regexp_substr(a.company, '^[^ ]+') = regexp_substr(b.company, '^[^ ]+')
when matched and false then update set a.id = b.id
when not matched then insert (company, id) values (b.company, b.id)
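
To see what the join condition actually compares, the match key can be evaluated on its own. This is a small sketch for illustration only; the sample strings are taken from the question and the column aliases are made up:

```sql
-- '^[^ ]+' matches the leading run of non-space characters, i.e. the first word.
select
    regexp_substr('Facebook Inc', '^[^ ]+')     as key_incoming,  -- 'Facebook'
    regexp_substr('Facebook Company', '^[^ ]+') as key_variant,   -- 'Facebook'
    regexp_substr('Facebook', '^[^ ]+')         as key_existing;  -- 'Facebook'
```

All three expressions yield the same key, which is why both name variants count as a match for the existing row. Note that the comparison is an exact string equality on that first word, so a differently cased name such as 'facebook inc' would not match.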

Setup:

create or replace temp table companies as
select $1::string company, $2::int id
from values ('Google', 1), ('Facebook', 2), ('Apple', 3);
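
The same merge can take several incoming rows at once, which makes the behavior easier to check. The following is a sketch that assumes the companies table created by the setup above; the incoming names and ids are hypothetical and only serve as test data:

```sql
-- Hypothetical incoming rows: the first two share a first word with an
-- existing company, the third does not.
merge into companies a
using (
    select $1::string company, $2::int id
    from values ('Facebook Inc', 10), ('Google LLC', 11), ('Microsoft Corp', 12)
) as b on regexp_substr(a.company, '^[^ ]+') = regexp_substr(b.company, '^[^ ]+')
when matched and false then update set a.id = b.id
when not matched then insert (company, id) values (b.company, b.id);

-- Inspect the result: the original three rows plus 'Microsoft Corp'.
select company, id from companies order by id;
```

One caveat with this approach: incoming rows are compared against the target table only, not against each other. If a single batch contained both 'Microsoft Corp' and 'Microsoft Inc', neither would match an existing row, so both would be inserted.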

If you want to define more complex "fuzzy logic", please ask a new question.
