JOIN on ID, and if the ID doesn't match, match on other columns (BigQuery)
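I want to join two tables. The tables have a primary key and a foreign key, but in some cases the keys don't match, and I need to join on the next best match.
I tried using a CASE statement, and it runs, but the join isn't quite right: it either picks up incorrect values or duplicates records.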
The way the tables work: if the Info_ID doesn't match, we can instead use a combination of Lev1 and whether the Cust_Start date falls between Info_Start and Info_End.
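I need a way to match on the ID first and, once a row has matched on ID, have SQL stop looking for other matches for that row. But I'm not sure whether BigQuery can do this.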
Customer table
Cust_ID Cust_InfoID Cust_name Cust_Start Cust_Lev1
1111 1 Amy 2021-01-01 A
1112 3 John 2020-01-01 D
1113 8 Bill 2020-01-01 D
Info table
Info_ID Info_Lev1 Info_Start Info_End state
1 A 2021-01-15 2021-01-14 NJ
3 D 2020-01-01 2020-12-31 NY
5 A 2021-01-01 2022-01-31 CA
Expected result
Cust_ID Cust_InfoID Info_ID Cust_Lev1 Cust_Start Info_Start Info_End state
1111 1 1 A 2021-01-01 2021-01-15 2021-01-14 NJ
1112 3 3 D 2020-01-01 2020-01-01 2020-12-31 NY
1113 8 3 D 2020-01-01 2020-01-01 2020-12-31 NY
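The matching rule described above (ID first; Lev1 plus date range only as a fallback) can be sketched in plain Python before worrying about SQL. This is a minimal illustration built on the sample rows above; the helper name `best_match` is hypothetical, and when several rows satisfy the fallback it simply takes the first one, which the question does not specify.

```python
from datetime import date

# Sample rows from the question, with dates parsed into datetime.date.
customers = [
    {"cust_id": 1111, "cust_infoid": 1, "cust_start": date(2021, 1, 1), "cust_lev1": "A"},
    {"cust_id": 1112, "cust_infoid": 3, "cust_start": date(2020, 1, 1), "cust_lev1": "D"},
    {"cust_id": 1113, "cust_infoid": 8, "cust_start": date(2020, 1, 1), "cust_lev1": "D"},
]
infos = [
    {"info_id": 1, "info_lev1": "A", "info_start": date(2021, 1, 15), "info_end": date(2021, 1, 14), "state": "NJ"},
    {"info_id": 3, "info_lev1": "D", "info_start": date(2020, 1, 1), "info_end": date(2020, 12, 31), "state": "NY"},
    {"info_id": 5, "info_lev1": "A", "info_start": date(2021, 1, 1), "info_end": date(2022, 1, 31), "state": "CA"},
]

def best_match(cust, infos):
    """Hypothetical helper: an exact ID match wins; otherwise fall back to
    the first row with the same Lev1 whose range contains cust_start."""
    # Rule 1: an ID match ends the search for this customer.
    for info in infos:
        if info["info_id"] == cust["cust_infoid"]:
            return info
    # Rule 2 (fallback): Lev1 plus date range; first hit wins (an assumption).
    for info in infos:
        if (info["info_lev1"] == cust["cust_lev1"]
                and info["info_start"] <= cust["cust_start"] <= info["info_end"]):
            return info
    return None  # no match under either rule

matches = {}
for cust in customers:
    info = best_match(cust, infos)
    matches[cust["cust_id"]] = info["info_id"] if info else None

print(matches)  # {1111: 1, 1112: 3, 1113: 3}
```

Note that Amy (1111) keeps Info_ID 1 even though her start date is outside that row's range: once the ID matches, the date is never checked, which is exactly the behavior the expected result asks for.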
Join idea 1:
CASE
WHEN
(Cust_InfoID = Info_ID) = true
AND (Cust_Start BETWEEN Info_Start AND Info_End) = true
THEN
Cust_InfoID = Info_ID
ELSE
Cust_Start BETWEEN Info_Start AND Info_End
and Info_Lev1 = Cust_Lev1
END
Output:
Cust_ID Cust_InfoID Info_ID Cust_Lev1 Cust_Start Info_Start Info_End state
1111 1 5 A 2021-01-01 2021-01-01 2022-01-31 CA
1112 3 3 D 2020-01-01 2020-01-01 2020-12-31 NY
1113 8 3 D 2020-01-01 2020-01-01 2020-12-31 NY
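The problem here is that for Amy (Cust_ID 1111) the ID matches Info_ID 1, but her Cust_Start is not within that row's Info_Start/Info_End range, so the WHEN fails and the ELSE branch joins her to Info_ID 5 (CA) instead. This is incorrect.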
Join idea 2:
CASE
WHEN
Cust_InfoID = Info_ID
THEN
Cust_InfoID = Info_ID
ELSE
Cust_Start BETWEEN Info_Start AND Info_End
and Info_Lev1 = Cust_Lev1
END
Output:
Cust_ID Cust_InfoID Info_ID Cust_Lev1 Cust_Start Info_Start Info_End state
1111 1 1 A 2021-01-01 2021-01-15 2021-01-14 NJ
1111 1 5 A 2021-01-01 2021-01-01 2022-01-31 CA
1112 3 3 D 2020-01-01 2020-01-01 2020-12-31 NY
1113 8 3 D 2020-01-01 2020-01-01 2020-12-31 NY
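The problem here is that the ID matches, but the ELSE branch also matches a wrong, duplicate row (Amy is joined to both Info_ID 1 and Info_ID 5). This is also incorrect.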
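Both failed attempts can be reproduced outside BigQuery. The sketch below uses SQLite (through Python's standard `sqlite3` module) as a stand-in engine and loads the sample tables; dates are stored as ISO `YYYY-MM-DD` strings, as in the sample CTEs, so `BETWEEN` compares them as text. It makes the root cause visible: a `CASE` inside `ON` is evaluated separately for every customer/info pair, so it cannot know whether some other info row has already matched that customer on ID.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, cust_infoid INTEGER, cust_name TEXT,
                       cust_start TEXT, cust_lev1 TEXT);
INSERT INTO customer VALUES
  (1111, 1, 'Amy',  '2021-01-01', 'A'),
  (1112, 3, 'John', '2020-01-01', 'D'),
  (1113, 8, 'Bill', '2020-01-01', 'D');
CREATE TABLE info (info_id INTEGER, info_lev1 TEXT, info_start TEXT,
                   info_end TEXT, state TEXT);
INSERT INTO info VALUES
  (1, 'A', '2021-01-15', '2021-01-14', 'NJ'),
  (3, 'D', '2020-01-01', '2020-12-31', 'NY'),
  (5, 'A', '2021-01-01', '2022-01-31', 'CA');
""")

# Join idea 1: the WHEN also requires the date to fit, so Amy's ID match
# (her date falls outside Info_ID 1's range) drops to the ELSE branch.
idea1 = conn.execute("""
SELECT c.cust_id, i.info_id
FROM customer c JOIN info i ON
  CASE WHEN c.cust_infoid = i.info_id
             AND c.cust_start BETWEEN i.info_start AND i.info_end
       THEN c.cust_infoid = i.info_id
       ELSE c.cust_start BETWEEN i.info_start AND i.info_end
            AND i.info_lev1 = c.cust_lev1
  END
ORDER BY c.cust_id, i.info_id
""").fetchall()

# Join idea 2: the CASE is evaluated per (customer, info) pair, so the pair
# (Amy, info 5) takes the ELSE branch even though (Amy, info 1) matched on ID.
idea2 = conn.execute("""
SELECT c.cust_id, i.info_id
FROM customer c JOIN info i ON
  CASE WHEN c.cust_infoid = i.info_id
       THEN c.cust_infoid = i.info_id
       ELSE c.cust_start BETWEEN i.info_start AND i.info_end
            AND i.info_lev1 = c.cust_lev1
  END
ORDER BY c.cust_id, i.info_id
""").fetchall()

print(idea1)  # [(1111, 5), (1112, 3), (1113, 3)]
print(idea2)  # [(1111, 1), (1111, 5), (1112, 3), (1113, 3)]
```

The two result lists match the outputs reported for each idea, which suggests the fix has to look across all candidate rows for a customer rather than inside a single row comparison.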
Sample tables here:
with customer as (
SELECT 1111 Cust_ID,1 Cust_InfoID,'Amy' Cust_name,'2021-01-01' Cust_Start,'A' Cust_Lev1
UNION ALL
SELECT 1112,3,'John','2020-01-01','D'
union all
SELECT 1113,8,'Bill','2020-01-01','D'
),
info as (
select 1 Info_ID,'A' Info_Lev1,'2021-01-15' Info_Start,'2021-01-14' Info_End,'NJ' state
union all
select 3,'D','2020-01-01','2020-12-31','NY'
union all
select 5,'A','2021-01-01','2022-01-31','CA'
)
select Cust_ID,Cust_InfoID,Info_ID,Cust_Lev1,Cust_Start,Info_Start,Info_End,state
from customer
join info on
[case statement here]
Use two left joins, one for each condition:
select c.*,
coalesce(ii.info_id, il.info_id) as info_id,
coalesce(ii.info_start, il.info_start) as info_start,
coalesce(ii.info_end, il.info_end) as info_end,
coalesce(ii.state, il.state) as state
from customer c left join
info ii
on c.cust_infoid = ii.info_id left join
info il
on ii.info_id is null and
c.cust_lev1 = il.info_lev1 and
c.cust_start between il.info_start and il.info_end
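A quick check of the two-join approach, using SQLite through Python's `sqlite3` as a stand-in for BigQuery (an assumption; only standard `LEFT JOIN` and `COALESCE` are involved). The `ii.info_id IS NULL` condition is what makes the fallback join fire only for customers with no ID match. If several info rows satisfied the fallback condition, this query would still return one row per match.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, cust_infoid INTEGER, cust_name TEXT,
                       cust_start TEXT, cust_lev1 TEXT);
INSERT INTO customer VALUES
  (1111, 1, 'Amy',  '2021-01-01', 'A'),
  (1112, 3, 'John', '2020-01-01', 'D'),
  (1113, 8, 'Bill', '2020-01-01', 'D');
CREATE TABLE info (info_id INTEGER, info_lev1 TEXT, info_start TEXT,
                   info_end TEXT, state TEXT);
INSERT INTO info VALUES
  (1, 'A', '2021-01-15', '2021-01-14', 'NJ'),
  (3, 'D', '2020-01-01', '2020-12-31', 'NY'),
  (5, 'A', '2021-01-01', '2022-01-31', 'CA');
""")

rows = conn.execute("""
SELECT c.cust_id,
       COALESCE(ii.info_id, il.info_id) AS info_id,
       COALESCE(ii.state, il.state)     AS state
FROM customer c
LEFT JOIN info ii                       -- first choice: exact ID match
  ON c.cust_infoid = ii.info_id
LEFT JOIN info il                       -- fallback: only when no ID match
  ON ii.info_id IS NULL
 AND c.cust_lev1 = il.info_lev1
 AND c.cust_start BETWEEN il.info_start AND il.info_end
ORDER BY c.cust_id
""").fetchall()

print(rows)  # [(1111, 1, 'NJ'), (1112, 3, 'NY'), (1113, 3, 'NY')]
```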
Consider the following (as requested, "using one JOIN and one CASE statement"):
select any_value(c).*,
array_agg(i order by
case when c.cust_infoid = i.info_id then 1 else 2 end
limit 1
)[offset(0)].*
from `project.dataset.customer` c
join `project.dataset.info` i
on c.cust_infoid = i.info_id
or(
c.cust_lev1 = i.info_lev1 and
c.cust_start between i.info_start and i.info_end
)
group by format('%t', c)
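The same "best candidate wins" idea can also be written with a window function instead of `ARRAY_AGG(... LIMIT 1)`. This is a swapped-in technique, not the answer's query: it ranks each customer's candidate rows with `ROW_NUMBER()` using the same `CASE` priority (ID match = 1, fallback = 2) and keeps rank 1. The sketch assumes SQLite 3.25 or later (required for window functions) via Python's `sqlite3`; `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` is also available in BigQuery. Ties within the same priority are broken arbitrarily.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, cust_infoid INTEGER, cust_name TEXT,
                       cust_start TEXT, cust_lev1 TEXT);
INSERT INTO customer VALUES
  (1111, 1, 'Amy',  '2021-01-01', 'A'),
  (1112, 3, 'John', '2020-01-01', 'D'),
  (1113, 8, 'Bill', '2020-01-01', 'D');
CREATE TABLE info (info_id INTEGER, info_lev1 TEXT, info_start TEXT,
                   info_end TEXT, state TEXT);
INSERT INTO info VALUES
  (1, 'A', '2021-01-15', '2021-01-14', 'NJ'),
  (3, 'D', '2020-01-01', '2020-12-31', 'NY'),
  (5, 'A', '2021-01-01', '2022-01-31', 'CA');
""")

rows = conn.execute("""
SELECT cust_id, info_id, state
FROM (
  SELECT c.cust_id, i.info_id, i.state,
         ROW_NUMBER() OVER (
           PARTITION BY c.cust_id
           -- same priority as the answer's CASE: ID match first, fallback second
           ORDER BY CASE WHEN c.cust_infoid = i.info_id THEN 1 ELSE 2 END
         ) AS rn
  FROM customer c
  JOIN info i                            -- every row that satisfies either rule
    ON c.cust_infoid = i.info_id
    OR (c.cust_lev1 = i.info_lev1
        AND c.cust_start BETWEEN i.info_start AND i.info_end)
)
WHERE rn = 1                             -- keep only the best candidate
ORDER BY cust_id
""").fetchall()

print(rows)  # [(1111, 1, 'NJ'), (1112, 3, 'NY'), (1113, 3, 'NY')]
```

Amy has two candidates (Info_ID 1 by ID, Info_ID 5 by the fallback); the ranking keeps the ID match, which is the "stop once the ID matches" behavior the question asks for.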
Applied to the sample data in the question, the output is: