How to combine two tables to get a single table in Hive
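I have the following tables and need to merge them in Hive.
Can anyone help me achieve this? I tried the date part with coalesce and that works fine.
But I can't get the fam columns merged into a single column.
Any help is much appreciated.
Thanks, Babu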
Use a full outer join in two steps, e.g.
-- Step 1: merge table1 and table2 on (date, fam).
with join1 as (
  select coalesce(t1.date, t2.date) as date
       , coalesce(t1.fam1, t2.fam2) as fam
       , coalesce(t1.famcnt1, 0) as famcnt1
       , coalesce(t2.famcnt2, 0) as famcnt2
  from table1 as t1
  full outer join table2 as t2
    on (t1.date = t2.date and t1.fam1 = t2.fam2)
)
-- Step 2: merge that result with table3 the same way.
select coalesce(j.date, t3.date) as date
     , coalesce(j.fam, t3.fam3) as fam
     , coalesce(j.famcnt1, 0) as famcnt1
     , coalesce(j.famcnt2, 0) as famcnt2
     , coalesce(t3.famcnt3, 0) as famcnt3
from join1 as j
full outer join table3 as t3
  on (j.date = t3.date and j.fam = t3.fam3)
You can use full outer join. However, a union followed by left joins often looks cleaner:
select df.date, df.fam,
       coalesce(t1.famcnt1, 0) as famcnt1,
       coalesce(t2.famcnt2, 0) as famcnt2,
       coalesce(t3.famcnt3, 0) as famcnt3
from (select date, fam1 as fam from table1
      union -- on purpose to remove duplicates
      select date, fam2 from table2
      union -- on purpose to remove duplicates
      select date, fam3 from table3
     ) df left join
     table1 t1
     on t1.date = df.date and t1.fam1 = df.fam left join
     table2 t2
     on t2.date = df.date and t2.fam2 = df.fam left join
     table3 t3
     on t3.date = df.date and t3.fam3 = df.fam;
If you are happy with NULL instead of 0, then COALESCE() is not needed at all.
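To see what the union plus left join approach produces, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite only stands in for Hive here so the example can be run anywhere. The column names (fam1, fam2, fam3, famcnt1, famcnt2, famcnt3) are taken from the first answer, and the sample rows are hypothetical, since the tables from the question are not shown.

```python
import sqlite3

# Hypothetical sample data: the original tables from the question are not available.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (date text, fam1 text, famcnt1 integer);
    create table table2 (date text, fam2 text, famcnt2 integer);
    create table table3 (date text, fam3 text, famcnt3 integer);
    insert into table1 values ('2020-01-01', 'a', 1), ('2020-01-01', 'b', 2);
    insert into table2 values ('2020-01-01', 'a', 3), ('2020-01-02', 'c', 4);
    insert into table3 values ('2020-01-02', 'c', 5), ('2020-01-02', 'd', 6);
""")

# Build the distinct set of (date, fam) keys with union, then left join
# each source table back onto that key set.
query = """
select df.date, df.fam,
       coalesce(t1.famcnt1, 0) as famcnt1,
       coalesce(t2.famcnt2, 0) as famcnt2,
       coalesce(t3.famcnt3, 0) as famcnt3
from (select date, fam1 as fam from table1
      union
      select date, fam2 from table2
      union
      select date, fam3 from table3
     ) df
left join table1 t1 on t1.date = df.date and t1.fam1 = df.fam
left join table2 t2 on t2.date = df.date and t2.fam2 = df.fam
left join table3 t3 on t3.date = df.date and t3.fam3 = df.fam
order by df.date, df.fam
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each (date, fam) pair appears exactly once, with 0 wherever a table has no matching row. Dropping the coalesce() calls would leave NULL (None in Python) in those positions instead.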