Join future dates to table which only has dates until current day
I have these two tables:
table1: name (string), actual (double), yyyy_mm_dd (date)
table2: name (string), expected(double), yyyy_mm_dd (string)
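table1 contains data from 2018-01-01 until today, and table2 contains forecast data for 2020. My problem is that table1 has no date values beyond the current date, so I get duplicated rows when joining, as shown below: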
SELECT
kpi.yyyy_mm_dd,
kpi.name,
kpi.actual as actual,
pre.predicted as predicted
FROM
schema1.table1 kpi
LEFT JOIN
schema1.table2 pre
ON pre.name = kpi.name --AND pre.yyyy_mm_dd = kpi.yyyy_mm_dd
WHERE
kpi.yyyy_mm_dd >= '2019-12-09'
Output:
+----------+---------+--------+-----------+
|yyyy_mm_dd| name    | actual | predicted |
+----------+---------+--------+-----------+
|2019-12-10| Company | 100000 |   925,180 |
|2019-12-10| Company | 100000 | 1,145,723 |
|2019-12-10| Company | 100000 |   456,359 |
+----------+---------+--------+-----------+
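If I uncomment the AND condition in the join clause, I get no predicted values, because my first table has no data for 2020. How can I join these tables together without duplicating the actual values? actual should be NULL for days that have not happened yet.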
Hive supports full join:
SELECT COALESCE(kpi.yyyy_mm_dd, pre.yyyy_mm_dd) as yyyy_mm_dd,
COALESCE(kpi.name, pre.name) as name,
kpi.actual as actual,
pre.predicted as predicted
FROM (SELECT kpi.*
FROM schema1.table1 kpi
WHERE kpi.yyyy_mm_dd >= '2019-12-09'
) kpi FULL JOIN
schema1.table2 pre
ON kpi.name = pre.name AND
kpi.yyyy_mm_dd = pre.yyyy_mm_dd
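Since table2 stores yyyy_mm_dd as a string while table1 stores it as a date, it may be safer to make the conversion explicit instead of relying on implicit casting in the join condition. The following is only a sketch: it assumes the strings in table2 are formatted as yyyy-MM-dd, and the aliases t1 and t2 are introduced here for illustration.

```sql
SELECT COALESCE(kpi.yyyy_mm_dd, pre.yyyy_mm_dd) AS yyyy_mm_dd,
       COALESCE(kpi.name, pre.name) AS name,
       kpi.actual AS actual,        -- NULL for dates that exist only in table2
       pre.predicted AS predicted   -- NULL for dates that exist only in table1
FROM (SELECT t1.name, t1.actual, t1.yyyy_mm_dd
      FROM schema1.table1 t1
      WHERE t1.yyyy_mm_dd >= '2019-12-09'
     ) kpi
FULL JOIN
     (SELECT t2.name, t2.predicted,
             -- assumption: table2 dates are 'yyyy-MM-dd' strings
             CAST(t2.yyyy_mm_dd AS DATE) AS yyyy_mm_dd
      FROM schema1.table2 t2
     ) pre
ON kpi.name = pre.name AND
   kpi.yyyy_mm_dd = pre.yyyy_mm_dd
```

Both sides of the join then carry the same DATE type, so the COALESCE over the date columns does not mix types.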
I think you need UNION ALL rather than JOIN:
SELECT
yyyy_mm_dd,
name,
actual as actual,
NULL as predicted
FROM schema1.table1
WHERE yyyy_mm_dd >= '2019-12-09'
UNION ALL
SELECT
yyyy_mm_dd,
name,
NULL as actual,
predicted as predicted
FROM schema1.table2
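Note that UNION ALL returns separate rows for actual and predicted whenever both tables contain the same date. If one row per date and name is wanted, the union can be wrapped in an aggregation. This is a sketch under assumptions not stated in the question: each table holds at most one row per name and date, table2's date strings are formatted as yyyy-MM-dd, and the same date filter is wanted on both tables. The alias combined is introduced here for illustration.

```sql
SELECT
    yyyy_mm_dd,
    name,
    MAX(actual)    AS actual,     -- NULL for dates that exist only in table2
    MAX(predicted) AS predicted   -- NULL for dates that exist only in table1
FROM (
    SELECT
        yyyy_mm_dd,
        name,
        actual,
        CAST(NULL AS DOUBLE) AS predicted
    FROM schema1.table1
    WHERE yyyy_mm_dd >= '2019-12-09'
    UNION ALL
    SELECT
        -- assumption: table2 dates are 'yyyy-MM-dd' strings
        CAST(yyyy_mm_dd AS DATE) AS yyyy_mm_dd,
        name,
        CAST(NULL AS DOUBLE) AS actual,
        predicted
    FROM schema1.table2
    WHERE yyyy_mm_dd >= '2019-12-09'
) combined
GROUP BY yyyy_mm_dd, name
```

MAX ignores NULLs, so with at most one value per group it simply picks whichever side is present.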
Try using a group by clause in your query; the following may solve your problem:
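SELECT
kpi.yyyy_mm_dd,
kpi.name,
kpi.actual as actual,
-- predicted is not in the GROUP BY list, so it has to be aggregated; MAX is an arbitrary choice here
MAX(pre.predicted) as predicted
FROM
schema1.table1 kpi
LEFT JOIN
schema1.table2 pre
ON pre.name = kpi.name
group by kpi.yyyy_mm_dd,kpi.name,kpi.actual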