Join future dates to table which only has dates until current day

I have these two tables:

table1: name (string), actual (double), yyyy_mm_dd (date)
table2: name (string), predicted (double), yyyy_mm_dd (string)

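table1 contains data from 2018-01-01 up to today, and table2 contains forecast data for 2020. My problem is that table1 has no dates beyond the current date, so I get duplicated rows when I join, as shown below: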

SELECT 
    kpi.yyyy_mm_dd,
    kpi.name,
    kpi.actual as actual,
    pre.predicted as predicted 
FROM
    schema1.table1 kpi
LEFT JOIN 
    schema1.table2 pre 
    ON pre.name = kpi.name --AND pre.yyyy_mm_dd = kpi.yyyy_mm_dd
WHERE
     kpi.yyyy_mm_dd >= '2019-12-09'

Output:

+----------+------------+----------+-------------+
|yyyy_mm_dd|  name      |actual    |predicted    |
+----------+------------+----------+-------------+
|2019-12-10|  Company   | 100000   | 925,180     |
|2019-12-10|  Company   | 100000   | 1,145,723   |
|2019-12-10|  Company   | 100000   | 456,359     |
+----------+------------+----------+-------------+

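If I uncomment the AND condition in the join clause, I get no predicted values, because my first table has no data for 2020. How can I join these tables without duplicating the actual values? actual should be NULL for days that have not happened yet.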

Hive supports FULL JOIN:

SELECT COALESCE(kpi.yyyy_mm_dd, pre.yyyy_mm_dd) as yyyy_mm_dd,
       COALESCE(kpi.name, pre.name) as name,
       kpi.actual as actual,
       pre.predicted as predicted 
FROM (SELECT kpi.*
      FROM schema1.table1 kpi 
      WHERE kpi.yyyy_mm_dd >= '2019-12-09'
     ) kpi FULL JOIN
     schema1.table2 pre 
     ON kpi.name = pre.name AND
        kpi.yyyy_mm_dd = pre.yyyy_mm_dd
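
One thing to watch in the query above: per the question, table1.yyyy_mm_dd is a date while table2.yyyy_mm_dd is a string, so the join condition and the COALESCE rely on implicit conversion. A minimal sketch that makes the conversion explicit, assuming the strings in table2 are formatted as yyyy-MM-dd (the question does not state the format):

```sql
-- Sketch only: assumes table2.yyyy_mm_dd holds 'yyyy-MM-dd' strings,
-- so that CAST(... AS DATE) succeeds (it yields NULL otherwise).
SELECT COALESCE(kpi.yyyy_mm_dd, pre.yyyy_mm_dd) AS yyyy_mm_dd,
       COALESCE(kpi.name, pre.name) AS name,
       kpi.actual AS actual,        -- NULL for days with no actuals yet
       pre.predicted AS predicted   -- NULL for days with no forecast
FROM (SELECT yyyy_mm_dd, name, actual
      FROM schema1.table1
      WHERE yyyy_mm_dd >= '2019-12-09'
     ) kpi
FULL OUTER JOIN
     (SELECT CAST(yyyy_mm_dd AS DATE) AS yyyy_mm_dd, name, predicted
      FROM schema1.table2
     ) pre
     ON kpi.name = pre.name
    AND kpi.yyyy_mm_dd = pre.yyyy_mm_dd
```

With both sides typed as DATE, the output column has a single, predictable type regardless of which table a row came from.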

I think you need UNION ALL rather than a JOIN:

SELECT 
    yyyy_mm_dd,
    name,
    actual as actual,
    NULL as predicted 
FROM schema1.table1
WHERE yyyy_mm_dd >= '2019-12-09'
UNION ALL
SELECT 
    yyyy_mm_dd,
    name,
    NULL as actual,
    predicted as predicted 
FROM schema1.table2
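
Note that the UNION ALL above returns two rows for any date that appears in both tables, one carrying actual and one carrying predicted. If one row per date and name is wanted, the union can be wrapped in an aggregation. A minimal sketch, assuming each table has at most one row per (yyyy_mm_dd, name), which the question does not confirm; the typed NULLs and the string-to-date cast are additions here to keep the column types of both branches aligned:

```sql
-- Sketch only: MAX ignores NULLs, so it picks the single non-NULL value
-- per (yyyy_mm_dd, name) under the one-row-per-key assumption above.
SELECT yyyy_mm_dd,
       name,
       MAX(actual) AS actual,
       MAX(predicted) AS predicted
FROM (
    SELECT yyyy_mm_dd,
           name,
           actual,
           CAST(NULL AS DOUBLE) AS predicted
    FROM schema1.table1
    WHERE yyyy_mm_dd >= '2019-12-09'
    UNION ALL
    SELECT CAST(yyyy_mm_dd AS DATE) AS yyyy_mm_dd,  -- table2 stores the date as a string
           name,
           CAST(NULL AS DOUBLE) AS actual,
           predicted
    FROM schema1.table2
) u
GROUP BY yyyy_mm_dd, name
```

Future dates keep actual as NULL because no row from table1 contributes a value for them.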

Try adding a GROUP BY clause to your query. The following may solve your problem:

SELECT 
    kpi.yyyy_mm_dd,
    kpi.name,
    kpi.actual as actual,
    MAX(pre.predicted) as predicted  -- columns outside GROUP BY must be aggregated; MAX is an arbitrary choice
FROM
    schema1.table1 kpi
LEFT JOIN 
    schema1.table2 pre 
    ON pre.name = kpi.name
GROUP BY kpi.yyyy_mm_dd, kpi.name, kpi.actual