Joining data which has the same column name

Question

Apologies for the messy title; I wasn't sure how best to word it. I have two daily tables. The first looks like this:

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |

The second looks like this:

| yyyy_mm_dd | x_id | payment |
|------------|------|---------|
| 2020-08-18 | 1    | 0       |
| 2020-08-18 | 2    | 0       |
| 2020-08-18 | 3    | 1       |
| 2020-08-19 | 1    | 0       |
| 2020-08-19 | 2    | 0       |
| 2020-08-19 | 3    | 1       |

I want to build a query in which payment becomes a feature in the first table. It will never have a first_contact status, because payment is a boolean (1/0). Here is what I tried:

select
    t1.yyyy_mm_dd,
    t1.x_id,
    t1.impl_status
from
    schema.table1 t1
left join(
    select
        yyyy_mm_dd,
        x_id,
        'payment' as feature,
        if(payment=1, 'implemented', 'no_contact') as impl_status
    from
         schema.table2
 ) t2 on t2.yyyy_mm_dd = t1.yyyy_mm_dd and t2.x_id = t1.x_id

With this approach, however, the column name is ambiguous, so I have to select either t1.impl_status or t2.impl_status. The two columns are not merged.

With that in mind, the expected output would look like this:

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 1    | Payment     | no_contact    |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 2    | Payment     | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |
| 2020-08-18 | 3    | Payment     | implemented   |
| 2020-08-19 ...
 ...

Answer 1

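You can use union all. A join adds columns side by side, whereas union all stacks the rows of both queries, which is what the expected output needs: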

select yyyy_mm_dd, x_id, feature, impl_status from table1 t1
union all
select yyyy_mm_dd, x_id, 'Payment', case when payment = 0 then 'no_contact' else 'implemented' end from table2
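
If the combined rows should come back grouped by date and id, as in the expected output, the union can be wrapped in a subquery and sorted. The following is only a sketch: it assumes the schema.table1 and schema.table2 names from the question, and the alias u is made up for illustration.

```sql
-- Sketch: the same UNION ALL, wrapped in a subquery so the result can be sorted.
-- Assumes the tables from the question; the alias u is hypothetical.
select
    u.yyyy_mm_dd,
    u.x_id,
    u.feature,
    u.impl_status
from (
    -- Rows from the first table, unchanged.
    select yyyy_mm_dd, x_id, feature, impl_status
    from schema.table1
    union all
    -- One extra 'Payment' row per (date, id) from the second table.
    select
        yyyy_mm_dd,
        x_id,
        'Payment' as feature,
        case when payment = 1 then 'implemented' else 'no_contact' end as impl_status
    from schema.table2
) u
order by u.yyyy_mm_dd, u.x_id;
```

Two things to keep in mind. The sort only groups rows by date and id; the order of features within a group is not guaranteed. Also, testing payment = 1 sends a NULL payment to 'no_contact', while the version above that tests payment = 0 sends it to 'implemented'; which one is right depends on the data.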
