Coalesce null across multiple rows

I have a table in this form:

 id_customer__ |        status         |     time_stmpd_at     | idx
---------------+-----------------------+-----------------------+-----
        112220 | enabled____________at | 2017-12-13 16:12:42.0 |   1
        112220 | sale_locked_at__      | 2017-12-13 14:52:43.0 |   2
        112220 | qual_sale_at          | 2017-12-06 12:22:50.0 |   3
        112220 | quality_control___at  | 2017-11-28 18:22:02.0 |   4
        112220 | returned__at          | 2017-10-12 23:02:41.0 |   5

I want the status where idx = 2 and the time_stmpd_at where idx = 1, and I want to be able to do this for every customer ID.

I tried putting the conditions in the select statement like this:

select
  id_customer__,
  if(idx=2, status, NULL) as previous_status,
  if(idx=1, time_stmpd_at, NULL) as time_stmpd_at
from htable

But this leaves me with:

 id_customer__ | previous_status  |      time_stmpd_at
---------------+------------------+-----------------------
        119650 | NULL             | 2017-12-13 16:12:42.0
        119650 | sale_locked_at__ | NULL
        119650 | NULL             | NULL
        119650 | NULL             | NULL
        119650 | NULL             | NULL

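Next I would have to coalesce those fields into a single row per customer, but I feel there must be a better way. Any suggestions on the overall approach?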

You can do this with conditional aggregation:

select
  id_customer__,
  max(case when idx=2 then status end) as previous_status,
  max(case when idx=1 then time_stmpd_at end) as time_stmpd_at
from htable
group by id_customer__
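
To see why this works: an aggregate like max skips NULLs, so within each customer group only the single non-NULL value from the matching idx row survives. Here is a minimal sketch you can run locally. It assumes Python's built-in sqlite3 module as a stand-in engine (the question looks like a Presto-style dialect, so this is only an illustration) and uses a cut-down copy of the sample rows.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the real engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE htable ("
    "id_customer__ INTEGER, status TEXT, time_stmpd_at TEXT, idx INTEGER)"
)
conn.executemany(
    "INSERT INTO htable VALUES (?, ?, ?, ?)",
    [
        (112220, "enabled____________at", "2017-12-13 16:12:42.0", 1),
        (112220, "sale_locked_at__", "2017-12-13 14:52:43.0", 2),
        (112220, "qual_sale_at", "2017-12-06 12:22:50.0", 3),
    ],
)

# CASE yields NULL for non-matching rows; MAX ignores those NULLs,
# so each group collapses to one row holding the values we want.
rows = conn.execute(
    """
    SELECT id_customer__,
           MAX(CASE WHEN idx = 2 THEN status END) AS previous_status,
           MAX(CASE WHEN idx = 1 THEN time_stmpd_at END) AS time_stmpd_at
    FROM htable
    GROUP BY id_customer__
    """
).fetchall()
print(rows)
```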

You can use MAX and restrict the table to just the indexes you want (you don't have to, but why process irrelevant rows?):

SELECT id_customer__, 
    MAX(CASE WHEN idx=1 THEN time_stmpd_at ELSE NULL END) time_stmpd_at,
    MAX(CASE WHEN idx=2 THEN status ELSE NULL END) status
FROM htable 
WHERE idx IN (1,2)
GROUP BY id_customer__

Or you can pull out those indexes separately and join them on id_customer__:

SELECT h1.id_customer__, h1.time_stmpd_at , h2.status 
FROM
(SELECT * FROM htable WHERE idx=1) h1 INNER JOIN
(SELECT * FROM htable WHERE idx=2) h2 ON h1.id_customer__ = h2.id_customer__
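
One difference between the two variants that may matter: with an INNER JOIN, a customer that has an idx=1 row but no idx=2 row drops out of the result entirely, while the MAX version keeps that customer with a NULL status. A small sketch of that, again assuming SQLite via Python's sqlite3 as a stand-in engine; customer 999999 is a made-up row added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE htable ("
    "id_customer__ INTEGER, status TEXT, time_stmpd_at TEXT, idx INTEGER)"
)
conn.executemany(
    "INSERT INTO htable VALUES (?, ?, ?, ?)",
    [
        (112220, "enabled____________at", "2017-12-13 16:12:42.0", 1),
        (112220, "sale_locked_at__", "2017-12-13 14:52:43.0", 2),
        # Hypothetical customer with only an idx=1 row.
        (999999, "enabled____________at", "2017-12-14 09:00:00.0", 1),
    ],
)

# Join variant: requires both an idx=1 and an idx=2 row per customer.
joined = conn.execute(
    """
    SELECT h1.id_customer__, h1.time_stmpd_at, h2.status
    FROM (SELECT * FROM htable WHERE idx = 1) h1
    INNER JOIN (SELECT * FROM htable WHERE idx = 2) h2
      ON h1.id_customer__ = h2.id_customer__
    ORDER BY h1.id_customer__
    """
).fetchall()

# Aggregate variant: keeps every customer that has either row.
aggregated = conn.execute(
    """
    SELECT id_customer__,
           MAX(CASE WHEN idx = 1 THEN time_stmpd_at END) AS time_stmpd_at,
           MAX(CASE WHEN idx = 2 THEN status END) AS status
    FROM htable
    WHERE idx IN (1, 2)
    GROUP BY id_customer__
    ORDER BY id_customer__
    """
).fetchall()

print(joined)      # only customer 112220
print(aggregated)  # both customers; 999999 has status None
```

If you need the join form but want to keep such customers, a LEFT JOIN from h1 would be the usual adjustment.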

(Based on @VamsiPrabhala's answer, but using the arbitrary aggregate instead.)

I suggest using the arbitrary aggregate (instead of max), because it expresses the intent better:

select id_customer__,
  arbitrary(status) filter(where idx=2) as previous_status,
  arbitrary(time_stmpd_at) filter(where idx=1) as time_stmpd_at
from htable
group by id_customer__

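There are two reasons to use arbitrary:

  1. arbitrary signals that you are not really doing a max aggregation (helpful in case someone reads this query in the future).
  2. There is no aggregate that rejects multiple values. If there were one, I would recommend using it over arbitrary.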
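
Since no aggregate will reject multiple values for you, one way to catch them is a separate check that counts the matching rows per customer and flags any group with more than one. This is only a sketch under assumptions: SQLite via Python's sqlite3 as a stand-in engine (it has no arbitrary, so the check uses plain SUM/CASE), and customer 999999 is a made-up row with a duplicated idx=2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE htable ("
    "id_customer__ INTEGER, status TEXT, time_stmpd_at TEXT, idx INTEGER)"
)
conn.executemany(
    "INSERT INTO htable VALUES (?, ?, ?, ?)",
    [
        (112220, "enabled____________at", "2017-12-13 16:12:42.0", 1),
        (112220, "sale_locked_at__", "2017-12-13 14:52:43.0", 2),
        # Hypothetical customer with two idx=2 rows.
        (999999, "sale_locked_at__", "2017-12-14 09:00:00.0", 2),
        (999999, "qual_sale_at", "2017-12-14 08:00:00.0", 2),
    ],
)

# Count rows per customer for each idx of interest and keep only
# the customers where either count exceeds one.
dupes = conn.execute(
    """
    SELECT id_customer__,
           SUM(CASE WHEN idx = 1 THEN 1 ELSE 0 END) AS n_idx1,
           SUM(CASE WHEN idx = 2 THEN 1 ELSE 0 END) AS n_idx2
    FROM htable
    GROUP BY id_customer__
    HAVING SUM(CASE WHEN idx = 1 THEN 1 ELSE 0 END) > 1
        OR SUM(CASE WHEN idx = 2 THEN 1 ELSE 0 END) > 1
    """
).fetchall()
print(dupes)
```

An empty result means each customer has at most one row per idx, so max and arbitrary would pick the same value.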