Add identifier of first created record to select statement with group_by

I have the following payments table:

┌─name───────────────────────────┬─type────────────────────────────┐
│ payment_id                     │ UInt64                          │
│ factory                        │ String                          │
│ user_id                        │ UInt64                          │
│ amount_cents                   │ Int64                           │
│ action                         │ String                          │
│ success                        │ UInt8                           │
│ country                        │ FixedString(2)                  │
│ created_at                     │ DateTime                        │
│ finished_at                    │ Nullable(DateTime)              │
└────────────────────────────────┴─────────────────────────────────┘

with the following sample data:

┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┐
│ 0_factory │ 2021-01-18 00:00:01 │          1 │ BY      │ payment │            1 │         1 │
│ 0_factory │ 2021-01-18 00:00:02 │          2 │ BY      │ payment │            1 │         1 │
│ 1_factory │ 2021-01-18 00:00:02 │          2 │ PL      │ win     │            4 │         1 │
│ 1_factory │ 2021-01-18 00:00:03 │          3 │ PL      │ win     │            7 │         1 │
│ 2_factory │ 2021-01-18 00:00:01 │          4 │ PL      │ win     │            7 │         1 │
│ 2_factory │ 2021-01-18 00:00:02 │          1 │ PL      │ payment │            7 │         1 │
│ 2_factory │ 2021-01-18 00:00:03 │          2 │ PL      │ win     │            7 │         1 │
│ 2_factory │ 2021-01-18 00:00:04 │          3 │ GR      │ win     │            2 │         1 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴──────────────┴───────────┘

This is the query I am currently using:

SELECT
    factory,
    user_id,
    payment_id,
    action,
    created_at
FROM payments_all
WHERE (payments_all.action = 'payment')
  AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory'))
  AND isNotNull(payments_all.created_at)
GROUP BY
    factory,
    user_id,
    payment_id,
    action
HAVING (min(created_at) >= toDate('2019-01-01 00:00:00'))
   AND (min(created_at) < toDate('2021-10-01 00:00:00'))
ORDER BY user_id

┌─factory───┬─user_id─┬─payment_id─┬─action──┬──────────created_at─┐
│ 1_factory │       1 │          1 │ payment │ 2021-02-04 09:00:00 │
│ 0_factory │       1 │          1 │ payment │ 2021-01-17 00:00:01 │
│ 0_factory │       1 │          2 │ payment │ 2021-01-17 00:00:06 │
└───────────┴─────────┴────────────┴─────────┴─────────────────────┘

I need to add a new column, first_payment.

first_payment should be 1 if the action is payment and it is the user's first payment; otherwise it should be 0.

first_payment should take all periods into account (the user's entire history), so the expected result is:

┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┬─first_payment─┐
│ 0_factory │ 2021-01-18 00:00:01 │          1 │ BY      │ deposit │            1 │         1 │             1 │
│ 0_factory │ 2021-01-18 00:00:02 │          2 │ BY      │ deposit │            1 │         1 │             0 │
│ 1_factory │ 2021-01-18 00:00:02 │          2 │ PL      │ win     │            4 │         1 │             0 │
│ 1_factory │ 2021-01-18 00:00:03 │          3 │ PL      │ win     │            7 │         1 │             0 │
│ 2_factory │ 2021-01-18 00:00:01 │          4 │ PL      │ win     │            7 │         1 │             0 │
│ 2_factory │ 2021-01-18 00:00:02 │          1 │ PL      │ deposit │            7 │         1 │             1 │
│ 2_factory │ 2021-01-18 00:00:03 │          2 │ PL      │ win     │            7 │         1 │             0 │
│ 2_factory │ 2021-01-18 00:00:04 │          3 │ GR      │ win     │            2 │         1 │             0 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴──────────────┴───────────┴───────────────┘
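A minimal sketch of the rule in plain Python, run on the sample rows (where the action is spelled payment). It rests on two assumptions not stated in the question: "first" is judged per (user_id, factory) pair, which is what the expected output implies since user 1 gets a 1 in both 0_factory and 2_factory, and finished_at stands in for created_at because the sample does not show created_at. The function name first_payment_flags is hypothetical.

```python
from datetime import datetime

# Sample rows: (factory, finished_at, payment_id, country, action, amount_cents, user_id).
# Assumption: finished_at is used as the creation time, since created_at is not shown.
ROWS = [
    ("0_factory", "2021-01-18 00:00:01", 1, "BY", "payment", 1, 1),
    ("0_factory", "2021-01-18 00:00:02", 2, "BY", "payment", 1, 1),
    ("1_factory", "2021-01-18 00:00:02", 2, "PL", "win", 4, 1),
    ("1_factory", "2021-01-18 00:00:03", 3, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:01", 4, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:02", 1, "PL", "payment", 7, 1),
    ("2_factory", "2021-01-18 00:00:03", 2, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:04", 3, "GR", "win", 2, 1),
]

FMT = "%Y-%m-%d %H:%M:%S"


def first_payment_flags(rows):
    """Hypothetical helper: return one 0/1 flag per row, 1 only for the earliest
    'payment' row of each (user_id, factory) pair, judged over all rows."""
    earliest = {}  # (user_id, factory) -> earliest payment timestamp
    for factory, ts, _pid, _country, action, _amount, user_id in rows:
        if action != "payment":
            continue
        t = datetime.strptime(ts, FMT)
        key = (user_id, factory)
        if key not in earliest or t < earliest[key]:
            earliest[key] = t
    flags = []
    for factory, ts, _pid, _country, action, _amount, user_id in rows:
        t = datetime.strptime(ts, FMT)
        # Two payments with the exact same timestamp would both be flagged.
        is_first = action == "payment" and earliest.get((user_id, factory)) == t
        flags.append(1 if is_first else 0)
    return flags


print(first_payment_flags(ROWS))  # [1, 0, 0, 0, 0, 1, 0, 0]
```

The printed flags match the first_payment column of the expected result row for row.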

As far as I can see, payment_id is always 1 for the first payment, so I think you can use CASE WHEN payment_id = 1 THEN 1 ELSE 0 END AS first_payment. Please check the query below:

WITH CTE AS
(
    SELECT
        factory,
        user_id,
        payment_id,
        action,
        created_at
    FROM payments_all
    WHERE (payments_all.action = 'payment')
      AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory'))
      AND isNotNull(payments_all.created_at)
    GROUP BY
        factory,
        user_id,
        payment_id,
        action
    HAVING (min(created_at) >= toDate('2019-01-01 00:00:00'))
       AND (min(created_at) < toDate('2021-10-01 00:00:00'))
)
SELECT
    T1.*,
    CASE WHEN T1.payment_id = 1 THEN 1 ELSE 0 END AS first_payment
FROM CTE AS T1
ORDER BY T1.user_id

Note: I wrote the query in SQL Server. Please check it and let me know.
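The payment_id = 1 shortcut only holds if payment_id restarts at 1 for each user and factory and increases in creation order; that is an assumption about the data, not something the schema guarantees. A sketch for checking it, using Python's built-in sqlite3 module as a stand-in database (not SQL Server or ClickHouse) and finished_at from the sample as created_at: it computes the shortcut (by_id) next to a direct "earliest payment per (user_id, factory)" check (by_time).

```python
import sqlite3

# Payment rows from the question's sample: (factory, user_id, payment_id, action, created_at).
# Assumption: the sample's finished_at values are used as created_at.
PAYMENTS = [
    ("0_factory", 1, 1, "payment", "2021-01-18 00:00:01"),
    ("0_factory", 1, 2, "payment", "2021-01-18 00:00:02"),
    ("2_factory", 1, 1, "payment", "2021-01-18 00:00:02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments_all (factory TEXT, user_id INTEGER, "
    "payment_id INTEGER, action TEXT, created_at TEXT)"
)
conn.executemany("INSERT INTO payments_all VALUES (?, ?, ?, ?, ?)", PAYMENTS)

# by_id: the payment_id = 1 shortcut from the answer.
# by_time: 1 when the row is the earliest payment of its (user_id, factory) pair.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT factory, user_id, payment_id, action, created_at
        FROM payments_all
        WHERE action = 'payment'
    )
    SELECT factory, payment_id,
           CASE WHEN payment_id = 1 THEN 1 ELSE 0 END AS by_id,
           CASE WHEN created_at = (
                    SELECT MIN(p.created_at) FROM payments_all AS p
                    WHERE p.user_id = cte.user_id
                      AND p.factory = cte.factory
                      AND p.action = 'payment')
                THEN 1 ELSE 0 END AS by_time
    FROM cte
    ORDER BY user_id, factory, created_at
    """
).fetchall()

for row in rows:
    print(row)
# ('0_factory', 1, 1, 1)
# ('0_factory', 2, 0, 0)
# ('2_factory', 1, 1, 1)
```

On these three payment rows the two columns agree. On real data, any row where by_id and by_time differ is a counterexample to the shortcut.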

I couldn't find much information about ClickHouse, but it doesn't seem to support window functions.
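For comparison only: in an engine that does support window functions, the rule maps onto ROW_NUMBER() partitioned by user and factory. More recent ClickHouse releases have added window-function support, so it may be worth checking the version you run. The sketch below uses SQLite (3.25 or later, through Python's sqlite3 module) as a stand-in, with the sample's finished_at as created_at and a hypothetical id column added only to keep the original row order.

```python
import sqlite3

# Sample rows: (factory, created_at, payment_id, action, user_id).
# Assumption: the sample's finished_at values are used as created_at.
ROWS = [
    ("0_factory", "2021-01-18 00:00:01", 1, "payment", 1),
    ("0_factory", "2021-01-18 00:00:02", 2, "payment", 1),
    ("1_factory", "2021-01-18 00:00:02", 2, "win", 1),
    ("1_factory", "2021-01-18 00:00:03", 3, "win", 1),
    ("2_factory", "2021-01-18 00:00:01", 4, "win", 1),
    ("2_factory", "2021-01-18 00:00:02", 1, "payment", 1),
    ("2_factory", "2021-01-18 00:00:03", 2, "win", 1),
    ("2_factory", "2021-01-18 00:00:04", 3, "win", 1),
]

conn = sqlite3.connect(":memory:")
# Hypothetical id column: only used to return rows in insertion order.
conn.execute(
    "CREATE TABLE payments_all (id INTEGER PRIMARY KEY, factory TEXT, "
    "created_at TEXT, payment_id INTEGER, action TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO payments_all (factory, created_at, payment_id, action, user_id) "
    "VALUES (?, ?, ?, ?, ?)",
    ROWS,
)

# Within each (user_id, factory) group, payment rows are numbered separately
# from other rows (oldest first); number 1 among payments is the first payment.
QUERY = """
SELECT factory, created_at, payment_id, action, user_id,
       CASE WHEN action = 'payment'
             AND ROW_NUMBER() OVER (
                     PARTITION BY user_id, factory, action = 'payment'
                     ORDER BY created_at) = 1
            THEN 1 ELSE 0 END AS first_payment
FROM payments_all
ORDER BY id
"""

result = conn.execute(QUERY).fetchall()
flags = [row[-1] for row in result]
print(flags)  # [1, 0, 0, 0, 0, 1, 0, 0]
```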

Your example output also seems to be exactly the same as the sample table plus one extra column, so I'm not sure what the purpose of your GROUP BY is.

So I would use a LEFT JOIN on a subquery:

SELECT
  payments_all.*,
  CASE WHEN user_summary.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
FROM
  payments_all
LEFT JOIN
(
  SELECT
    user_id,
    factory,
    MIN(created_at)  AS first_created_at
  FROM
    payments_all
  WHERE
    action = 'payment'
  GROUP BY
    user_id,
    factory
)
  AS user_summary
    ON  payments_all.user_id    = user_summary.user_id
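    AND payments_all.factory    = user_summary.factory
    AND payments_all.created_at = user_summary.first_created_at
WHERE
     (payments_all.factory    IN ('0_factory', '1_factory', '2_factory'))
 AND (payments_all.created_at >= toDate('2019-01-01 00:00:00'))
 AND (payments_all.created_at <  toDate('2021-10-01 00:00:00'))
-- ClickHouse fills unmatched LEFT JOIN columns with default values (e.g. 0)
-- instead of NULL unless join_use_nulls = 1; the IS NOT NULL test relies on NULLs.
SETTINGS join_use_nulls = 1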
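A sketch that runs the same LEFT JOIN shape against the sample rows, using SQLite through Python's sqlite3 module rather than ClickHouse, with the sample's finished_at used as created_at. Two differences from ClickHouse are worth stating: SQLite always returns NULL for unmatched LEFT JOIN columns (ClickHouse returns column defaults unless join_use_nulls = 1), and the date bounds are plain ISO-8601 string literals here because toDate() is a ClickHouse function.

```python
import sqlite3

# Sample rows: (factory, created_at, payment_id, country, action, amount_cents, user_id).
# Assumption: the sample's finished_at values are used as created_at.
ROWS = [
    ("0_factory", "2021-01-18 00:00:01", 1, "BY", "payment", 1, 1),
    ("0_factory", "2021-01-18 00:00:02", 2, "BY", "payment", 1, 1),
    ("1_factory", "2021-01-18 00:00:02", 2, "PL", "win", 4, 1),
    ("1_factory", "2021-01-18 00:00:03", 3, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:01", 4, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:02", 1, "PL", "payment", 7, 1),
    ("2_factory", "2021-01-18 00:00:03", 2, "PL", "win", 7, 1),
    ("2_factory", "2021-01-18 00:00:04", 3, "GR", "win", 2, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments_all (factory TEXT, created_at TEXT, payment_id INTEGER, "
    "country TEXT, action TEXT, amount_cents INTEGER, user_id INTEGER)"
)
conn.executemany("INSERT INTO payments_all VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# The subquery finds each (user_id, factory) pair's earliest payment over all
# time; the date filter is applied only to the outer rows.
QUERY = """
SELECT payments_all.*,
       CASE WHEN user_summary.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
FROM payments_all
LEFT JOIN (
    SELECT user_id, factory, MIN(created_at) AS first_created_at
    FROM payments_all
    WHERE action = 'payment'
    GROUP BY user_id, factory
) AS user_summary
    ON  payments_all.user_id    = user_summary.user_id
    AND payments_all.factory    = user_summary.factory
    AND payments_all.created_at = user_summary.first_created_at
WHERE payments_all.factory IN ('0_factory', '1_factory', '2_factory')
  AND payments_all.created_at >= '2019-01-01 00:00:00'
  AND payments_all.created_at <  '2021-10-01 00:00:00'
ORDER BY payments_all.factory, payments_all.created_at
"""

result = conn.execute(QUERY).fetchall()
flags = [row[-1] for row in result]
print(flags)  # [1, 0, 0, 0, 0, 1, 0, 0]
```

One edge case: because the join matches on timestamp, a non-payment row with exactly the same created_at as the first payment in that factory would also get a 1; adding payments_all.action = 'payment' to the CASE condition would rule that out.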