How to do pivot_wider (R) / pivot (pandas) in ClickHouse by turning part of the rows into columns?

I have a table like this:

event_id date event_name key value
1 '2021-01-01' 'session_start' 'session_id' '12345'
1 '2021-01-01' 'session_start' 'network' 'organic'
1 '2021-01-01' 'session_start' 'screen_id' '22'
1 '2021-01-01' 'session_start' 'any_var' 'True'
2 '2021-01-02' 'app_deleted' 'session_id' '23456'
2 '2021-01-02' 'app_deleted' 'network' 'organic'
2 '2021-01-02' 'app_deleted' 'screen_id' '33'

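I want to turn it into a table with more columns and fewer rows, so that the 'key' values become columns and the 'value' values become the values of those columns, with NULL wherever a value is missing. There will be ~100 columns in total.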

event_id date event_name session_id network screen_id any_var
1 '2021-01-01' 'session_start' '12345' 'organic' '22' 'True'
2 '2021-01-02' 'app_deleted' '23456' 'organic' '33' NULL

Thanks!

P.S. I couldn't come up with a solution using arrays myself.

P.P.S. Unfortunately, the keys can vary, and they differ from month to month.

SELECT
    event_id,
    date,
    event_name,
    anyIf(toNullable(value), key = 'session_id') AS session_id,
    anyIf(toNullable(value), key = 'network') AS network,
    anyIf(toNullable(value), key = 'screen_id') AS screen_id,
    anyIf(toNullable(value), key = 'any_var') AS any_var
FROM b
GROUP BY
    event_id,
    date,
    event_name;

┌─event_id─┬───────date─┬─event_name────┬─session_id─┬─network─┬─screen_id─┬─any_var─┐
│        1 │ 2021-01-01 │ session_start │ 12345      │ organic │ 22        │ True    │
│        2 │ 2021-01-02 │ app_deleted   │ 23456      │ organic │ 33        │ ᴺᵁᴸᴸ    │
└──────────┴────────────┴───────────────┴────────────┴─────────┴───────────┴─────────┘


create table b (event_id int, date date, event_name String, key String, value String) Engine=Memory;

insert into b values
(1 ,'2021-01-01','session_start','session_id','12345')
(1,'2021-01-01','session_start','network','organic')
(1,'2021-01-01','session_start','screen_id','22')
(1,'2021-01-01','session_start','any_var','True')
(2,'2021-01-02','app_deleted','session_id','23456')
(2,'2021-01-02','app_deleted','network','organic')
(2,'2021-01-02','app_deleted','screen_id','33')


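SQL does not allow a variable number of columns. Every row has the same number of columns. The number of columns, their names, and their types are known before the query is executed.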
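Since the column list has to be fixed before the query runs, one possible workaround for keys that change every month is to build the anyIf query text in client code. Below is a minimal Python sketch of that idea. The helper name build_pivot_query is hypothetical, the table name b and the grouping columns are taken from the example above, and the key list is assumed to come from a separate step (for instance a SELECT DISTINCT key query). Keys are restricted to plain identifiers so they can be used as column aliases without quoting.

```python
import re

# Only plain identifiers are accepted, so a key can double as a column alias
# and cannot inject SQL into the generated text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_pivot_query(keys, table="b"):
    """Return a pivot query with one anyIf column per key (hypothetical helper)."""
    columns = ["    event_id", "    date", "    event_name"]
    for key in keys:
        if not _IDENTIFIER.fullmatch(key):
            raise ValueError(f"unsupported key name: {key!r}")
        columns.append(f"    anyIf(toNullable(value), key = '{key}') AS {key}")
    select_list = ",\n".join(columns)
    return (
        f"SELECT\n{select_list}\n"
        f"FROM {table}\n"
        "GROUP BY\n    event_id,\n    date,\n    event_name"
    )


# Example: the four keys from the sample data.
print(build_pivot_query(["session_id", "network", "screen_id", "any_var"]))
```

The generated text has the same shape as the hand-written anyIf query, so it can be sent to the server with whatever client is already in use.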

SELECT
    event_id,
    date,
    event_name,
    groupArray((key, value))
FROM b
GROUP BY
    event_id,
    date,
    event_name
┌─event_id─┬───────date─┬─event_name────┬─groupArray(tuple(key, value))────────────────────────────────────────────────────────┐
│        1 │ 2021-01-01 │ session_start │ [('session_id','12345'),('network','organic'),('screen_id','22'),('any_var','True')] │
│        2 │ 2021-01-02 │ app_deleted   │ [('session_id','23456'),('network','organic'),('screen_id','33')]                    │
└──────────┴────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
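If the array form is fetched into client code, the widening can be done there instead. The sketch below assumes the client returns the groupArray column as a list of (key, value) tuples, which depends on the driver in use. The function name widen is hypothetical, and the list of expected keys is supplied by the caller.

```python
def widen(pairs, keys):
    """Turn [(key, value), ...] into a dict with one entry per expected key.

    Keys that are absent from pairs map to None, mirroring NULL in the
    pivoted table.
    """
    found = dict(pairs)
    return {key: found.get(key) for key in keys}


keys = ["session_id", "network", "screen_id", "any_var"]

# Second row of the result above: any_var is missing for event_id 2.
row = widen(
    [("session_id", "23456"), ("network", "organic"), ("screen_id", "33")],
    keys,
)
print(row)
```

This keeps the SQL fixed from month to month, with only the key list changing on the client side.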