How to do pivot_wider (R) / pivot (pandas) in ClickHouse by turning part of the rows into columns?
I have a table like this:
event_id | date | event_name | key | value |
---|---|---|---|---|
1 | '2021-01-01' | 'session_start' | 'session_id' | '12345' |
1 | '2021-01-01' | 'session_start' | 'network' | 'organic' |
1 | '2021-01-01' | 'session_start' | 'screen_id' | '22' |
1 | '2021-01-01' | 'session_start' | 'any_var' | 'True' |
2 | '2021-01-02' | 'app_deleted' | 'session_id' | '23456' |
2 | '2021-01-02' | 'app_deleted' | 'network' | 'organic' |
2 | '2021-01-02' | 'app_deleted' | 'screen_id' | '33' |
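I want to turn it into a table with more columns and fewer rows, so that the 'key' values become column names and the 'value' values become the contents of those columns, with NULL wherever a value is missing. There will be about 100 columns in total.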
event_id | date | event_name | session_id | network | screen_id | any_var |
---|---|---|---|---|---|---|
1 | '2021-01-01' | 'session_start' | '12345' | 'organic' | '22' | 'True' |
2 | '2021-01-02' | 'app_deleted' | '23456' | 'organic' | '33' | NULL |
Thanks!
P.S. I couldn't come up with an array-based solution myself.
P.P.S. Unfortunately, the keys may vary, and they differ from month to month.
-- anyIf picks a value from the rows where the condition holds;
-- toNullable makes the result NULL when no row in the group matches.
SELECT
    event_id,
    date,
    event_name,
    anyIf(toNullable(value), key = 'session_id') AS session_id,
    anyIf(toNullable(value), key = 'network') AS network,
    anyIf(toNullable(value), key = 'screen_id') AS screen_id,
    anyIf(toNullable(value), key = 'any_var') AS any_var
FROM b
GROUP BY
    event_id,
    date,
    event_name;
┌─event_id─┬───────date─┬─event_name────┬─session_id─┬─network─┬─screen_id─┬─any_var─┐
│        1 │ 2021-01-01 │ session_start │ 12345      │ organic │ 22        │ True    │
│        2 │ 2021-01-02 │ app_deleted   │ 23456      │ organic │ 33        │ ᴺᵁᴸᴸ    │
└──────────┴────────────┴───────────────┴────────────┴─────────┴───────────┴─────────┘
-- Test data used for the query above
CREATE TABLE b (event_id int, date date, event_name String, key String, value String) ENGINE = Memory;
INSERT INTO b VALUES
(1, '2021-01-01', 'session_start', 'session_id', '12345'),
(1, '2021-01-01', 'session_start', 'network', 'organic'),
(1, '2021-01-01', 'session_start', 'screen_id', '22'),
(1, '2021-01-01', 'session_start', 'any_var', 'True'),
(2, '2021-01-02', 'app_deleted', 'session_id', '23456'),
(2, '2021-01-02', 'app_deleted', 'network', 'organic'),
(2, '2021-01-02', 'app_deleted', 'screen_id', '33');
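With about 100 keys that change every month, writing the `anyIf` lines by hand gets tedious. One possible workaround is to fetch the key names first (for example with `SELECT DISTINCT key FROM b`) and generate the query text from them. The sketch below only builds the SQL string; `build_pivot_query` is a hypothetical helper, not part of ClickHouse or any client library, and it assumes every key is a plain identifier that can be used unquoted as a column alias.

```python
import re

# Hypothetical helper for illustration; not a ClickHouse or client-library API.
# Assumption: keys and the table name are plain identifiers (letters, digits, _).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_pivot_query(keys, table="b"):
    """Return a SELECT that turns each key into its own Nullable column."""
    if not keys:
        raise ValueError("at least one key is required")
    for name in [table, *keys]:
        # Reject anything that could break out of the quoted literal or alias.
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ",\n".join(
        f"    anyIf(toNullable(value), key = '{k}') AS {k}" for k in keys
    )
    return (
        "SELECT\n"
        "    event_id,\n"
        "    date,\n"
        "    event_name,\n"
        f"{columns}\n"
        f"FROM {table}\n"
        "GROUP BY\n"
        "    event_id,\n"
        "    date,\n"
        "    event_name"
    )


# In practice the list would come from: SELECT DISTINCT key FROM b
print(build_pivot_query(["session_id", "network", "screen_id", "any_var"]))
```

The generated text is the same query as above, so the column list is still fixed by the time ClickHouse sees it; only the step that writes it is automated.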
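SQL does not allow a variable number of columns: every row has the same number of columns, and the number of columns, their names, and their types must be known before the query is executed. What you can do instead is collect the key/value pairs of each event into an array: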
SELECT
    event_id,
    date,
    event_name,
    groupArray((key, value))
FROM b
GROUP BY
    event_id,
    date,
    event_name
┌─event_id─┬───────date─┬─event_name────┬─groupArray(tuple(key, value))────────────────────────────────────────────────────────┐
│        1 │ 2021-01-01 │ session_start │ [('session_id','12345'),('network','organic'),('screen_id','22'),('any_var','True')] │
│        2 │ 2021-01-02 │ app_deleted   │ [('session_id','23456'),('network','organic'),('screen_id','33')]                    │
└──────────┴────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
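If the array form is fetched into a client, the widening can be done there, where a variable column list is not a problem. The following is a minimal sketch under the assumption that each result row arrives as a tuple whose last element is a list of `(key, value)` pairs; the exact shape depends on the client library, and `widen` is a hypothetical name chosen for illustration.

```python
def widen(rows, keys):
    """Turn (event_id, date, event_name, [(key, value), ...]) rows into wide tuples.

    Keys missing for an event become None, mirroring NULL in the desired output.
    """
    wide = []
    for event_id, date, event_name, pairs in rows:
        lookup = dict(pairs)  # if a key repeats, the last value wins
        wide.append((event_id, date, event_name, *(lookup.get(k) for k in keys)))
    return wide


# Sample rows copied from the groupArray result shown above.
rows = [
    (1, "2021-01-01", "session_start",
     [("session_id", "12345"), ("network", "organic"),
      ("screen_id", "22"), ("any_var", "True")]),
    (2, "2021-01-02", "app_deleted",
     [("session_id", "23456"), ("network", "organic"), ("screen_id", "33")]),
]
keys = ["session_id", "network", "screen_id", "any_var"]

for row in widen(rows, keys):
    print(row)
```

For event 2 the last element is `None`, because that event has no `any_var` pair.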