Pairing Sequential Events on PostgreSQL
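We record the main action flows that users perform in our iPad app in a table. Each flow has a start (tagged Started) and an end tagged either Cancelled or Finished, and there should be no overlapping events.
A set of Started, Cancelled, and Finished flows for one user looks like this: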
user_id            timestamp                   event_text      event_num
info@cafe-test.de  2016-10-30 00:08:00.966+00  Flow Started    0
info@cafe-test.de  2016-10-30 00:08:15.58+00   Flow Cancelled  2
info@cafe-test.de  2016-10-30 00:08:15.581+00  Flow Started    0
info@cafe-test.de  2016-10-30 00:34:44.134+00  Flow Finished   1
info@cafe-test.de  2016-10-30 00:42:26.102+00  Flow Started    0
info@cafe-test.de  2016-10-30 00:42:49.276+00  Flow Cancelled  2
info@cafe-test.de  2016-10-30 00:42:49.277+00  Flow Started    0
info@cafe-test.de  2016-10-30 00:59:47.337+00  Flow Cancelled  2
info@cafe-test.de  2016-10-30 00:59:47.337+00  Flow Started    0
info@cafe-test.de  2016-10-30 00:59:47.928+00  Flow Cancelled  2
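We want to calculate how long cancelled and finished flows last on average. To do that, we need to pair each Started event with the Cancelled or Finished event that follows it. The code below does that, but it cannot cope with the following data quality issue:
When a customer starts a new flow (call it Flow2) before ending the one in progress (Flow1), we fire the Cancelled event for Flow1 at the same moment we fire the Started event for Flow2. So Flow1 Cancelled = Flow2 Started, and the two events can carry the same timestamp. However, when we order the events with window functions, lead/lag ends up matching ordered events that actually belong to different flows.
Using this code: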
WITH track_scf AS (
    SELECT user_id,
           timestamp,
           event_text,
           CASE WHEN event_text LIKE '%Started%'   THEN 0
                WHEN event_text LIKE '%Cancelled%' THEN 2
                ELSE 1
           END AS event_num
    FROM tracks
    ORDER BY 2, 4 DESC
)
SELECT user_id,
       CASE WHEN event_num = 0 THEN timestamp END AS start,
       CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
            THEN LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
       END AS end,
       CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
            THEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
       END AS action
FROM track_scf
we get this result:
user_id            start                       end                         action
info@cafe-test.de  2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de  2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de  2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de  2016-10-30 00:42:49.277+00  NULL                        NULL
info@cafe-test.de  2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de  NULL                        2016-10-30 00:59:47.928+00  2
But we should get this:
user_id            start                       end                         action
info@cafe-test.de  2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de  2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de  2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de  2016-10-30 00:42:49.277+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de  2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.928+00  2
How do I need to change the code so that the events pair up correctly?
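One way to look at the problem is to check where the two events that share the timestamp 00:59:47.337 land under each tie-breaking rule. The following is only a diagnostic sketch: the inline `sample` CTE and the column names `pos_asc` and `pos_desc` are made up for illustration, and the rows are copied from the data above.

```sql
-- Hypothetical inline sample: the two tied events plus their neighbours.
WITH sample (user_id, "timestamp", event_text, event_num) AS (
    VALUES
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:42:49.277+00', 'Flow Started',   0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.337+00', 'Flow Cancelled', 2),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.337+00', 'Flow Started',   0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.928+00', 'Flow Cancelled', 2)
)
SELECT "timestamp",
       event_text,
       -- ties broken by ascending event_num: Started (0) sorts before Cancelled (2)
       row_number() OVER (PARTITION BY user_id ORDER BY "timestamp", event_num)      AS pos_asc,
       -- ties broken by descending event_num: Cancelled (2) sorts before Started (0)
       row_number() OVER (PARTITION BY user_id ORDER BY "timestamp", event_num DESC) AS pos_desc
FROM sample
ORDER BY pos_desc;
```

With ascending `event_num`, the Started row at 00:59:47.337 sits directly after the Started row at 00:42:49.277, so LEAD on the earlier row sees another Started event and yields NULL. With descending `event_num`, the Cancelled row comes first and can close the earlier flow.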
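select user_id
      ,"start"
      ,"end"
      ,"action"
from  (select user_id
             ,timestamp                 as "start"
             ,lead (event_num)   over w as "action"
             ,lead ("timestamp") over w as "end"
             ,event_num
       from   tracks t
       -- event_num desc: on equal timestamps, Cancelled (2) sorts before Started (0),
       -- so the Cancelled row closes the previous flow before the next one opens
       window w as (partition by user_id order by "timestamp", event_num desc)
      ) t
where  t.event_num = 0   -- keep only the rows that open a flow
;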
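Since the stated goal is the average duration of cancelled and finished flows, the pairing can be wrapped in a CTE and aggregated. This is a minimal sketch, not a tested production query: the CTE names `tracks_sample` and `paired` and the output columns `started_at`, `ended_at`, `flows`, and `avg_duration` are hypothetical, and the inline rows simply repeat the sample data from the question. Against the real table, `tracks_sample` would be dropped and `paired` would read from `tracks`.

```sql
WITH tracks_sample (user_id, "timestamp", event_num) AS (
    -- Hypothetical stand-in for the tracks table (0 = Started, 1 = Finished, 2 = Cancelled)
    VALUES
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:08:00.966+00', 0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:08:15.58+00',  2),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:08:15.581+00', 0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:34:44.134+00', 1),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:42:26.102+00', 0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:42:49.276+00', 2),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:42:49.277+00', 0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.337+00', 2),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.337+00', 0),
        ('info@cafe-test.de', TIMESTAMPTZ '2016-10-30 00:59:47.928+00', 2)
),
paired AS (
    -- Pair every row with the event that follows it for the same user;
    -- event_num DESC puts Cancelled before Started when timestamps are equal
    SELECT user_id,
           "timestamp"              AS started_at,
           lead("timestamp") OVER w AS ended_at,
           lead(event_num)   OVER w AS action,
           event_num
    FROM tracks_sample
    WINDOW w AS (PARTITION BY user_id ORDER BY "timestamp", event_num DESC)
)
SELECT action,
       count(*)                   AS flows,
       avg(ended_at - started_at) AS avg_duration   -- interval average
FROM paired
WHERE event_num = 0        -- keep only rows that open a flow
  AND action IN (1, 2)     -- skip flows with no end yet or followed by another Started
GROUP BY action
ORDER BY action;
```

On the sample rows this groups one Finished flow (action 1) and four Cancelled flows (action 2). The `action IN (1, 2)` filter is a design choice for the aggregate: a Started event that is still open, or that is followed by another Started event, has no usable end time and would otherwise distort the average.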