Filter by partition in Snowflake

Question

I want to filter out the records whose start_time is before each 'created' status.

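For example, ordered by start_time, id A below has a 'failed' status before 'created', so it needs to be filtered out. Id B, on the other hand, is 'created' first and is then followed by another acceptable status.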

So the expected result is only the rows for id B, but I'm looking for a scalable solution that works for thousands of rows.

WITH t1 AS (
SELECT 'A' AS id, 'failed' AS status, '2021-05-18 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 11:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 12:00:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-05-19 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-05-20 11:30:00'::timestamp AS start_time
    )
SELECT *
FROM t1

Answer 1

There are multiple ways to do this, but here is one that uses first_value.

with t1 (id, status, start_time) as 
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
 select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
 select 'b', 'successful', '2021-05-20 11:30:00'::timestamp)

select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'

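All you are doing is making sure that the first status for any given id is 'created'. Think of the qualify clause as the having clause for window functions. You could also break it down into a subquery if you find that more readable.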
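For reference, the subquery form could look roughly like the sketch below. It computes the same first_value window as a column and filters on it in an outer where clause. The alias names first_status and ranked are made up for illustration, and the sample data is repeated so the statement runs on its own.

```sql
-- Same sample data as above, repeated so this statement is self-contained.
with t1 (id, status, start_time) as
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
 select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
 select 'b', 'successful', '2021-05-20 11:30:00'::timestamp)

select id, status, start_time
from (
    -- first_status (illustrative alias) is the earliest status of each id
    select t1.*,
           first_value(status) over (partition by id order by start_time asc) as first_status
    from t1
) as ranked
-- keep only ids whose earliest status is 'created'
where first_status = 'created';
```

On the sample data this should keep only the two rows for id b, the same as the qualify version, because the earliest status of id a is 'failed'.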

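Note: the solution above will also keep ids that only ever have the 'created' status. If you want to guarantee that each id has at least two distinct statuses, change it to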

select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'
        and 
        count(distinct status) over (partition by id) > 1;
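To see how the two conditions differ, it can help to select the window values as plain columns before filtering. The sketch below adds a hypothetical id 'c' that is not in the question's data and only ever has the 'created' status. The aliases first_status and distinct_statuses are also illustrative.

```sql
-- Sample data from the question plus a hypothetical id 'c' (an assumption
-- added for illustration) that only ever has the 'created' status.
with t1 (id, status, start_time) as
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
 select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
 select 'b', 'successful', '2021-05-20 11:30:00'::timestamp union all
 select 'c', 'created', '2021-05-21 09:00:00'::timestamp union all
 select 'c', 'created', '2021-05-21 10:00:00'::timestamp)

select id,
       status,
       start_time,
       -- earliest status of each id
       first_value(status) over (partition by id order by start_time asc) as first_status,
       -- number of different statuses each id has ever had
       count(distinct status) over (partition by id) as distinct_statuses
from t1
order by id, start_time;
```

Here id c has first_status = 'created' but distinct_statuses = 1, so it passes the first qualify condition and is dropped by the second. Id b passes both, and id a fails the first.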
