Filter by partition in Snowflake
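I want to filter out the records that come before each 'created' status, based on start_time.
For example, based on start_time, id A below has a 'failed' status before 'created', so it needs to be filtered out.
Id B, on the other hand, has 'created' first, followed by another acceptable status.
So the expected result contains only the rows for id B, but I am looking for a scalable solution that works on thousands of rows.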
WITH t1 AS (
SELECT 'A' AS id, 'failed' AS status, '2021-05-18 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 11:30:00'::timestamp AS start_time UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-24 12:00:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-05-19 18:30:00'::timestamp AS start_time UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-05-20 11:30:00'::timestamp AS start_time
)
SELECT *
FROM t1
There are several ways to do this; here is one that uses first_value.
with t1 (id, status, start_time) as
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
select 'b', 'successful', '2021-05-20 11:30:00'::timestamp)
select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'
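All this does is make sure that the first status for any given id is 'created'. Think of the qualify clause as a having clause for window functions. You can also break it out into a subquery if you find that more readable.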
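For readers who prefer the subquery form, a minimal sketch of the same filter could look like the following. It reuses the sample t1 data from above; the column alias first_status and the subquery alias ranked are hypothetical names chosen for illustration, not part of the original answer.

```sql
-- Sketch: the QUALIFY filter rewritten as a subquery.
-- first_status and ranked are illustrative names (assumptions).
with t1 (id, status, start_time) as
(select 'a', 'failed', '2021-05-18 18:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 11:30:00'::timestamp union all
 select 'a', 'created', '2021-05-24 12:00:00'::timestamp union all
 select 'b', 'created', '2021-05-19 18:30:00'::timestamp union all
 select 'b', 'successful', '2021-05-20 11:30:00'::timestamp)
select id, status, start_time
from (
    -- Compute the earliest status per id once, as an ordinary column.
    select t1.*,
           first_value(status) over (partition by id order by start_time asc) as first_status
    from t1
) as ranked
-- A window function cannot appear directly in WHERE, so filter on the derived column.
where first_status = 'created';
```

With the sample data, this should return only the two rows for id 'b', because the earliest status for id 'a' is 'failed'. The qualify version is shorter, while the subquery version keeps the window calculation visible as a named column, which can help when debugging.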
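Note: the solution above will also keep ids that only ever have the 'created' status. If you want to guarantee that each id has at least two distinct statuses, change it to: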
select *
from t1
qualify first_value(status) over (partition by id order by start_time asc) = 'created'
and
count(distinct status) over (partition by id) > 1;