SELECT previous group value based on conditions in Google Big Query
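The goal is to populate every applicable row with the value of the previous group, subject to a condition. The condition is specified in the STATUS column, and the query needs to be adjusted for the condition STATUS = A.
The data looks like this: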
DATE        ID  VALUE          GROUP_ID  STATUS
2021-06-01  1   New York       1         A
2021-06-02  1   New York       1         A
2021-06-03  1   New York       1         B
2021-06-04  1   New York       1         A
2021-06-05  1   Boston         2         A
2021-06-06  1   Boston         2         A
2021-06-07  1   San Francisco  3         A
2021-06-08  1   San Francisco  3         A
2021-06-09  1   New York       4         A
Expected result:
DATE        ID  VALUE          GROUP_ID  STATUS  PREVIOUS_VALUE
2021-06-01  1   New York       1         A       NA
2021-06-02  1   New York       1         A       NA
2021-06-03  1   New York       1         B       NA
2021-06-04  1   New York       1         A       NA
2021-06-05  1   Boston         2         A       New York
2021-06-06  1   Boston         2         A       New York
2021-06-07  1   San Francisco  3         A       Boston
2021-06-08  1   San Francisco  3         A       Boston
2021-06-09  1   New York       4         A       San Francisco
What I have tried so far:
select *, last_value(VALUE IGNORE NULLS) OVER (partition by ID, GROUP_ID order by DATE) from (table) A
select *, lag(VALUE) OVER (partition by ID, GROUP_ID order by DATE) from (table) A
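As a backup plan, I could create a table that holds the unique values per the condition and then run an UPDATE statement based on the closest smaller GROUP_ID, but I would rather have a more sustainable solution.
Thanks.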
First edit: exclude targets with the value 'Boston'.
I tried using case when value not in ('Boston') then ...
DATE        ID  VALUE          GROUP_ID  STATUS  PREVIOUS_VALUE
2021-06-01  1   New York       1         A       NA
2021-06-02  1   New York       1         A       NA
2021-06-03  1   New York       1         B       NA
2021-06-04  1   New York       1         A       NA
2021-06-05  1   Boston         2         A       New York
2021-06-06  1   Boston         2         A       New York
2021-06-07  1   San Francisco  3         A       San Francisco
2021-06-08  1   San Francisco  3         A       San Francisco
2021-06-09  1   New York       4         A       San Francisco
How about using a window frame specification?
select t.*,
max(value) over (order by group_id
range between 1 preceding and 1 preceding
) as prev_value
from t;
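To make the frame idea concrete, here is a minimal sketch that runs a variant of the query against the sample rows from the question. It rests on two assumptions that are not part of the answer above: SQLite (through Python's sqlite3 module, version 3.28 or later for RANGE offsets) stands in for BigQuery so the example can be run locally, and the STATUS = A condition is applied with a case expression inside max() together with a partition by id. The table name t and the helper previous_group_values are illustrative only.

```python
import sqlite3

# Sample rows from the question: (date, id, value, group_id, status).
ROWS = [
    ("2021-06-01", 1, "New York", 1, "A"),
    ("2021-06-02", 1, "New York", 1, "A"),
    ("2021-06-03", 1, "New York", 1, "B"),
    ("2021-06-04", 1, "New York", 1, "A"),
    ("2021-06-05", 1, "Boston", 2, "A"),
    ("2021-06-06", 1, "Boston", 2, "A"),
    ("2021-06-07", 1, "San Francisco", 3, "A"),
    ("2021-06-08", 1, "San Francisco", 3, "A"),
    ("2021-06-09", 1, "New York", 4, "A"),
]

# The frame "range between 1 preceding and 1 preceding" covers only rows
# whose group_id equals the current group_id minus 1.
# Assumption: rows with a status other than 'A' are turned into NULL by the
# case expression, and max() skips NULLs.
QUERY = """
select t.*,
       max(case when status = 'A' then value end)
           over (partition by id
                 order by group_id
                 range between 1 preceding and 1 preceding) as prev_value
from t
order by date
"""


def previous_group_values(rows):
    """Load rows into an in-memory SQLite table and run QUERY on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (date text, id integer, value text, "
        "group_id integer, status text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in previous_group_values(ROWS):
        print(row)
```

Rows in the first group get NULL (None in Python) where the question shows NA, because no group precedes them. Note that max() picks the largest string in the previous group, so this only gives a meaningful answer when each group holds a single distinct value, as in the sample data.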
If group_id is ordered but has gaps, you can use dense_rank() to derive a gap-free sequence:
select t.*,
max(value) over (order by dense_group_id
range between 1 preceding and 1 preceding
) as prev_value
from (select t.*,
dense_rank() over (order by group_id) as dense_group_id
from t
) t
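The effect of gaps can be checked with a small sketch. The rows below are hypothetical (group_id values 1, 3 and 7 are made up to create gaps), and SQLite via Python's sqlite3 module is again an assumed stand-in for BigQuery so the example can be run locally. The helper name run is illustrative.

```python
import sqlite3

# Hypothetical rows with gaps in group_id: (date, id, value, group_id, status).
GAPPY_ROWS = [
    ("2021-06-01", 1, "New York", 1, "A"),
    ("2021-06-02", 1, "Boston", 3, "A"),
    ("2021-06-03", 1, "Boston", 3, "A"),
    ("2021-06-04", 1, "San Francisco", 7, "A"),
]

# Frame directly on group_id: looks for group_id - 1, which never exists here.
PLAIN_QUERY = """
select t.*,
       max(value) over (order by group_id
                        range between 1 preceding and 1 preceding) as prev_value
from t
order by date
"""

# Frame on a dense rank of group_id: consecutive ranks 1, 2, 3 with no gaps.
DENSE_QUERY = """
select t.*,
       max(value) over (order by dense_group_id
                        range between 1 preceding and 1 preceding) as prev_value
from (select t.*,
             dense_rank() over (order by group_id) as dense_group_id
      from t
     ) t
order by date
"""


def run(query, rows):
    """Run a query on an in-memory table t and return the prev_value column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (date text, id integer, value text, "
        "group_id integer, status text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    # prev_value is the last column in both queries.
    values = [row[-1] for row in conn.execute(query)]
    conn.close()
    return values


if __name__ == "__main__":
    print("plain:", run(PLAIN_QUERY, GAPPY_ROWS))
    print("dense:", run(DENSE_QUERY, GAPPY_ROWS))
```

With the gaps in place, the plain frame finds no row at group_id minus 1 and returns NULL everywhere, while the dense_rank() version picks up the value of the nearest preceding group.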