Get Row Number for Consecutive Status and Reset on Change
I'd like to track how many weeks in a row a user has logged in. I tried row_number() OVER (PARTITION BY state ORDER BY week), but the row numbers do not reset when the state changes. Here is an example table.
user_id | week       | state
--------+------------+---------
1000    | 2018-01-01 | Active
1000    | 2018-01-08 | Inactive
1000    | 2018-01-15 | Inactive
1000    | 2018-01-22 | Active
1000    | 2018-01-29 | Active
2000    | 2018-01-01 | Inactive
2000    | 2018-01-08 | Active
2000    | 2018-01-15 | Inactive
2000    | 2018-01-22 | Active
2000    | 2018-01-29 | Active
I'd like the output to look like this:
user_id | week | state | streak
--------+--------------+----------+---------
1000 | 2018-01-01 | Active | 1
1000 | 2018-01-08 | Inactive | 1
1000 | 2018-01-15 | Inactive | 2
1000 | 2018-01-22 | Active | 1
1000 | 2018-01-29 | Active | 2
2000 | 2018-01-01 | Inactive | 1
2000 | 2018-01-08 | Active | 1
2000 | 2018-01-15 | Inactive | 1
2000 | 2018-01-22 | Active | 1
2000 | 2018-01-29 | Active | 2
Here is my current query:
SELECT
week,
user_id,
state,
row_number()
OVER(PARTITION BY user_id, state
order by user_id, week) AS streak
FROM
t.data_table
GROUP BY 1,2,3
order by week;
My output currently looks like this:
user_id | week | state | streak
--------+--------------+----------+---------
1000 | 2018-01-01 | Active | 1
1000 | 2018-01-08 | Inactive | 1
1000 | 2018-01-15 | Inactive | 2
1000 | 2018-01-22 | Active | 2
1000 | 2018-01-29 | Active | 3
2000 | 2018-01-01 | Inactive | 1
2000 | 2018-01-08 | Active | 1
2000 | 2018-01-15 | Inactive | 2
2000 | 2018-01-22 | Active | 2
2000 | 2018-01-29 | Active | 3
Any suggestions would be appreciated.
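This is a gaps-and-islands problem. The strategy is to define groups of adjacent rows that share the same state, then use row_number() to enumerate the rows within each group.
One method uses the difference of two row numbers: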
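select t.*,
       -- restart the count within each run of consecutive equal states
       row_number() over (partition by user_id, state, seqnum - seqnum_s order by week) as streak
from (select t.*,
             -- position among all of the user's weeks
             row_number() over (partition by user_id order by week) as seqnum,
             -- position among the user's weeks that share this state
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) t;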
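Explaining how this works is a bit tricky. If you look at the results of the subquery, you will see how the difference between the two row numbers identifies each group of consecutive rows where the state is the same.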
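To make that concrete, here is a minimal, self-contained sketch that runs only the inner subquery against the sample rows from the question and exposes the difference as its own column. The inline VALUES list and the grp and s aliases are assumptions for illustration (the real table is not shown in the question), the weeks are kept as ISO date strings so that they sort chronologically, and the syntax is PostgreSQL-style; other engines may need a different way to inline the rows.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
with t (user_id, week, state) as (
    values
        (1000, '2018-01-01', 'Active'),
        (1000, '2018-01-08', 'Inactive'),
        (1000, '2018-01-15', 'Inactive'),
        (1000, '2018-01-22', 'Active'),
        (1000, '2018-01-29', 'Active'),
        (2000, '2018-01-01', 'Inactive'),
        (2000, '2018-01-08', 'Active'),
        (2000, '2018-01-15', 'Inactive'),
        (2000, '2018-01-22', 'Active'),
        (2000, '2018-01-29', 'Active')
)
select s.*,
       seqnum - seqnum_s as grp   -- hypothetical alias: constant within each run of equal states
from (select t.*,
             -- position among all of the user's weeks
             row_number() over (partition by user_id order by week) as seqnum,
             -- position among the user's weeks that share this state
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) s
order by user_id, week;
```

For user 1000 this returns:

user_id | week       | state    | seqnum | seqnum_s | grp
--------+------------+----------+--------+----------+-----
1000    | 2018-01-01 | Active   | 1      | 1        | 0
1000    | 2018-01-08 | Inactive | 2      | 1        | 1
1000    | 2018-01-15 | Inactive | 3      | 2        | 1
1000    | 2018-01-22 | Active   | 4      | 2        | 2
1000    | 2018-01-29 | Active   | 5      | 3        | 2

Within a run of equal states both row numbers advance by one per row, so their difference stays fixed; when the state changes, only seqnum keeps advancing, so the next run of that state gets a larger difference. The difference alone is not unique across states (for user 2000, the Active row on 2018-01-08 and the Inactive row on 2018-01-15 both get grp = 1), which is why the outer query partitions by state as well as by the difference.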