How do I rank a column in SQL based on row day-difference and partition?
I am trying to get a RANK() on a column based on a row difference < 3.
select hotel.*,
IFNULL(datediff(visit_date, lag(visit_date)
OVER (partition by hotel_id)), 0) as diff
from hotel;
I get the following output:
hotel_id customer_id visit_date diff
1 1 2020-01-01 0
1 2 2020-01-03 2
2 1 2020-01-01 0
2 2 2020-01-10 9
2 3 2020-01-14 4
3 1 2020-01-04 0
3 1 2020-01-11 7
I am stuck with the RANK() part.
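Expected output:
If the day difference is less than 3, the rank is 1, otherwise 2. If the next visit is more than 3 days later, the rank is 3, and so on.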
hotel_id customer_id visit_date rank
1 1 2020-01-01 1
1 2 2020-01-03 1
2 1 2020-01-01 1
2 2 2020-01-10 2
2 3 2020-01-14 3
3 1 2020-01-04 1
3 1 2020-01-11 2
If you want the result based on the condition you gave, you can try the following in SQL Server. Here is the demo.
select
hotel_id,
customer_id,
visit_date,
case
when days < 3 then 1
else 2
end as rnk
from
(
select
*,
datediff(day, n_date, visit_date) as days
from
(
select
*,
coalesce(lag(visit_date) over (partition by hotel_id order by visit_date), visit_date) as n_date
from hotel
) val
)days
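You can use this query to generate your rank values. It uses a couple of CTEs: the first generates a row number for each visit (per hotel), and the second (recursive) CTE generates the rank values by walking through the rows from the first CTE and incrementing rank only when the date difference is more than 2 days: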
WITH RECURSIVE hotel_rows AS (
SELECT hotel_id, customer_id, visit_date,
ROW_NUMBER() OVER (PARTITION BY hotel_id ORDER BY visit_date) AS rn
FROM hotel
ORDER BY hotel_id, visit_date
),
ranks AS (
SELECT hotel_id, customer_id, visit_date, rn, 1 AS `rank`
FROM hotel_rows
WHERE rn = 1
UNION ALL
SELECT h.hotel_id, h.customer_id, h.visit_date, h.rn,
r.rank + (h.visit_date > r.visit_date + INTERVAL 2 DAY)
FROM hotel_rows h
JOIN ranks r ON h.hotel_id = r.hotel_id
AND h.rn = r.rn + 1
)
SELECT hotel_id, customer_id, visit_date, `rank`
FROM ranks
ORDER BY hotel_id, visit_date
Output (for my slightly expanded demo):
hotel_id customer_id visit_date rank
1 1 2020-01-01 1
1 2 2020-01-03 1
2 1 2020-01-01 1
2 2 2020-01-10 2
2 3 2020-01-14 3
2 1 2020-01-15 3
2 2 2020-01-20 4
3 1 2020-01-04 1
3 1 2020-01-11 2
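For readers who find the recursive step easier to follow outside SQL, here is a minimal Python sketch of the same walk: visits are taken in date order within each hotel, and the rank grows only when a visit is more than 2 days after the previous one. The function name `rank_visits` and the in-memory list of tuples are assumptions made for illustration; they are not part of the query above.

```python
from datetime import date, timedelta
from itertools import groupby


def rank_visits(rows, max_gap_days=2):
    """Return (hotel_id, customer_id, visit_date, rank) tuples.

    rows: iterable of (hotel_id, customer_id, visit_date) tuples.
    """
    # Same ordering the query uses: by hotel, then by visit date.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    for hotel_id, visits in groupby(ordered, key=lambda r: r[0]):
        rank, prev = 1, None  # each hotel starts again at rank 1
        for _, customer_id, visit_date in visits:
            # Increment only when the gap to the previous visit exceeds the limit.
            if prev is not None and visit_date > prev + timedelta(days=max_gap_days):
                rank += 1
            result.append((hotel_id, customer_id, visit_date, rank))
            prev = visit_date
    return result


# Sample rows copied from the expanded demo output above.
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (2, 1, date(2020, 1, 15)),
    (2, 2, date(2020, 1, 20)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]

for hotel_id, customer_id, visit_date, rank in rank_visits(visits):
    print(hotel_id, customer_id, visit_date, rank)
```

On these rows the sketch gives the same rank column as the table above.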
I would phrase this as:
select h.*,
(case when lag(visit_date) over (partition by hotel_id order by visit_date) < visit_date - interval 3 day
then 2 else 1
end)
from hotel h;
EDIT:
Based on your edit, you want to assign groups based on the date difference and then use row_number():
select h.*,
1 + sum( coalesce(visit_date > prev_vd + interval 3 day, 0) ) over (partition by hotel_id order by visit_date) as grp
from (select h.*,
lag(visit_date) over (partition by hotel_id order by visit_date) as prev_vd
from hotel h
) h;
Here is a db<>fiddle.
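The cumulative-sum idea in the last query can also be checked by hand. The Python sketch below flags each visit that comes more than 3 days after the previous one in the same hotel and keeps a running total of those flags, which is roughly what LAG plus a windowed SUM does. The helper `group_numbers` and the sample dates are illustrative assumptions, not part of the answer.

```python
from datetime import date, timedelta
from itertools import accumulate


def group_numbers(dates, gap_days=3):
    """Given one hotel's visit dates, return a group number per date (sorted)."""
    ordered = sorted(dates)
    if not ordered:
        return []
    # The first visit has no predecessor, so its flag is 0 (the COALESCE case).
    # Every later visit gets 1 when it is more than gap_days after the previous one.
    flags = [0] + [
        int(cur > prev + timedelta(days=gap_days))
        for prev, cur in zip(ordered, ordered[1:])
    ]
    # Running total of the flags, shifted so that groups start at 1.
    return [1 + total for total in accumulate(flags)]


# Hypothetical sample: the visit dates of hotel 2 from the expanded demo.
hotel_2 = [
    date(2020, 1, 1),
    date(2020, 1, 10),
    date(2020, 1, 14),
    date(2020, 1, 15),
    date(2020, 1, 20),
]
print(group_numbers(hotel_2))
```

Note that this version uses a threshold of more than 3 days while the recursive CTE uses more than 2 days. The two agree on the sample data, but a visit exactly 3 days after the previous one would start a new group only under the 2-day rule.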