How do I rank a column in SQL based on row day-difference and partition?

I am trying to get a RANK() on a column based on the day difference between consecutive rows being < 3.

select hotel.*,
IFNULL(datediff(visit_date, lag(visit_date)
OVER (partition by hotel_id order by visit_date)), 0) as diff
from hotel;

I get the following output:

hotel_id customer_id  visit_date  diff
1            1        2020-01-01    0
1            2        2020-01-03    2
2            1        2020-01-01    0
2            2        2020-01-10    9
2            3        2020-01-14    4
3            1        2020-01-04    0
3            1        2020-01-11    7

I am stuck with the RANK() part.

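Expected output: if the day difference is less than 3, the rank is 1, otherwise 2. If the next one is more than 3 days later, then 3, and so on.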

hotel_id customer_id  visit_date  rank
1            1        2020-01-01    1
1            2        2020-01-03    1
2            1        2020-01-01    1
2            2        2020-01-10    2
2            3        2020-01-14    3
3            1        2020-01-04    1
3            1        2020-01-11    2

If you want the result based on the condition you gave, you can try the following in SQL Server. Here is the Demo.

select
  hotel_id, 
  customer_id, 
  visit_date,
  case 
    when days < 3 then 1
    else 2
  end as rnk
from
(
  select
    *,
    datediff(day, n_date, visit_date) as days
  from
  (
      select
        *,
        coalesce(lag(visit_date) over (partition by hotel_id order by visit_date), visit_date) as n_date

      from hotel
  ) val
)days

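You can use this query to generate your rank values. It uses a couple of CTEs: the first generates a row number for each visit (per hotel), and the second (recursive) CTE generates the rank values by walking through the rows of the first CTE in order, incrementing the rank only when the date difference is more than 2 days.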

WITH RECURSIVE hotel_rows AS (
  SELECT hotel_id, customer_id, visit_date,
         ROW_NUMBER() OVER (PARTITION BY hotel_id ORDER BY visit_date) AS rn
  FROM hotel
  ORDER BY hotel_id, visit_date
),
ranks AS (
  SELECT hotel_id, customer_id, visit_date, rn, 1 AS `rank`
  FROM hotel_rows
  WHERE rn = 1
  UNION ALL
  SELECT h.hotel_id, h.customer_id, h.visit_date, h.rn,
         r.rank + (h.visit_date > r.visit_date + INTERVAL 2 DAY)
  FROM hotel_rows h
  JOIN ranks r ON h.hotel_id = r.hotel_id
              AND h.rn = r.rn + 1
)
SELECT hotel_id, customer_id, visit_date, `rank`
FROM ranks
ORDER BY hotel_id, visit_date

Output (for my slightly expanded demo):

hotel_id    customer_id     visit_date  rank
1           1               2020-01-01  1
1           2               2020-01-03  1
2           1               2020-01-01  1
2           2               2020-01-10  2
2           3               2020-01-14  3
2           1               2020-01-15  3
2           2               2020-01-20  4
3           1               2020-01-04  1
3           1               2020-01-11  2

Demo on dbfiddle
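
To check the logic by hand outside the database, the row-by-row walk that the recursive CTE performs can be sketched in plain Python. This is only an illustration under stated assumptions: the function name rank_visits and the parameter max_gap_days are made up for this sketch, and the rows are copied from the expanded demo output above.

```python
from datetime import date, timedelta

# Rows copied from the expanded demo output: (hotel_id, customer_id, visit_date).
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (2, 1, date(2020, 1, 15)),
    (2, 2, date(2020, 1, 20)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def rank_visits(rows, max_gap_days=2):
    """Hypothetical helper: walk each hotel's visits in date order and
    increment the rank whenever the gap to the previous visit exceeds
    max_gap_days (the same test as visit_date > prev + INTERVAL 2 DAY)."""
    ranked = []
    prev_hotel, prev_date, rank = None, None, 0
    for hotel_id, customer_id, visit_date in sorted(rows, key=lambda r: (r[0], r[2])):
        if hotel_id != prev_hotel:
            rank = 1  # first visit of a hotel, like the rn = 1 anchor rows
        elif visit_date > prev_date + timedelta(days=max_gap_days):
            rank += 1  # gap is long enough to start a new rank
        ranked.append((hotel_id, customer_id, visit_date, rank))
        prev_hotel, prev_date = hotel_id, visit_date
    return ranked


for hotel_id, customer_id, visit_date, rank in rank_visits(visits):
    print(hotel_id, customer_id, visit_date, rank)
```

The printed ranks should match the rank column of the output above, which makes it easy to experiment with a different threshold before changing the SQL.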

I would express this as:

select h.*,
       (case when lag(visit_date) over (partition by hotel_id order by visit_date) < visit_date - interval 3 day
             then 2 else 1
       end)
from hotel h;

EDIT:

Based on your edit, you want to assign groups based on the date difference and then use row_number():

select h.*,
       1 + sum( coalesce(visit_date > prev_vd + interval 3 day, 0) ) over (partition by hotel_id order by visit_date) as grp
from (select h.*,
             lag(visit_date) over (partition by hotel_id order by visit_date) as prev_vd
      from hotel h
     ) h;

Here is a db<>fiddle.
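
The running-sum idea in the last query (flag each row whose gap to the previous visit is large, then take a cumulative sum of the flags per hotel) can be sketched in plain Python for experimentation. The function name group_numbers and the parameter gap_days are assumptions made up for this sketch, and the rows are the sample data from the question.

```python
from datetime import date, timedelta
from itertools import accumulate, groupby

# Sample rows from the question: (hotel_id, customer_id, visit_date).
visits = [
    (1, 1, date(2020, 1, 1)),
    (1, 2, date(2020, 1, 3)),
    (2, 1, date(2020, 1, 1)),
    (2, 2, date(2020, 1, 10)),
    (2, 3, date(2020, 1, 14)),
    (3, 1, date(2020, 1, 4)),
    (3, 1, date(2020, 1, 11)),
]


def group_numbers(rows, gap_days=3):
    """Hypothetical helper: 1 + running sum of 'gap exceeds gap_days' flags,
    computed separately for each hotel in visit_date order."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, hotel_rows in groupby(ordered, key=lambda r: r[0]):
        hotel_rows = list(hotel_rows)
        dates = [r[2] for r in hotel_rows]
        # The first row has no previous visit, so its flag is 0 (the coalesce in the SQL).
        flags = [0] + [
            int(cur > prev + timedelta(days=gap_days))
            for prev, cur in zip(dates, dates[1:])
        ]
        for row, running in zip(hotel_rows, accumulate(flags)):
            result.append((*row, 1 + running))
    return result


for row in group_numbers(visits):
    print(*row)
```

Note that with a strict comparison and gap_days=3, a gap of exactly 3 days stays in the same group. If a 3-day gap should start a new group, as the "< 3" wording in the question suggests, calling group_numbers(visits, gap_days=2) would be the corresponding change.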