Finding the most visited place at a particular time in SQL
I have a users table that holds the user_id, the place where the user bought a ticket, and the time of the purchase.
Users:
|------------|-------------|----------------------|
| user_id | place | purchase_time |
|------------|-------------|----------------------|
| 1 | New York | 2021-11-27:17:00:21 |
| 1 | Chicago | 2021-11-25:19:00:21 |
| 1 | Chicago | 2021-11-23:03:00:21 |
| 1 | Washington | 2021-11-21:07:00:21 |
| 1 | Washington | 2021-11-19:12:00:21 |
| 1 | Washington | 2021-11-17:00:00:21 |
| 1 | Washington | 2021-11-15:23:00:21 |
| 1 | Washington | 2021-11-12:21:00:21 |
| 2 | Chicago | 2021-09-25:01:00:21 |
| 2 | Milwaukee | 2021-09-24:02:00:21 |
| 2 | Milwaukee | 2021-09-23:03:00:21 |
| 2 | New York | 2021-09-22:19:00:21 |
| 2 | Chicago | 2021-09-21:01:00:21 |
| 3 | Milwaukee | 2021-10-27:12:31:21 |
| 3 | Washington | 2021-10-24:07:01:23 |
| 3 | Chicago | 2021-10-21:01:78:89 |
|------------|-------------|----------------------|
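I want to add a new column that shows, for each purchase, the place the user had visited most often up to that purchase time. Desired table (Snowflake):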
|------------|-------------|----------------------|---------------------|
| user_id | place | purchase_time | most_visited_place |
|------------|-------------|----------------------|---------------------|
| 1 | New York | 2021-11-27:17:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Chicago | 2021-11-25:19:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Chicago | 2021-11-23:03:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Washington | 2021-11-21:07:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Washington | 2021-11-19:12:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Washington | 2021-11-17:00:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Washington | 2021-11-15:23:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 1 | Washington | 2021-11-12:21:00:21 | Washington | <--- Washington, because at purchase_time this place was most visited by the user
| 2 | Chicago | 2021-09-25:01:00:21 | Chicago | <--- tie break: Chicago and Milwaukee were both most visited, so take the more recently visited one
| 2 | Milwaukee | 2021-09-24:02:00:21 | Milwaukee | <--- Milwaukee, because at purchase_time this place was most visited by the user
| 2 | Milwaukee | 2021-09-23:03:00:21 | Milwaukee | <--- tie break: Chicago, New York and Milwaukee were all most visited, so take the most recently visited one
| 2 | New York | 2021-09-22:19:00:21 | New York | <--- tie break: Chicago and New York were both most visited, so take the more recently visited one
| 2 | Chicago | 2021-09-21:01:00:21 | Chicago | <--- Chicago, because at purchase_time this place was most visited by the user
| 3 | Milwaukee | 2021-10-27:12:31:21 | Milwaukee |
| 3 | Washington | 2021-10-24:07:01:23 | Washington |
| 3 | Chicago | 2021-10-21:01:78:89 | Chicago |
|------------|-------------|----------------------|---------------------|
You want to use the window version of COUNT to get a running count of prior rows per user and place, then join each row to all of that user's earlier counted rows, and keep only the "best" one via QUALIFY:
WITH prior_user AS (
    SELECT
        user_id,
        place,
        purchase_time,
        -- running number of visits to this place by this user, up to and including this row
        COUNT(place) OVER (PARTITION BY user_id, place ORDER BY purchase_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS place_count
    FROM users
)
SELECT
    u.user_id,
    u.place,
    u.purchase_time,
    p.place AS most_visited_place
FROM users u
JOIN prior_user p
    ON u.user_id = p.user_id AND u.purchase_time >= p.purchase_time
-- highest running count wins; ties go to the most recent visit
QUALIFY ROW_NUMBER() OVER (PARTITION BY u.user_id, u.purchase_time ORDER BY p.place_count DESC, p.purchase_time DESC) = 1
*This SQL has not been run.
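Since the query above is untested, here is a minimal sketch for checking its logic locally. It is not Snowflake: it runs the same join-and-rank idea in SQLite through Python's `sqlite3` module, and because SQLite has no QUALIFY clause, the filter is rewritten as an outer `WHERE rn = 1` (an assumption of this sketch, as are the `ranked` CTE, the `rn` column and the `most_visited` helper). It needs an SQLite build with window functions (3.25 or later) and uses a subset of the sample rows, stored as text so that the fixed-width timestamps compare lexicographically.

```python
import sqlite3

# Subset of the sample rows from the question (users 1 and 2).
ROWS = [
    (1, "Chicago", "2021-11-23:03:00:21"),
    (1, "Washington", "2021-11-21:07:00:21"),
    (1, "Washington", "2021-11-19:12:00:21"),
    (2, "Chicago", "2021-09-25:01:00:21"),
    (2, "Milwaukee", "2021-09-24:02:00:21"),
    (2, "Milwaukee", "2021-09-23:03:00:21"),
    (2, "New York", "2021-09-22:19:00:21"),
    (2, "Chicago", "2021-09-21:01:00:21"),
]

# Same logic as the Snowflake query; QUALIFY is replaced by an outer
# filter on ROW_NUMBER() because SQLite has no QUALIFY clause.
QUERY = """
WITH prior_user AS (
    SELECT user_id, place, purchase_time,
           COUNT(place) OVER (
               PARTITION BY user_id, place
               ORDER BY purchase_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS place_count
    FROM users
),
ranked AS (
    SELECT u.user_id, u.place, u.purchase_time,
           p.place AS most_visited_place,
           ROW_NUMBER() OVER (
               PARTITION BY u.user_id, u.purchase_time
               ORDER BY p.place_count DESC, p.purchase_time DESC
           ) AS rn
    FROM users u
    JOIN prior_user p
      ON u.user_id = p.user_id AND u.purchase_time >= p.purchase_time
)
SELECT user_id, place, purchase_time, most_visited_place
FROM ranked
WHERE rn = 1
ORDER BY user_id, purchase_time DESC
"""


def most_visited(rows):
    """Load rows into an in-memory table and return the annotated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER, place TEXT, purchase_time TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in most_visited(ROWS):
        print(row)
```

On this subset, the tie at user 2's latest purchase (Chicago and Milwaukee both visited twice) resolves to Chicago, the more recent visit, which agrees with the desired table.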
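You can use a lateral join in Snowflake. The use of distinct is a bit ugly, but I think you could use it instead of qualify, and possibly even get a better plan. From an execution standpoint, I'm curious whether this is equivalent to the other answer.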
select *
from Users u, lateral (
    -- one group per place visited so far; the window orders the groups by
    -- visit count, then by most recent visit, and keeps the first place
    select distinct first_value(place) over (
        order by count(*) desc, max(u2.purchase_time) desc) as most_visited_place
    from Users u2
    where u2.user_id = u.user_id and u2.purchase_time <= u.purchase_time
    group by place
    --qualify row_number() over (order by u2.user_id) = 1
) as mr
order by user_id, purchase_time desc
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=02784df13affab8027f7b052ad942d70
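The same "group the earlier rows by place, then take the top group" idea can also be tried outside Snowflake. The sketch below is an assumption-laden stand-in, not the lateral query itself: it uses SQLite through Python's `sqlite3` module, and replaces the lateral join with a correlated scalar subquery plus `LIMIT 1`, since that is a form SQLite accepts. The helper name `most_visited_by_subquery` is hypothetical, and only user 2's sample rows are loaded because they exercise the tie break.

```python
import sqlite3

# User 2's sample rows from the question; they exercise the tie break.
ROWS = [
    (2, "Chicago", "2021-09-25:01:00:21"),
    (2, "Milwaukee", "2021-09-24:02:00:21"),
    (2, "Milwaukee", "2021-09-23:03:00:21"),
    (2, "New York", "2021-09-22:19:00:21"),
    (2, "Chicago", "2021-09-21:01:00:21"),
]

# For each row, group that user's rows up to the same purchase_time by place,
# order the groups by visit count and then by latest visit, keep the first.
QUERY = """
SELECT u.user_id, u.place, u.purchase_time,
       (SELECT u2.place
        FROM users u2
        WHERE u2.user_id = u.user_id
          AND u2.purchase_time <= u.purchase_time
        GROUP BY u2.place
        ORDER BY COUNT(*) DESC, MAX(u2.purchase_time) DESC
        LIMIT 1) AS most_visited_place
FROM users u
ORDER BY u.user_id, u.purchase_time DESC
"""


def most_visited_by_subquery(rows):
    """Load rows into an in-memory table and return the annotated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER, place TEXT, purchase_time TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in most_visited_by_subquery(ROWS):
        print(row)
```

As long as a user has no two purchases with the same timestamp, this should pick the same place as the join-and-rank approach, because the row carrying a place's highest running count is also that place's most recent visit.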