Finding shortest geo-spatial distance from one point to all other points in SQL
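Users who buy movie tickets fall into two categories: those who buy in a town (Town_A, Town_B, or Town_C) and those who buy online.
I have the following tables:
Locations: this table holds the locations of the movie centers.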
|--------------|------------|------------|
| Towns | latitude | longitude |
|--------------|------------|------------|
| Town_A | 72.92629 | -12.89272 |
| Town_B | 93.62789 | -83.10172 |
| Town_C | 68.92612 | -67.17242 |
|--------------|------------|------------|
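Users: this table holds each user's purchase history, i.e. whether the purchase was made online or in a town. It also records the user's latitude/longitude at the time of purchase.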
|------------|------------|------------|--------------|
| user_id | latitude | longitude | Towns |
|------------|------------|------------|--------------|
| 1 | 21.89027 | -53.03772 | Town_A |
| 1 | 23.87847 | -41.78172 | Town_C |
| 1 | 39.62847 | -80.19892 | online |
| 1 | 77.87092 | -96.39242 | Town_A |
| 2 | 71.87782 | -38.03782 | online |
| 2 | 83.37847 | -62.78278 | Town_B |
| 3 | 89.81924 | -80.73892 | Town_B |
| 3 | 27.87282 | -18.39183 | Town_A |
|------------|------------|------------|--------------|
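To experiment with the sample data, the two tables above could be recreated as temporary tables along these lines. This is only a sketch: the table names `location` and `user` and the VARCHAR/FLOAT column types are assumptions (no DDL is given here), and the values are copied exactly as listed, including Town_B's latitude of 93.62789, which lies outside the usual -90 to 90 range.

```sql
-- Assumed table names and column types; adjust to your own schema.
CREATE OR REPLACE TEMPORARY TABLE location (
    towns     VARCHAR,
    latitude  FLOAT,
    longitude FLOAT
);

INSERT INTO location (towns, latitude, longitude) VALUES
    ('Town_A', 72.92629, -12.89272),
    ('Town_B', 93.62789, -83.10172),
    ('Town_C', 68.92612, -67.17242);

CREATE OR REPLACE TEMPORARY TABLE user (
    user_id   INTEGER,
    latitude  FLOAT,
    longitude FLOAT,
    towns     VARCHAR   -- 'Town_A', 'Town_B', 'Town_C' or 'online'
);

INSERT INTO user (user_id, latitude, longitude, towns) VALUES
    (1, 21.89027, -53.03772, 'Town_A'),
    (1, 23.87847, -41.78172, 'Town_C'),
    (1, 39.62847, -80.19892, 'online'),
    (1, 77.87092, -96.39242, 'Town_A'),
    (2, 71.87782, -38.03782, 'online'),
    (2, 83.37847, -62.78278, 'Town_B'),
    (3, 89.81924, -80.73892, 'Town_B'),
    (3, 27.87282, -18.39183, 'Town_A');
```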
I want to find the nearest town based on the user's lat/long at the time of purchase. The final table should look like this:
|------------|------------|------------|--------------|-----------------|
| user_id | latitude | longitude | Towns | nearest_town |
|------------|------------|------------|--------------|-----------------|
| 1 | 21.89027 | -53.03772 | Town_A | Town_B | <--- Town_B is near based on his lat/long (Irrespective of his purchase town)
| 1 | 23.87847 | -41.78172 | Town_C | Town_A | <--- Town_A is near based on his lat/long
| 1 | 39.62847 | -80.19892 | online | Town_Online |
| 1 | 77.87092 | -96.39242 | Town_A | Town_A |
| 2 | 71.87782 | -38.03782 | online | Town_Online |
| 2 | 83.37847 | -62.78278 | Town_B | Town_C |
| 3 | 89.81924 | -80.73892 | Town_B | Town_A |
| 3 | 27.87282 | -18.39183 | Town_A | Town_A |
|------------|------------|------------|--------------|-----------------|
My attempt at the SQL query (Snowflake):
With specific_location as
(
select user_id,
latitude,
longitude,
case when Towns in ('Town_A','Town_B','Town_C') then 'Town' else 'Town_Online' end as purchase_in
from Locations
)
select *,
case when purchase_in = 'Town' then
(select Towns from Location qualify row_number() over (order by haversine(user.latitude,user.longitude,location.latitude,location.longitude))=1)
else purchase_in
end as nearest_town
from specific_location
I get an error: syntax error unexpected 'when' and unexpected 'else'
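Your CTE specific_location is missing a JOIN to USERS, because Locations itself has no user_id column.
I would build an enriched user CTE that adds a sequence number, so that the later location match can be done distinctly for each user row, and then do the user/location join in a second CTE, so that the final SELECT only picks from precomputed values.
I also swapped your two-value CASE statement for an IFF: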
WITH enriched_user AS (
    SELECT
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq,  -- per-row id so each purchase row is matched on its own
        IFF(u.towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
    FROM user AS u
), user_and_closest_location AS (
    SELECT
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        u.purchase_in,
        l.towns AS closest_town,
        haversine(u.latitude, u.longitude, l.latitude, l.longitude) AS distance_km
    FROM enriched_user AS u
    CROSS JOIN location AS l
    -- keep only the closest location for each user row
    QUALIFY row_number() OVER (PARTITION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    IFF(u.purchase_in = 'Town', u.closest_town, u.purchase_in) AS nearest_town
FROM user_and_closest_location AS u
ORDER BY 1,2,3;
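For reference, the great-circle distance that a haversine function conventionally computes can be written as below. The symbols are introduced here for illustration only: $\varphi$ is latitude and $\lambda$ is longitude (both in radians), and $r$ is the radius of the sphere. Snowflake's HAVERSINE takes its arguments in degrees and returns kilometers.

```latex
d = 2r \arcsin\!\left(
      \sqrt{\sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
      + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right)}
    \right)
```

Since only the ordering of the distances decides which town ranks first, the unit and the exact radius do not change which row `row_number()` picks.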
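The first query computes the distance-based join for all rows. If there are rows you do not need it for, it is faster to prune the input at that point, but you then have to re-join to the input to pick up the skipped rows: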
WITH enriched_user AS (
    SELECT
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq,  -- per-row id, also used to re-join below
        IFF(u.towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
    FROM user AS u
), user_and_closest_location AS (
    SELECT
        u.seq,
        l.towns AS closest_town,
        haversine(u.latitude, u.longitude, l.latitude, l.longitude) AS distance_km
    FROM enriched_user AS u
    CROSS JOIN location AS l
    -- prune online purchases before the distance calculation
    WHERE u.purchase_in = 'Town'
    QUALIFY row_number() OVER (PARTITION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    IFF(u.purchase_in = 'Town', ucl.closest_town, u.purchase_in) AS nearest_town
FROM enriched_user AS u
-- LEFT JOIN brings back the online rows that were pruned above
LEFT JOIN user_and_closest_location AS ucl
    ON u.seq = ucl.seq
ORDER BY 1,2,3;
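The town test can also be flipped to "not online", so that
IFF(towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
becomes:
IFF(towns != 'online', 'Town', 'Town_Online')
At that point the actual test can be moved to the places where it is later used.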
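As a rough sketch of what moving the test could look like, the `purchase_in` column is dropped below and `towns != 'online'` is applied directly where it matters: once to prune the cross join and once in the final IFF. It assumes the same `user` and `location` tables with a `towns` column, and, like the pruned query above, it relies on `seq` having the same value in both references to `enriched_user`.

```sql
WITH enriched_user AS (
    SELECT
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq              -- per-row id used for the re-join
    FROM user AS u
), user_and_closest_location AS (
    SELECT
        u.seq,
        l.towns AS closest_town
    FROM enriched_user AS u
    CROSS JOIN location AS l
    WHERE u.towns != 'online'      -- test moved here: skip online purchases
    QUALIFY row_number() OVER (
        PARTITION BY u.seq
        ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)
    ) = 1
)
SELECT
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    -- test moved here too: online rows get the fixed label
    IFF(u.towns != 'online', ucl.closest_town, 'Town_Online') AS nearest_town
FROM enriched_user AS u
LEFT JOIN user_and_closest_location AS ucl
    ON u.seq = ucl.seq
ORDER BY 1, 2, 3;
```

Whether this reads better than keeping a named `purchase_in` column is a matter of taste; the named column documents the intent, while the inline test avoids carrying an extra column through both CTEs.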