Finding shortest geo-spatial distance from one point to all other points in SQL

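There are two kinds of users: those who buy movie tickets in a town (Town_A, Town_B, or Town_C) and those who buy them online.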

I have the following tables:

Locations: this table contains the locations of the movie centres.

|--------------|------------|------------|
|    Towns     |  latitude  | longitude  |
|--------------|------------|------------|
|  Town_A      |  72.92629  | -12.89272  |
|  Town_B      |  93.62789  | -83.10172  |
|  Town_C      |  68.92612  | -67.17242  |
|--------------|------------|------------|

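Users: this table contains each user's purchase history, i.e. whether the purchase was made online or in a town. It also includes the user's latitude/longitude at the time of purchase.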

|------------|------------|------------|--------------|
|   user_id  |  latitude  | longitude  |    Towns     |
|------------|------------|------------|--------------|
|    1       |  21.89027  | -53.03772  |   Town_A     |
|    1       |  23.87847  | -41.78172  |   Town_C     |
|    1       |  39.62847  | -80.19892  |   online     |
|    1       |  77.87092  | -96.39242  |   Town_A     |
|    2       |  71.87782  | -38.03782  |   online     |
|    2       |  83.37847  | -62.78278  |   Town_B     |
|    3       |  89.81924  | -80.73892  |   Town_B     |
|    3       |  27.87282  | -18.39183  |   Town_A     |
|------------|------------|------------|--------------|

I want to find the nearest town for each purchase, based on the user's lat/long at the time of purchase. The final table should look like this:

|------------|------------|------------|--------------|-----------------|
|   user_id  |  latitude  | longitude  |    Towns     | nearest_town    |
|------------|------------|------------|--------------|-----------------|
|    1       |  21.89027  | -53.03772  |   Town_A     |   Town_B        | <--- Town_B is near based on his lat/long (Irrespective of his purchase town)
|    1       |  23.87847  | -41.78172  |   Town_C     |   Town_A        | <--- Town_A is near based on his lat/long
|    1       |  39.62847  | -80.19892  |   online     |   Town_Online   |
|    1       |  77.87092  | -96.39242  |   Town_A     |   Town_A        |
|    2       |  71.87782  | -38.03782  |   online     |   Town_Online   |
|    2       |  83.37847  | -62.78278  |   Town_B     |   Town_C        |
|    3       |  89.81924  | -80.73892  |   Town_B     |   Town_A        |
|    3       |  27.87282  | -18.39183  |   Town_A     |   Town_A        |
|------------|------------|------------|--------------|-----------------|

My attempt at the SQL query (Snowflake):

With specific_location as
(
  select user_id,
         latitude,
     longitude,
     case when Towns in ('Town_A','Town_B','Town_C') then 'Town' else 'Town_Online' end as purchase_in
  from Locations
)
 select *, 
       case when purchase_in = 'Town' then
            (select Towns from Location qualify row_number() over (order by haversine(user.latitude,user.longitude,location.latitude,location.longitude))=1)
            else purchase_in
       end as nearest_town
 from specific_location

I get an error: syntax error unexpected 'when' and unexpected 'else'

Your CTE specific_location is missing a JOIN to USERS, because Locations on its own has no user_id column.

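I would also build an enriched user CTE that adds a sequence number, so that the later location match is unambiguously per user row, and then do the user/location join in a second CTE, so that the final SELECT only works with precomputed values: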

I also swapped your two-value CASE statement for an IFF.

WITH enriched_user AS (
    SELECT 
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq,  -- tags each user row so the match below is per row
        IFF(u.towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
    FROM user AS u
), user_and_closest_location AS (
    SELECT 
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        u.purchase_in,
        l.towns AS closest_town,
        haversine(u.latitude, u.longitude, l.latitude, l.longitude) AS distance
    FROM enriched_user AS u,
        location AS l
    -- keep only the closest location for each user row
    QUALIFY row_number() OVER (PARTITION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT      
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    IFF(u.purchase_in = 'Town', u.closest_town, u.purchase_in) AS nearest_town
FROM user_and_closest_location AS u
ORDER BY 1,2,3;

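The query above computes the distance-based join for every row. If there are rows you do not need it for (here, the online purchases), it will be faster to prune them from the input at that point, but you then need to join back to the input to pick up the skipped rows: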

WITH enriched_user AS (
    SELECT 
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq,  -- tags each user row so the match below is per row
        IFF(u.towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
    FROM user AS u
), user_and_closest_location AS (
    SELECT 
        u.seq,  -- needed for the join back to enriched_user
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        u.purchase_in,
        l.towns AS closest_town,
        haversine(u.latitude, u.longitude, l.latitude, l.longitude) AS distance
    FROM enriched_user AS u,
        location AS l
    WHERE u.purchase_in = 'Town'  -- skip online purchases before the cross join
    QUALIFY row_number() OVER (PARTITION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT      
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    IFF(u.purchase_in = 'Town', ucl.closest_town, u.purchase_in) AS nearest_town
FROM enriched_user AS u
LEFT JOIN user_and_closest_location AS ucl 
    ON u.seq = ucl.seq
ORDER BY 1,2,3;

The "in a town" test can also be flipped to "not online":

IFF(towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in

becomes:

IFF(towns != 'online', 'Town', 'Town_Online')

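At that point the actual test can be moved to the places where it is used later, instead of being precomputed as purchase_in.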
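
A minimal sketch of that last step, not tested against real data: the purchase_in helper column is dropped and the "not online" test is written directly in the two places that need it, the WHERE clause of the join CTE and the final IFF. It assumes the town column is named towns, as in the question's tables, and reuses the user and location table names from this answer, which are not confirmed by the question.

```sql
WITH enriched_user AS (
    SELECT
        u.user_id,
        u.latitude,
        u.longitude,
        u.towns,
        seq4() AS seq                   -- tags each user row for the join back
    FROM user AS u                      -- assumed table name
), user_and_closest_location AS (
    SELECT
        u.seq,
        l.towns AS closest_town
    FROM enriched_user AS u,
        location AS l                   -- assumed table name
    WHERE u.towns != 'online'           -- test inlined: only town purchases are matched
    QUALIFY row_number() OVER (
        PARTITION BY u.seq
        ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)
    ) = 1
)
SELECT
    u.user_id,
    u.latitude,
    u.longitude,
    u.towns,
    -- test inlined again: online rows have no match and get the fixed label
    IFF(u.towns != 'online', ucl.closest_town, 'Town_Online') AS nearest_town
FROM enriched_user AS u
LEFT JOIN user_and_closest_location AS ucl
    ON u.seq = ucl.seq
ORDER BY 1, 2, 3;
```

The second CTE now carries only seq and closest_town, since every other column is already available from enriched_user in the final SELECT.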