Evaluating the closest distance from one point between multiple options?

I have a set of longitude/latitude points in a data frame named person_location:

+----+-----------+-----------+
| id | longitude | latitude  |
+----+-----------+-----------+
|  1 | -76.67707 | 39.399754 |
|  2 | -76.44519 | 39.285084 |
|  3 | -76.69402 |  39.36958 |
|  4 | -76.68936 | 39.369907 |
|  5 | -76.58341 | 39.357994 |
+----+-----------+-----------+

Then I have another set of longitude and latitude points in a data frame named building_location:

+----+------------+-----------+
| id | longitude  | latitude  |
+----+------------+-----------+
|  1 | -76.624393 | 39.246464 |
|  2 | -76.457246 | 39.336996 |
|  3 | -76.711729 | 39.242936 |
|  4 | -76.631249 | 39.289103 |
|  5 | -76.566742 | 39.286271 |
|  6 | -76.683106 |  39.35447 |
|  7 | -76.530232 | 39.332398 |
|  8 | -76.598582 | 39.344642 |
|  9 | -76.691287 | 39.292849 |
+----+------------+-----------+

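What I want to do is find, for each ID in person_location, the closest ID in building_location. I know how to calculate the distance between two individual points using the distHaversine function from library(geosphere), but how can I find the closest point when comparing one point against a set of multiple points?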

Use dput() and paste the result into your question instead of tables:

person_location <-
structure(list(id = c(1, 2, 3, 4, 5), longitude = c(-76.67707, 
-76.44519, -76.69402, -76.68936, -76.58341), latitude = c(39.399754, 
39.285084, 39.36958, 39.369907, 39.357994)), class = "data.frame", row.names = c(NA, 
-5L))
building_location <-
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), longitude = c(-76.624393, 
-76.457246, -76.711729, -76.631249, -76.566742, -76.683106, -76.530232, 
-76.598582, -76.691287), latitude = c(39.246464, 39.336996, 39.242936, 
39.289103, 39.286271, 39.35447, 39.332398, 39.344642, 39.292849
)), class = "data.frame", row.names = c(NA, -9L))

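For each person, you need the distance to every building, and then you pick the id of the building with the minimum distance. Here is a simple function: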

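library(geosphere)

# Return the id of the building closest to person i
closest <- function(i) {
    # Columns 2:3 are longitude and latitude; distances are in metres
    idx <- which.min(distHaversine(person_location[i, 2:3], building_location[, 2:3]))
    building_location[idx, "id"]
}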

Now you just need to run it over all the people:

sapply(seq_len(nrow(person_location)), closest)
# [1] 6 2 6 6 8
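
The same lookup can also be written without a per-person function by building the full person-by-building distance matrix first. This is only a sketch: it assumes the data frames from the dput() output above and uses distm() from geosphere with fun = distHaversine; the names dist_matrix, nearest_idx and result are illustrative, not part of the original answer.

```r
library(geosphere)

# Recreate the two data frames so the sketch is self-contained
person_location <- data.frame(
  id = 1:5,
  longitude = c(-76.67707, -76.44519, -76.69402, -76.68936, -76.58341),
  latitude = c(39.399754, 39.285084, 39.36958, 39.369907, 39.357994)
)
building_location <- data.frame(
  id = 1:9,
  longitude = c(-76.624393, -76.457246, -76.711729, -76.631249, -76.566742,
                -76.683106, -76.530232, -76.598582, -76.691287),
  latitude = c(39.246464, 39.336996, 39.242936, 39.289103, 39.286271,
               39.35447, 39.332398, 39.344642, 39.292849)
)

# One row per person, one column per building, distances in metres
dist_matrix <- distm(
  as.matrix(person_location[, c("longitude", "latitude")]),
  as.matrix(building_location[, c("longitude", "latitude")]),
  fun = distHaversine
)

# Column index of the smallest distance in each row
nearest_idx <- apply(dist_matrix, 1, which.min)

# Closest building id and its distance for every person
result <- data.frame(
  person_id = person_location$id,
  building_id = building_location$id[nearest_idx],
  distance_m = dist_matrix[cbind(seq_len(nrow(dist_matrix)), nearest_idx)]
)
result
```

Keeping the whole matrix is convenient when you also want the distance itself or the second-closest building, at the cost of holding people × buildings values in memory.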

If you only want the closest building to each person, and they are relatively close together:

library(sf)
library(dplyr)  # provides the %>% pipe used below

## load data here from @dcarlson's dput

person_location <- person_location %>%
  st_as_sf(coords = c('longitude', 'latitude')) %>%
  st_set_crs(4326)

building_location <- building_location %>%
  st_as_sf(coords = c('longitude', 'latitude')) %>%
  st_set_crs(4326)

st_nearest_feature(person_location, building_location)

# although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
# [1] 6 2 6 6 8

So persons 1, 3, and 4 are closest to building #6. Person 2 -> building #2 ...

All of the distances can be calculated with st_distance(person_location, building_location).
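
If only the distance to the nearest building is needed, rather than the full matrix, st_distance() can be combined with the index returned by st_nearest_feature(). A minimal sketch, assuming the same coordinates as above; the names nearest, nearest_dist and nearest_table are illustrative:

```r
library(sf)

# Build the sf point layers directly from the coordinates (WGS84, EPSG:4326)
person_location <- st_as_sf(
  data.frame(
    id = 1:5,
    longitude = c(-76.67707, -76.44519, -76.69402, -76.68936, -76.58341),
    latitude = c(39.399754, 39.285084, 39.36958, 39.369907, 39.357994)
  ),
  coords = c("longitude", "latitude"), crs = 4326
)
building_location <- st_as_sf(
  data.frame(
    id = 1:9,
    longitude = c(-76.624393, -76.457246, -76.711729, -76.631249, -76.566742,
                  -76.683106, -76.530232, -76.598582, -76.691287),
    latitude = c(39.246464, 39.336996, 39.242936, 39.289103, 39.286271,
                 39.35447, 39.332398, 39.344642, 39.292849)
  ),
  coords = c("longitude", "latitude"), crs = 4326
)

# Row index of the nearest building for each person
nearest <- st_nearest_feature(person_location, building_location)

# Pair each person with its nearest building and measure only those pairs
nearest_dist <- st_distance(
  person_location,
  building_location[nearest, ],
  by_element = TRUE
)

nearest_table <- data.frame(
  person_id = person_location$id,
  building_id = building_location$id[nearest],
  distance_m = as.numeric(nearest_dist)  # metres for a longitude/latitude CRS
)
nearest_table
```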

You can use the nngeo library to easily find the shortest distance for each person.

library(nngeo)

st_connect(person_location, building_location) %>% st_length()
# Calculating nearest IDs
# Calculating lines
# Done.
# Units: [m]
# [1] 5054.381 5856.388 1923.254 1796.608 1976.786

It is easier to understand with a plot:

library(ggplot2)

st_connect(person_location, building_location) %>% 
  ggplot() + 
    geom_sf() + 
    geom_sf(data = person_location, color = 'green') + 
    geom_sf(data = building_location, color = 'red')

And even easier on a map:

st_connect(person_location, building_location) %>% 
  mapview::mapview() +
  mapview::mapview(person_location, color = 'green', col.regions = 'green') + 
  mapview::mapview(building_location, color = 'black', col.regions = 'black')

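geosphere is probably more accurate, but if you are dealing with relatively small areas, these tools are likely good enough. I find them easier to work with, and I usually don't need extreme precision.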
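
To see why a small area is forgiving, the nearest building can be picked twice in base R: once with a naive planar distance on the raw degrees and once with a great-circle (haversine) distance. This is a rough sketch, not how sf or geosphere are implemented internally. The helper names (people, buildings, planar, haversine, nearest_by) are made up for illustration, and the Earth radius of 6378137 m is an assumption. On the data from the question both variants should select buildings 6 2 6 6 8, matching the outputs quoted above.

```r
# Coordinates copied from the question: one row per point
people <- cbind(
  lon = c(-76.67707, -76.44519, -76.69402, -76.68936, -76.58341),
  lat = c(39.399754, 39.285084, 39.36958, 39.369907, 39.357994)
)
buildings <- cbind(
  lon = c(-76.624393, -76.457246, -76.711729, -76.631249, -76.566742,
          -76.683106, -76.530232, -76.598582, -76.691287),
  lat = c(39.246464, 39.336996, 39.242936, 39.289103, 39.286271,
          39.35447, 39.332398, 39.344642, 39.292849)
)

# Naive planar distance in degrees, treating longitude/latitude as x/y
planar <- function(lon1, lat1, lon2, lat2) {
  sqrt((lon2 - lon1)^2 + (lat2 - lat1)^2)
}

# Haversine great-circle distance in metres (assumed radius: 6378137 m)
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Index of the nearest building for every person under a given distance function
nearest_by <- function(dist_fun) {
  sapply(seq_len(nrow(people)), function(i) {
    which.min(dist_fun(people[i, "lon"], people[i, "lat"],
                       buildings[, "lon"], buildings[, "lat"]))
  })
}

nearest_planar <- nearest_by(planar)
nearest_sphere <- nearest_by(haversine)

# Do both distance definitions agree on the nearest building?
all(nearest_planar == nearest_sphere)
```

The two can disagree when candidate buildings are almost equally far away, or at high latitudes where a degree of longitude is much shorter than a degree of latitude.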

Another solution is to cross join the two data.frames and calculate the distance for each row. This might work faster when there are more people.

library(geosphere)
library(dplyr)


person_location <-
  structure(list(id = c(1, 2, 3, 4, 5), 
                 longitude = c(-76.67707, -76.44519, -76.69402, -76.68936, -76.58341), 
                 latitude = c(39.399754, 39.285084, 39.36958, 39.369907, 39.357994)), 
            class = "data.frame", row.names = c(NA, -5L))
building_location <-
  structure(list(id_building = c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
                 longitude_building = c(-76.624393, -76.457246, -76.711729, -76.631249, -76.566742, -76.683106, -76.530232,  -76.598582, -76.691287), 
                 latitude_building = c(39.246464, 39.336996, 39.242936,39.289103, 39.286271, 39.35447, 39.332398, 39.344642, 39.292849)), 
            class = "data.frame", row.names = c(NA, -9L))

all_locations <- merge(person_location, building_location, by=NULL)

all_locations$distance <- distHaversine( 
  all_locations[, c("longitude", "latitude")],
  all_locations[, c("longitude_building", "latitude_building")]
  )

closest <- all_locations %>% 
  group_by(id) %>% 
  filter( distance == min(distance)  ) %>% 
  ungroup()

Created on 2020-01-07 by the reprex package (v0.3.0)
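
One detail of the filter(distance == min(distance)) step is that it keeps every tied row, so a person who is exactly equidistant from two buildings appears twice. If exactly one row per person is wanted, dplyr's slice_min() with with_ties = FALSE is one option. The sketch below uses a small made-up table (toy_locations, with invented distances) purely to show the difference:

```r
library(dplyr)

# Hypothetical distances in metres, with a deliberate tie for person 2
toy_locations <- data.frame(
  id = c(1, 1, 2, 2, 2),
  id_building = c(1, 2, 1, 2, 3),
  distance = c(500, 120, 300, 300, 900)
)

# Keeps all rows that share the minimum distance within each person
closest_with_ties <- toy_locations %>%
  group_by(id) %>%
  filter(distance == min(distance)) %>%
  ungroup()

# Keeps exactly one row per person, the first among any ties
closest_one <- toy_locations %>%
  group_by(id) %>%
  slice_min(distance, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(closest_with_ties)  # 3 rows: person 2 appears twice
nrow(closest_one)        # 2 rows: one per person
```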