Calculating average speed from lon/lat and timestamp using distance (geosphere) and difftime

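I am trying to calculate the haversine distance between two consecutive observations (rows) using the distm function from geosphere. Ultimately, I want to compute the average speed by dividing the distance in meters by the time difference in seconds.

This is how I calculate the time difference (in seconds):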

df$Timediff_secs <- 
  with(df, 
       difftime(Timestamp, ave(Timestamp, ID, FUN=lag), units='secs'))

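A similar question has been asked before, and the answer there does work, but I need the calculation grouped by ID so that each new ID starts with NA. I want to create a new column called df$Distance.

The code below needs to be edited so that it is grouped by ID and the first row of each ID is NA (since there is no previous point from which to compute a distance):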

library(geosphere)
metersPerMile <- 1609.34
pts <- df1[c("lon", "lat")]

## Pass in two derived data.frames that are lagged by one point
segDists <- distVincentyEllipsoid(p1 = pts[-nrow(pts),], 
                                  p2 = pts[-1,])
sum(segDists)/metersPerMile
# [1] 1013.919

Here is some sample data that I copied from the link:
> df
          Timestamp      ID      lat       lon
2012-11-12 02:08:41      1  76.57169 -110.8070
2012-11-12 02:09:41      1  76.44325 -110.7525
2012-11-12 02:10:41      1  76.90897 -110.8613
2012-11-12 03:18:41      2  76.11152 -110.2037
2012-11-12 03:19:41      2  76.29013 -110.3838
2012-11-12 03:20:41      2  76.15544 -110.4506

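I feel like I have tried everything, so any help is greatly appreciated!

Grouping with dplyr::lag (or data.table::shift) is handy for this, though it can also be done by hand in base R with, e.g., c(NA, variable[-length(variable)]) and aggregate: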

library(dplyr)

df <- structure(list(Timestamp = structure(c(1352704121, 1352704181, 1352704241, 1352708321, 1352708381, 1352708441), 
                                           class = c("POSIXct", "POSIXt"), tzone = ""), 
                     ID = c(1L, 1L, 1L, 2L, 2L, 2L), 
                     lat = c(76.57169, 76.44325, 76.90897, 76.11152, 76.29013, 76.15544), 
                     lon = c(-110.807, -110.7525, -110.8613, -110.2037, -110.3838, -110.4506)), 
                class = "data.frame", .Names = c("Timestamp", "ID", "lat", "lon"), row.names = c(NA, -6L))

df <- df %>% 
    group_by(ID) %>%
    mutate(dist_m = geosphere::distVincentyEllipsoid(cbind(lon, lat), 
                                                     cbind(lag(lon), lag(lat))), 
           time_s = difftime(Timestamp, lag(Timestamp), units = 'secs'), 
           speed_m_per_s = dist_m / as.integer(time_s))

df
#> # A tibble: 6 x 7
#> # Groups:   ID [2]
#>             Timestamp    ID      lat       lon   dist_m  time_s speed_m_per_s
#>                <dttm> <int>    <dbl>     <dbl>    <dbl>  <time>         <dbl>
#> 1 2012-11-12 02:08:41     1 76.57169 -110.8070       NA NA secs            NA
#> 2 2012-11-12 02:09:41     1 76.44325 -110.7525 14408.23 60 secs      240.1371
#> 3 2012-11-12 02:10:41     1 76.90897 -110.8613 52065.53 60 secs      867.7588
#> 4 2012-11-12 03:18:41     2 76.11152 -110.2037       NA NA secs            NA
#> 5 2012-11-12 03:19:41     2 76.29013 -110.3838 20507.15 60 secs      341.7859
#> 6 2012-11-12 03:20:41     2 76.15544 -110.4506 15140.03 60 secs      252.3338

Since the data.frame is already grouped, aggregating only requires a call to summarise:

df_avg <- df %>% 
    summarise(dist_m = sum(dist_m, na.rm = TRUE), 
              time_s = sum(as.integer(time_s), na.rm = TRUE), 
              speed_m_per_s = dist_m / time_s)

df_avg
#> # A tibble: 2 x 4
#>      ID   dist_m time_s speed_m_per_s
#>   <int>    <dbl>  <int>         <dbl>
#> 1     1 66473.76    120      553.9480
#> 2     2 35647.18    120      297.0598

The units are meters per second; convert as you like.

If you prefer data.table, here is one way to do it:
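The base R route mentioned at the start of this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: lag1() and haversine_m() are hypothetical helpers that are not part of the original answer, the timestamps are built in UTC for reproducibility, and the distance uses the spherical haversine formula (radius 6378137 m, the default of geosphere::distHaversine) rather than distVincentyEllipsoid, so the figures will differ slightly from those above. The last line also shows one possible unit conversion (m/s to km/h).

```r
# Sample data from the question (UTC is an assumption made for reproducibility).
df <- data.frame(
  Timestamp = as.POSIXct(c("2012-11-12 02:08:41", "2012-11-12 02:09:41",
                           "2012-11-12 02:10:41", "2012-11-12 03:18:41",
                           "2012-11-12 03:19:41", "2012-11-12 03:20:41"),
                         tz = "UTC"),
  ID  = c(1L, 1L, 1L, 2L, 2L, 2L),
  lat = c(76.57169, 76.44325, 76.90897, 76.11152, 76.29013, 76.15544),
  lon = c(-110.8070, -110.7525, -110.8613, -110.2037, -110.3838, -110.4506)
)

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, x[-length(x)])

# Hypothetical helper: haversine distance in meters on a sphere of radius r.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Lag within each ID so that the first row of every ID is NA.
prev_lon  <- ave(df$lon, df$ID, FUN = lag1)
prev_lat  <- ave(df$lat, df$ID, FUN = lag1)
prev_time <- ave(as.numeric(df$Timestamp), df$ID, FUN = lag1)

df$Distance      <- haversine_m(prev_lon, prev_lat, df$lon, df$lat)
df$Timediff_secs <- as.numeric(df$Timestamp) - prev_time
df$Speed_m_per_s <- df$Distance / df$Timediff_secs

# Per-ID totals; the formula interface drops the NA rows by default.
avg <- aggregate(cbind(Distance, Timediff_secs) ~ ID, data = df, FUN = sum)
avg$speed_m_per_s  <- avg$Distance / avg$Timediff_secs
avg$speed_km_per_h <- avg$speed_m_per_s * 3.6
avg
```

If geosphere is available, swapping haversine_m() for geosphere::distVincentyEllipsoid(cbind(prev_lon, prev_lat), cbind(df$lon, df$lat)) should bring the distances in line with the dplyr result.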

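library(data.table)
library(geosphere)
library(readr)  # provides parse_datetime()
setDT(df)  # convert df to a data.table in place
# Assumes Timestamp was read in as character; skip if it is already POSIXct
df[, Timestamp := parse_datetime(Timestamp)]
# Distance from the previous point within each ID (first row per ID is NA)
df[, distance := distVincentyEllipsoid(p1 = cbind(lon, lat), 
                                       p2 = cbind(shift(lon), shift(lat))), 
   by = ID]
output <- df[, .(time_diff = as.numeric(Timestamp[.N] - Timestamp[1], units = "secs"),
                 tot_distance = sum(distance, na.rm = TRUE)), by = ID]
output[, avg_speed := tot_distance / time_diff]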
##    ID time_diff tot_distance avg_speed
## 1:  1       120     66473.26  553.9438
## 2:  2       120     35646.55  297.0546
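
Note that this version takes the elapsed time as the last timestamp minus the first within each ID, whereas the dplyr version sums the per-row differences. The two agree because consecutive differences telescope. A small base R check on the timestamps of ID 1 from the sample data, offered as a sketch (UTC is assumed purely for the illustration):

```r
# Timestamps for ID 1 from the sample data (time zone assumed to be UTC).
stamps <- as.POSIXct(c("2012-11-12 02:08:41", "2012-11-12 02:09:41",
                       "2012-11-12 02:10:41"), tz = "UTC")

# Sum of the consecutive differences, in seconds.
sum_of_steps <- sum(as.numeric(diff(stamps), units = "secs"))

# Last timestamp minus first, in seconds.
last_minus_first <- as.numeric(stamps[length(stamps)] - stamps[1], units = "secs")

# Both routes give the same elapsed time.
stopifnot(isTRUE(all.equal(sum_of_steps, last_minus_first)))
```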