How to add variable `df1$DateTime_1` to `df2` when `df1$DateTime_1` matches `df2$DateTime_2` within a 5-second interval

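I have two data frames, df1 and df2. df1 summarizes the moments (df1$Theor.DateTime) at which, in theory, a device sends information to a satellite. We know these moments thanks to the variable df1$Delay, which gives the interval in seconds between successive emissions from the device to the satellite. df2 summarizes the specific times at which the satellite received information from this device (df2$Real.DateTime). As you can see in the example below, nrow(df2) is smaller than nrow(df1), because some emissions did not reach the satellite for various reasons. You can also see that df2$Real.DateTime and df1$Theor.DateTime do not match exactly: there is always some delay between the emission and the reception of a signal.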

options("digits.secs" = 3)
df1 <- data.frame(Theor.DateTime= c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),
                  Delay= c(30,45,60,30,30,45,75,45,45,60))
df1$Theor.DateTime <- as.POSIXct(df1$Theor.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")

head(df1)
           Theor.DateTime Delay
1 2018-03-22 12:00:00.000    30
2 2018-03-22 12:00:30.039    45
3 2018-03-22 12:01:15.799    60
4 2018-03-22 12:02:15.700    30
5 2018-03-22 12:02:45.349    30
6 2018-03-22 12:03:15.002    45


df2 <- data.frame(Real.DateTime= c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"))
df2$Real.DateTime <- as.POSIXct(df2$Real.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")

df2
           Real.DateTime
1 2018-03-22 12:00:02.00
2 2018-03-22 12:02:20.53
3 2018-03-22 12:02:42.79
4 2018-03-22 12:05:18.70
5 2018-03-22 12:06:33.70

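What I want is a data frame that contains the information from both df1 and df2. When df2$Real.DateTime falls within a 5-second interval (± 5 seconds) around df1$Theor.DateTime, I want to put df1$Theor.DateTime and df2$Real.DateTime on the same row. I also want to create a column called Reception.success that indicates whether a given df1$Theor.DateTime matched a df2$Real.DateTime (TRUE or FALSE), that is, whether the emission was received.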

I would expect:

> df3
            Theor.DateTime Delay Reception.success           Real.DateTime
1  2018-03-22 12:00:00.000    30              TRUE 2018-03-22 12:00:02.000
2  2018-03-22 12:00:30.040    45             FALSE                    <NA>
3  2018-03-22 12:01:15.800    60             FALSE                    <NA>
4  2018-03-22 12:02:15.700    30              TRUE 2018-03-22 12:02:20.540
5  2018-03-22 12:02:45.350    30              TRUE 2018-03-22 12:02:42.800
6  2018-03-22 12:03:15.002    45             FALSE                    <NA>
7  2018-03-22 12:04:00.065    75             FALSE                    <NA>
8  2018-03-22 12:05:15.430    45              TRUE 2018-03-22 12:05:18.700
9  2018-03-22 12:06:00.060    45             FALSE                    <NA>
10 2018-03-22 12:06:45.002    60             FALSE                    <NA>

Does anyone know how to do this?

Thanks in advance.

You can use a non-equi join in data.table:

library(data.table)

options("digits.secs" = 3)
df1 <- data.table(Theor.DateTime= as.POSIXct(c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"),
                  Delay= c(30,45,60,30,30,45,75,45,45,60))
df2 <- data.table(Real.DateTime= as.POSIXct(c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"))


df2[,`:=`(minus_5=Real.DateTime-5,
          plus_5=Real.DateTime+5)]


df2
#>             Real.DateTime                minus_5                 plus_5
#> 1: 2018-03-22 12:00:02.00 2018-03-22 11:59:57.00 2018-03-22 12:00:07.00
#> 2: 2018-03-22 12:02:20.53 2018-03-22 12:02:15.53 2018-03-22 12:02:25.53
#> 3: 2018-03-22 12:02:42.79 2018-03-22 12:02:37.79 2018-03-22 12:02:47.79
#> 4: 2018-03-22 12:05:18.70 2018-03-22 12:05:13.70 2018-03-22 12:05:23.70
#> 5: 2018-03-22 12:06:33.70 2018-03-22 12:06:28.70 2018-03-22 12:06:38.70
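
The two helper columns rely on the fact that adding or subtracting a plain number from a POSIXct value shifts it by that many seconds. A minimal check of that behaviour, using illustrative names (`t0`, `lower`, `upper`) that are not part of the original code:

```r
# One reception time taken from df2, parsed the same way as above
t0 <- as.POSIXct("2018-03-22 12:00:02.000",
                 format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Numeric arithmetic on POSIXct is done in seconds
lower <- t0 - 5
upper <- t0 + 5

# Width of the matching window, expressed in seconds
window_secs <- as.numeric(difftime(upper, lower, units = "secs"))
window_secs
```

So each row of df2 gets a 10-second window centred on Real.DateTime, and the join below keeps the df1 rows whose Theor.DateTime falls inside it.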


df1[df2,on = .(Theor.DateTime<=plus_5,Theor.DateTime>=minus_5),"Real.DateTime":=i.Real.DateTime][,"Reception.success":=!is.na(Real.DateTime)]

df1
#>              Theor.DateTime Delay          Real.DateTime Reception.success
#>  1: 2018-03-22 12:00:00.000    30 2018-03-22 12:00:02.00              TRUE
#>  2: 2018-03-22 12:00:30.039    45                   <NA>             FALSE
#>  3: 2018-03-22 12:01:15.799    60                   <NA>             FALSE
#>  4: 2018-03-22 12:02:15.700    30 2018-03-22 12:02:20.53              TRUE
#>  5: 2018-03-22 12:02:45.349    30 2018-03-22 12:02:42.79              TRUE
#>  6: 2018-03-22 12:03:15.002    45                   <NA>             FALSE
#>  7: 2018-03-22 12:04:00.065    75                   <NA>             FALSE
#>  8: 2018-03-22 12:05:15.430    45 2018-03-22 12:05:18.70              TRUE
#>  9: 2018-03-22 12:06:00.059    45                   <NA>             FALSE
#> 10: 2018-03-22 12:06:45.002    60                   <NA>             FALSE

Created on 2020-04-14 by the reprex package (v0.3.0)
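
As a cross-check on the join, the same ± 5 second rule can be sketched in base R without data.table. This is only an illustration under stated assumptions: the names `theor`, `real`, `gap`, `nearest`, `success` and `matched` are made up here, and when several receptions fall inside the window the sketch keeps the closest one, which is a design choice the question does not specify. It builds a full distance matrix, so it is only suitable for small inputs.

```r
options(digits.secs = 3)
fmt <- "%Y-%m-%d %H:%M:%OS"

theor <- as.POSIXct(c("2018-03-22 12:00:00.000", "2018-03-22 12:00:30.040",
                      "2018-03-22 12:01:15.800", "2018-03-22 12:02:15.700",
                      "2018-03-22 12:02:45.350", "2018-03-22 12:03:15.002",
                      "2018-03-22 12:04:00.065", "2018-03-22 12:05:15.430",
                      "2018-03-22 12:06:00.060", "2018-03-22 12:06:45.002"),
                    format = fmt, tz = "UTC")
real <- as.POSIXct(c("2018-03-22 12:00:02.000", "2018-03-22 12:02:20.540",
                     "2018-03-22 12:02:42.800", "2018-03-22 12:05:18.700",
                     "2018-03-22 12:06:33.700"),
                   format = fmt, tz = "UTC")

# Absolute gap in seconds between every theoretical and every received time
gap <- abs(outer(as.numeric(theor), as.numeric(real), "-"))

# For each theoretical time, the index of the closest reception
nearest <- apply(gap, 1, which.min)

# A reception counts as a match only if it lies within 5 seconds
success <- gap[cbind(seq_along(theor), nearest)] <= 5

# Keep the matched reception time, NA where nothing was close enough
matched <- real[nearest]
matched[!success] <- NA

df3 <- data.frame(Theor.DateTime = theor,
                  Reception.success = success,
                  Real.DateTime = matched)
which(success)
```

On the example data the rows flagged as received should be the same ones the non-equi join marks as TRUE.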