How to add variable `df1$DateTime_1` to `df2` when `df1$DateTime_1` matches `df2$DateTime_2` within a 5-second interval
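I have two data frames, `df1` and `df2`. `df1` lists the moments (`df1$Theor.DateTime`) at which, in theory, a device sends information to a satellite. We know these moments thanks to the variable `df1$Delay`, which gives the interval in seconds between successive emissions from the device to the satellite. `df2` lists the exact times at which the satellite actually received information from this device (`df2$Real.DateTime`). As you can see in the example below, `nrow(df2)` is smaller than `nrow(df1)`, because some emissions did not reach the satellite for various reasons. You can also see that `df2$Real.DateTime` does not exactly match `df1$Theor.DateTime`: there is always some delay between emission and reception of the signal.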
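Reading `Delay[i]` as the number of seconds until the next scheduled emission (an interpretation inferred from the sample data below, not stated explicitly), a quick base R check compares the observed gaps between consecutive `Theor.DateTime` values with `Delay`. The names `theor`, `delay`, `gaps` and `deviation` are illustrative only.

```r
# Assumption: Delay[i] is the scheduled gap (in seconds) between emission i and i + 1
theor <- as.POSIXct(c("2018-03-22 12:00:00.000", "2018-03-22 12:00:30.040",
                      "2018-03-22 12:01:15.800", "2018-03-22 12:02:15.700",
                      "2018-03-22 12:02:45.350", "2018-03-22 12:03:15.002",
                      "2018-03-22 12:04:00.065", "2018-03-22 12:05:15.430",
                      "2018-03-22 12:06:00.060", "2018-03-22 12:06:45.002"),
                    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
delay <- c(30, 45, 60, 30, 30, 45, 75, 45, 45, 60)

# Observed gaps between consecutive theoretical emissions, in seconds
gaps <- as.numeric(diff(theor), units = "secs")

# Deviation from the scheduled Delay; the last Delay has no following
# emission in this sample, so it is dropped before comparing
deviation <- round(gaps - delay[-length(delay)], 3)
deviation
```

In this sample every deviation stays below one second (the largest is 0.76 s, between the second and third emissions), which is consistent with that reading of `Delay`.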
```r
options("digits.secs" = 3)
df1 <- data.frame(Theor.DateTime= c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),
                  Delay= c(30,45,60,30,30,45,75,45,45,60))
df1$Theor.DateTime <- as.POSIXct(df1$Theor.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")
head(df1)
```

```
           Theor.DateTime Delay
1 2018-03-22 12:00:00.000    30
2 2018-03-22 12:00:30.039    45
3 2018-03-22 12:01:15.799    60
4 2018-03-22 12:02:15.700    30
5 2018-03-22 12:02:45.349    30
6 2018-03-22 12:03:15.002    45
```
```r
df2 <- data.frame(Real.DateTime= c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"))
# Convert df2's own column (not df1$Theor.DateTime, which has 10 rows instead of 5)
df2$Real.DateTime <- as.POSIXct(df2$Real.DateTime, format="%Y-%m-%d %H:%M:%OS",tz="UTC")
df2
```

```
           Real.DateTime
1 2018-03-22 12:00:02.00
2 2018-03-22 12:02:20.53
3 2018-03-22 12:02:42.79
4 2018-03-22 12:05:18.70
5 2018-03-22 12:06:33.70
```
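What I want is to create a single data frame that combines the information from `df1` and `df2`. When `df2$Real.DateTime` falls within a 5-second interval (± 5 seconds) of `df1$Theor.DateTime`, I want `df1$Theor.DateTime` and `df2$Real.DateTime` to appear in the same row. I would also like to create a column called `Reception.success` that indicates (TRUE or FALSE) whether a given `df1$Theor.DateTime` was matched by a `df2$Real.DateTime`, i.e. whether that emission was received.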
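The window has to be symmetric: in the expected output the reception at 12:02:42.800 is paired with the emission scheduled at 12:02:45.350, so it is timestamped about 2.55 s *before* the theoretical time. A minimal sketch of the matching rule for a single pair, using base R's `difftime()`; the names `offset` and `is_match` are illustrative only.

```r
theor <- as.POSIXct("2018-03-22 12:02:45.350", format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
real  <- as.POSIXct("2018-03-22 12:02:42.800", format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Signed offset in seconds: negative here, because the reception
# timestamp precedes the theoretical emission time
offset <- as.numeric(difftime(real, theor, units = "secs"))

# The ±5 s condition, applied to the absolute offset
is_match <- abs(offset) <= 5
is_match
```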
Expected output:

```
> df3
            Theor.DateTime Delay Reception.success           Real.DateTime
1  2018-03-22 12:00:00.000    30              TRUE 2018-03-22 12:00:02.000
2  2018-03-22 12:00:30.040    45             FALSE                    <NA>
3  2018-03-22 12:01:15.800    60             FALSE                    <NA>
4  2018-03-22 12:02:15.700    30              TRUE 2018-03-22 12:02:20.540
5  2018-03-22 12:02:45.350    30              TRUE 2018-03-22 12:02:42.800
6  2018-03-22 12:03:15.002    45             FALSE                    <NA>
7  2018-03-22 12:04:00.065    75             FALSE                    <NA>
8  2018-03-22 12:05:15.430    45              TRUE 2018-03-22 12:05:18.700
9  2018-03-22 12:06:00.060    45             FALSE                    <NA>
10 2018-03-22 12:06:45.002    60             FALSE                    <NA>
```
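For reference, a base R sketch (no extra packages) that reproduces this table by comparing every theoretical time with every reception time. It assumes, since the question does not say, that when several receptions fall inside one emission's ±5 s window the closest one should be kept; `gap` and `match_idx` are illustrative names.

```r
fmt <- "%Y-%m-%d %H:%M:%OS"
df1 <- data.frame(
  Theor.DateTime = as.POSIXct(c("2018-03-22 12:00:00.000", "2018-03-22 12:00:30.040",
                                "2018-03-22 12:01:15.800", "2018-03-22 12:02:15.700",
                                "2018-03-22 12:02:45.350", "2018-03-22 12:03:15.002",
                                "2018-03-22 12:04:00.065", "2018-03-22 12:05:15.430",
                                "2018-03-22 12:06:00.060", "2018-03-22 12:06:45.002"),
                              format = fmt, tz = "UTC"),
  Delay = c(30, 45, 60, 30, 30, 45, 75, 45, 45, 60))
df2 <- data.frame(
  Real.DateTime = as.POSIXct(c("2018-03-22 12:00:02.000", "2018-03-22 12:02:20.540",
                               "2018-03-22 12:02:42.800", "2018-03-22 12:05:18.700",
                               "2018-03-22 12:06:33.700"),
                             format = fmt, tz = "UTC"))

# |Theor - Real| in seconds: one row per emission, one column per reception
gap <- abs(outer(as.numeric(df1$Theor.DateTime),
                 as.numeric(df2$Real.DateTime), "-"))

# For each emission, the index of the closest reception within 5 s, or NA
match_idx <- apply(gap, 1, function(g) {
  ok <- which(g <= 5)
  if (length(ok) == 0) NA_integer_ else ok[which.min(g[ok])]
})

df3 <- df1
df3$Reception.success <- !is.na(match_idx)
# Indexing a POSIXct vector with NA gives NA and keeps the class and time zone
df3$Real.DateTime <- df2$Real.DateTime[match_idx]
df3
```

The full difference matrix is fine for small inputs like this one; for long series a join-based approach scales better.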
Does anyone know how to do this?
Thanks in advance.
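You can do this with a non-equi join in `data.table`: widen each reception time into the window `[Real.DateTime - 5, Real.DateTime + 5]`, then join `df1` to `df2` on `Theor.DateTime` falling inside that window.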
```r
library(data.table)
options("digits.secs" = 3)
df1 <- data.table(Theor.DateTime= as.POSIXct(c("2018-03-22 12:00:00.000","2018-03-22 12:00:30.040","2018-03-22 12:01:15.800","2018-03-22 12:02:15.700","2018-03-22 12:02:45.350","2018-03-22 12:03:15.002","2018-03-22 12:04:00.065","2018-03-22 12:05:15.430","2018-03-22 12:06:00.060","2018-03-22 12:06:45.002"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"),
                  Delay= c(30,45,60,30,30,45,75,45,45,60))
df2 <- data.table(Real.DateTime= as.POSIXct(c("2018-03-22 12:00:02.000","2018-03-22 12:02:20.540","2018-03-22 12:02:42.800","2018-03-22 12:05:18.700","2018-03-22 12:06:33.700"),format="%Y-%m-%d %H:%M:%OS",tz="UTC"))

# Lower and upper bounds of the ±5 s window around each reception time
df2[,`:=`(minus_5=Real.DateTime-5,
          plus_5=Real.DateTime+5)]
df2
#>             Real.DateTime                minus_5                 plus_5
#> 1: 2018-03-22 12:00:02.00 2018-03-22 11:59:57.00 2018-03-22 12:00:07.00
#> 2: 2018-03-22 12:02:20.53 2018-03-22 12:02:15.53 2018-03-22 12:02:25.53
#> 3: 2018-03-22 12:02:42.79 2018-03-22 12:02:37.79 2018-03-22 12:02:47.79
#> 4: 2018-03-22 12:05:18.70 2018-03-22 12:05:13.70 2018-03-22 12:05:23.70
#> 5: 2018-03-22 12:06:33.70 2018-03-22 12:06:28.70 2018-03-22 12:06:38.70

# Non-equi update join: every df1 row whose Theor.DateTime lies in
# [minus_5, plus_5] of some df2 row gets that row's Real.DateTime by reference;
# unmatched rows stay NA, which then defines Reception.success
df1[df2,on = .(Theor.DateTime<=plus_5,Theor.DateTime>=minus_5),"Real.DateTime":=i.Real.DateTime][,"Reception.success":=!is.na(Real.DateTime)]
df1
#>              Theor.DateTime Delay          Real.DateTime Reception.success
#>  1: 2018-03-22 12:00:00.000    30 2018-03-22 12:00:02.00              TRUE
#>  2: 2018-03-22 12:00:30.039    45                   <NA>             FALSE
#>  3: 2018-03-22 12:01:15.799    60                   <NA>             FALSE
#>  4: 2018-03-22 12:02:15.700    30 2018-03-22 12:02:20.53              TRUE
#>  5: 2018-03-22 12:02:45.349    30 2018-03-22 12:02:42.79              TRUE
#>  6: 2018-03-22 12:03:15.002    45                   <NA>             FALSE
#>  7: 2018-03-22 12:04:00.065    75                   <NA>             FALSE
#>  8: 2018-03-22 12:05:15.430    45 2018-03-22 12:05:18.70              TRUE
#>  9: 2018-03-22 12:06:00.059    45                   <NA>             FALSE
#> 10: 2018-03-22 12:06:45.002    60                   <NA>             FALSE
```
Created on 2020-04-14 by the reprex package (v0.3.0)
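Several printed times end in .039, .799, .349 or .059 although the inputs end in .040, .800, .350 and .060. This looks like a display effect rather than a data or matching problem: as far as I know, R truncates (rather than rounds) fractional seconds when formatting `POSIXct` values, and the parsed double can sit just below the decimal input. A small check on the first affected timestamp, recovering the seconds past the minute from the underlying numeric value:

```r
options(digits.secs = 3)
x <- as.POSIXct("2018-03-22 12:00:30.040", format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Display only: the printed fraction may come out as .039
print(x)

# Stored value: seconds since the epoch, reduced to seconds past the minute
secs <- as.numeric(x) %% 60
round(secs, 3)
```

Because the join compares the underlying numeric values, this truncation should not affect the ±5 s matching for these data.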