Non-equi join with multiple datetimes

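I have two datasets that I would like to match together based on the datetime columns in each. I have converted both datetimes to POSIXct.

The first dataset (df1) looks like this: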

shark depth temperature   datetime    date      location
A     49.5  26.2   20/03/2018 08:00 20/03/2018    SS04
A     49.5  25.3   20/03/2018 08:02 20/03/2018    SS04
A     53.0  NA     20/03/2018 08:04 20/03/2018    SS04
A     39.5  26.5   20/03/2018 08:50 20/03/2018    Absent
A     43.0  26.2   21/03/2018 09:10 21/03/2018    Absent
A     44.5  NA     21/03/2018 10:18 21/03/2018    SS04 

I've reduced the number of columns for simplicity, but my actual dataset has 15 variables.

The second dataset, tides, is a list of tide times:

date   time  t_depth t_state  t_datetime
18/03/2018 02:33  2.09  High    20/03/2018 02:33
18/03/2018 08:39  0.45   Low    20/03/2018 08:39
18/03/2018 14:47  2.14  High    20/03/2018 14:47
18/03/2018 20:54  0.41   Low    20/03/2018 20:54
19/03/2018 03:01  2.13  High    21/03/2019 03:01
19/03/2018 09:09  0.41   Low    21/03/2019 09:09

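I would like to add t_state to df1 according to which tidal period in tides$t_datetime each df1$datetime falls within. I would also like to add the t_depth that corresponds to that tidal state.

I'm new to data.table and find the syntax quite confusing. I tried to do this with:
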
df1[ copy(tides)t_state := i.t_state, 
     on = .( datetime >= t_datetime, datetime < end)]

This doesn't work, but I'm not sure how to fix it.

Ideally, my output would be:

shark depth temperature   datetime    date    location t_state t_depth
A     49.5  26.2   20/03/2018 08:00 20/03/2018  SS04     High  2.09
A     49.5  25.3   20/03/2018 08:02 20/03/2018  SS04     High  2.09
A     53.0  NA     20/03/2018 08:04 20/03/2018  SS04     High  2.09
A     39.5  26.5   20/03/2018 08:50 20/03/2018  Absent   Low   0.45
A     43.0  26.2   20/03/2018 09:10 21/03/2018  Absent   Low   0.45
A     44.5  NA     20/03/2018 10:18 21/03/2018  SS04     Low   0.45

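If possible, I'd also like to know how to handle the extra variables I omitted for simplicity. Do I need to add anything to account for them?

Thanks!

The data, as dput() output: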

structure(list(shark = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"), 
    depth = c(49.5, 49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 
    25.3, NA, 26.5, 26.2, NA), datetime = structure(1:6, .Label = c("20/03/2018 08:00", 
    "20/03/2018 08:02", "20/03/2018 08:04", "20/03/2018 08:50", 
    "21/03/2018 09:10", "21/03/2018 10:18"), class = "factor"), 
    date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("20/03/2018", 
    "21/03/2018"), class = "factor"), location = structure(c(2L, 
    2L, 2L, 1L, 1L, 2L), .Label = c("Absent", "SS04"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("18/03/2018", 
"19/03/2018"), class = "factor"), time = structure(c(1L, 3L, 
4L, 5L, 2L, 2L), .Label = c("02:33", "03:01", "08:39", "14:47", 
"20:54"), class = "factor"), t_depth = c(2.09, 0.45, 2.14, 0.41, 
2.13, 0.41), t_state = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("High", 
"Low"), class = "factor"), t_datetime = structure(c(2L, 3L, 1L, 
4L, 5L, 6L), .Label = c(" 20/03/2018 14:47", "20/03/2018 02:33", 
"20/03/2018 08:39", "20/03/2018 20:54", "21/03/2019 03:01", "21/03/2019 09:09"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))
library( data.table )

#create posix-timestamp
setDT(df1)[, timestamp := as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" )]
#create start and end of tidal period
setDT(tides)[, start := as.POSIXct( t_datetime, format = "%d/%m/%Y %H:%M" )]
tides[, end := shift( start, type = "lead" )]
#left update non-equi join
df1[tides, tide:=i.t_state, on=.(timestamp>=start,timestamp<end)][,timestamp:=NULL]

   shark depth temperature         datetime       date location tide
1:     A  49.5        26.2 20/03/2018 08:00 20/03/2018     SS04 High
2:     A  49.5        25.3 20/03/2018 08:02 20/03/2018     SS04 High
3:     A  53.0          NA 20/03/2018 08:04 20/03/2018     SS04 High
4:     A  39.5        26.5 20/03/2018 08:50 20/03/2018   Absent  Low
5:     A  43.0        26.2 21/03/2018 09:10 21/03/2018   Absent  Low
6:     A  44.5          NA 21/03/2018 10:18 21/03/2018     SS04  Low
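
To see what the `shift()` step produces, here is a minimal sketch on made-up tide times (the `toy_tides` table and its values are assumptions for illustration, not the asker's data). Each period's `end` is just the next row's `start`, so the last row ends up with `NA` because nothing follows it.

```r
library(data.table)

# Hypothetical tide table: one row per tide, with the time it starts
toy_tides <- data.table(
  t_state = c("High", "Low", "High"),
  start   = as.POSIXct(
    c("2018-03-20 02:33", "2018-03-20 08:39", "2018-03-20 14:47"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# A period lasts until the next tide starts; the final row has no successor
toy_tides[, end := shift(start, type = "lead")]

print(toy_tides)
```

The `start`/`end` pair is what the two conditions in `on = .(timestamp >= start, timestamp < end)` compare against.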

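Update from the comments

# run this in place of the single-column join above (that line drops the timestamp column this one needs)
# the tide depth is stored as t_depth so it does not overwrite the existing depth column in df1
df1[tides, `:=`(tide = i.t_state, t_depth = i.t_depth), on = .(timestamp >= start, timestamp < end)][, timestamp := NULL][]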
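
On the question about the extra variables: an update join with `:=` only adds or changes the columns named on its left-hand side, so nothing extra should be needed for the other columns. A minimal self-contained sketch, using made-up tables (`toy_obs` and `toy_tides` and their values are assumptions for illustration only):

```r
library(data.table)

# Hypothetical observations with columns that take no part in the join
toy_obs <- data.table(
  shark     = c("A", "A"),
  depth     = c(49.5, 39.5),
  location  = c("SS04", "Absent"),
  timestamp = as.POSIXct(c("2018-03-20 08:00", "2018-03-20 08:50"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC")
)

# Hypothetical tide table; each period ends where the next one starts
toy_tides <- data.table(
  t_state = c("High", "Low", "High"),
  t_depth = c(2.09, 0.45, 2.14),
  start   = as.POSIXct(c("2018-03-20 02:33", "2018-03-20 08:39", "2018-03-20 14:47"),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")
)
toy_tides[, end := shift(start, type = "lead")]

# Update join: add t_state and t_depth by reference, leave every other column alone
toy_obs[toy_tides,
        `:=`(t_state = i.t_state, t_depth = i.t_depth),
        on = .(timestamp >= start, timestamp < end)]

print(toy_obs)
```

Each toy row should pick up the state and depth of the tidal period that contains its timestamp, while `shark`, `depth` and `location` are left as they were.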