Non-equi join of dates using data.table
I have a data.table of edits:
library(data.table)
edits <- data.table(proposal=c('A','A','A'),
editField=c('probability','probability','probability'),
startDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2018-10-10 15:47:00')),
endDate=as.POSIXct(c('2019-09-06 12:12:00','2018-10-10 15:47:00','9999-12-31 05:00:00')),
value=c(.1,.3,.1))
proposal editField startDate endDate value
1: A probability 2017-04-14 00:00:00 2019-09-06 12:12:00 0.1
2: A probability 2019-09-06 12:12:00 2018-10-10 15:47:00 0.3
3: A probability 2018-10-10 15:47:00 9999-12-31 05:00:00 0.1
which I want to join to a data.table of events:
events <- data.table(proposal='A',
editDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2018-07-04 15:33:59','2018-07-27 08:01:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-11-26 11:10:00','2019-02-05 13:06:59')),
editField=c('Created','stage','probability','estOrder','estOrder','estOrder','stage','probability','estOrder','estOrder','estOrder'))
proposal editDate editField
1: A 2017-04-14 00:00:00 Created
2: A 2019-09-06 12:12:00 stage
3: A 2019-09-06 12:12:00 probability
4: A 2019-09-06 12:12:00 estOrder
5: A 2018-07-04 15:33:59 estOrder
6: A 2018-07-27 08:01:00 estOrder
7: A 2018-10-10 15:47:00 stage
8: A 2018-10-10 15:47:00 probability
9: A 2018-10-10 15:47:00 estOrder
10: A 2018-11-26 11:10:00 estOrder
11: A 2019-02-05 13:06:59 estOrder
to get output like the following, where value gives the probability in effect at the time each edit occurred:
desired.join <- cbind(events, value=c(.1,.3,.3,.3,.3,.3,.3,.1,.1,.1,.1))
proposal editDate editField value
1: A 2017-04-14 00:00:00 Created 0.1
2: A 2019-09-06 12:12:00 stage 0.3
3: A 2019-09-06 12:12:00 probability 0.3
4: A 2019-09-06 12:12:00 estOrder 0.3
5: A 2018-07-04 15:33:59 estOrder 0.3
6: A 2018-07-27 08:01:00 estOrder 0.3
7: A 2018-10-10 15:47:00 stage 0.3
8: A 2018-10-10 15:47:00 probability 0.1
9: A 2018-10-10 15:47:00 estOrder 0.1
10: A 2018-11-26 11:10:00 estOrder 0.1
11: A 2019-02-05 13:06:59 estOrder 0.1
Here is what I have tried so far to join the two:
edits[editField=='probability'][events, on=.(proposal, startDate<=editDate, endDate>editDate)]
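However, when I try this, I get the following error message:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in 16 rows; more than 14 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.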
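It looks like you are trying to join edits and events so that each probability value in the edits data.table is associated with the correct observation in the events data.table.
The error appears to occur because the time intervals in the edits data.table are not mutually exclusive. When I change the intervals to what I think you intended, your code gives the result you are looking for.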
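The overlap can be checked directly before joining. The following is a minimal sketch, not part of the original answer: it uses the intervals from the question, sets tz = 'UTC' as an assumption so the result does not depend on the local time zone, and introduces the illustrative names inverted, overlapsPrev and matches. It flags intervals that end before they start, flags intervals that begin before the previous one ends, and counts how many edits rows each event matches using by = .EACHI, which avoids the large allocation mentioned in the error.

```r
library(data.table)

# Intervals exactly as given in the question (tz = 'UTC' is an assumption)
edits <- data.table(
  proposal  = 'A',
  startDate = as.POSIXct(c('2017-04-14 00:00:00', '2019-09-06 12:12:00',
                           '2018-10-10 15:47:00'), tz = 'UTC'),
  endDate   = as.POSIXct(c('2019-09-06 12:12:00', '2018-10-10 15:47:00',
                           '9999-12-31 05:00:00'), tz = 'UTC'),
  value     = c(.1, .3, .1)
)
events <- data.table(
  proposal = 'A',
  editDate = as.POSIXct(c('2017-04-14 00:00:00', '2019-09-06 12:12:00',
                          '2019-09-06 12:12:00', '2019-09-06 12:12:00',
                          '2018-07-04 15:33:59', '2018-07-27 08:01:00',
                          '2018-10-10 15:47:00', '2018-10-10 15:47:00',
                          '2018-10-10 15:47:00', '2018-11-26 11:10:00',
                          '2019-02-05 13:06:59'), tz = 'UTC')
)

# An interval that ends before it starts can never satisfy start <= t < end
inverted <- edits[startDate > endDate]

# Sort by start, then flag intervals that begin before the previous one ends
setorder(edits, proposal, startDate)
edits[, overlapsPrev := startDate < shift(endDate), by = proposal]

# Number of edits rows matched by each event, computed group by group
matches <- edits[events,
                 on = .(proposal, startDate <= editDate, endDate > editDate),
                 .N, by = .EACHI]
sum(matches$N)  # total rows the join would produce
```

Any event with N greater than 1 falls inside more than one interval, which is what pushes the join past nrow(x)+nrow(i). With the intervals reordered so that each one starts where the previous one ends, the original join runs without the error: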
library(data.table)
edits <- data.table(proposal=c('A','A','A'),
editField=c('probability','probability','probability'),
startDate=as.POSIXct(c('2017-04-14 00:00:00','2018-10-10 15:47:00','2019-09-06 12:12:00')),
endDate=as.POSIXct(c('2018-10-10 15:47:00','2019-09-06 12:12:00','9999-12-31 05:00:00')),
value=c(.1,.3,.1))
events <- data.table(proposal='A',
editDate=as.POSIXct(c('2017-04-14 00:00:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2019-09-06 12:12:00','2018-07-04 15:33:59','2018-07-27 08:01:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-10-10 15:47:00','2018-11-26 11:10:00','2019-02-05 13:06:59')),
editField=c('Created','stage','probability','estOrder','estOrder','estOrder','stage','probability','estOrder','estOrder','estOrder'))
edits[editField=='probability'][events, on=.(proposal, startDate<=editDate, endDate>editDate)]
Or you can do the join without chaining:
edits[events, on=.(proposal, startDate<=editDate, endDate>editDate)]
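In both joins above, the startDate and endDate columns of the result hold the editDate value from events rather than the original interval bounds, because data.table fills the join columns of a non-equi join from i. If the goal is simply events with a value column added, an update join is one alternative. This is a sketch under assumptions rather than part of the original answer: tz = 'UTC' is added for reproducibility, and the join is written from the events side so that value can be assigned by reference.

```r
library(data.table)

# Non-overlapping intervals as in the corrected data (tz = 'UTC' is an assumption)
edits <- data.table(
  proposal  = 'A',
  editField = 'probability',
  startDate = as.POSIXct(c('2017-04-14 00:00:00', '2018-10-10 15:47:00',
                           '2019-09-06 12:12:00'), tz = 'UTC'),
  endDate   = as.POSIXct(c('2018-10-10 15:47:00', '2019-09-06 12:12:00',
                           '9999-12-31 05:00:00'), tz = 'UTC'),
  value     = c(.1, .3, .1)
)
events <- data.table(
  proposal  = 'A',
  editDate  = as.POSIXct(c('2017-04-14 00:00:00', '2019-09-06 12:12:00',
                           '2019-09-06 12:12:00', '2019-09-06 12:12:00',
                           '2018-07-04 15:33:59', '2018-07-27 08:01:00',
                           '2018-10-10 15:47:00', '2018-10-10 15:47:00',
                           '2018-10-10 15:47:00', '2018-11-26 11:10:00',
                           '2019-02-05 13:06:59'), tz = 'UTC'),
  editField = c('Created', 'stage', 'probability', 'estOrder', 'estOrder',
                'estOrder', 'stage', 'probability', 'estOrder', 'estOrder',
                'estOrder')
)

# Update join: for every interval in edits, copy its value onto the events
# whose editDate lies in [startDate, endDate). events keeps its own columns
# and row order; only the value column is added.
events[edits,
       on = .(proposal, editDate >= startDate, editDate < endDate),
       value := i.value]
```

Because the intervals are half-open and do not overlap, each event receives exactly one value, and events that fall outside every interval would be left as NA.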
Or you can use foverlaps, as Jonny Phelps suggested, but this also requires mutually exclusive time intervals in the edits data.table:
events[,startDate:= editDate]
setkey(events, startDate, editDate)
setkey(edits, startDate, endDate)
foverlaps(events, edits, type="any", mult="first")
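One difference from the non-equi join is worth keeping in mind: foverlaps treats intervals as closed at both ends, so an event that falls exactly on a boundary overlaps both neighbouring intervals, and mult="first" silently picks one of them. The sketch below illustrates this on three events and shows one possible way to emulate the half-open condition endDate > editDate. It is an illustration under assumptions, not part of the original answer: tz = 'UTC' is added, editEnd and edits_open are hypothetical names, and subtracting one second assumes all timestamps are whole seconds.

```r
library(data.table)

# Non-overlapping intervals, keyed so that the last two key columns are the interval
edits <- data.table(
  proposal  = 'A',
  startDate = as.POSIXct(c('2017-04-14 00:00:00', '2018-10-10 15:47:00',
                           '2019-09-06 12:12:00'), tz = 'UTC'),
  endDate   = as.POSIXct(c('2018-10-10 15:47:00', '2019-09-06 12:12:00',
                           '9999-12-31 05:00:00'), tz = 'UTC'),
  value     = c(.1, .3, .1)
)
setkey(edits, proposal, startDate, endDate)

# Three events; the middle one sits exactly on an interval boundary
events <- data.table(
  proposal = 'A',
  editDate = as.POSIXct(c('2018-07-04 15:33:59', '2018-10-10 15:47:00',
                          '2019-02-05 13:06:59'), tz = 'UTC')
)
events[, editEnd := editDate]  # zero-length interval for each event

# Closed intervals: the boundary event matches both neighbouring intervals
all_matches <- foverlaps(events, edits,
                         by.x = c('proposal', 'editDate', 'editEnd'),
                         type = 'any', mult = 'all')

# Emulate [startDate, endDate) by moving each end back one second
edits_open <- copy(edits)[, endDate := endDate - 1]
setkey(edits_open, proposal, startDate, endDate)
one_match <- foverlaps(events, edits_open,
                       by.x = c('proposal', 'editDate', 'editEnd'),
                       type = 'any', mult = 'all')
```

Comparing nrow(all_matches) with nrow(one_match) shows whether any event was matched twice because of a shared boundary.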