按最近时间合并数据集 'after'
Combine datasets by nearest time 'after'
我想根据 之后的最近时间合并两个数据集,但我不知道如何用 data.table
.[=21= 指定它]
x 数据集为 here,'y' 数据集如下:
#y data.table
structure(list(date.time = structure(c(1551107760, 1551171420,
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320,
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140,
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520,
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L, 4L,
12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), row.names = c(NA,
-22L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001dc21061ef0>, sorted = "date.time")
如果我指定 roll=nearest
:
require(data.table)
setDT(x)
setDT(y)
x[, reference.x := date.time] #ref to check join
setkey(y, date.time)
setkey(x, date.time)
#join dataframe
results <- x[y, roll="nearest"]
您会看到该函数正常工作,但我希望连接基于 y
之后发生的最近的 x
时间。在结果中,您将看到 reference.x
的日期和时间早于 y
.
的日期和时间
> results
date.time class reference.x id
1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12 #on or after
2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00 3 #on or after
3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00 3 #on or after
4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:45:00 10 #before (incorrect)
5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 11:59:00 10
6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11
7: 2019-02-26 15:25:00 8.2 14.8 2019-02-26 15:25:00 4
8: 2019-02-27 10:12:00 8.2 14.8 2019-02-27 07:10:00 10
9: 2019-02-27 10:13:00 8.2 14.8 2019-02-27 07:10:00 3
10: 2019-02-27 10:47:00 8.2 14.8 2019-02-27 07:10:00 11
11: 2019-02-27 11:46:00 8.2 14.8 2019-02-27 07:10:00 4
12: 2019-02-27 13:00:00 8.2 14.8 2019-02-27 07:10:00 12
13: 2019-02-28 07:23:00 8.2 14.8 2019-02-28 07:26:00 2
14: 2019-02-28 08:39:00 8.2 14.8 2019-02-28 08:39:00 10
15: 2019-02-28 08:43:00 8.2 14.8 2019-02-28 08:42:00 12
16: 2019-02-28 10:30:00 8.2 14.8 2019-02-28 10:30:00 12
17: 2019-02-28 11:00:00 8.2 14.8 2019-02-28 10:59:00 12
18: 2019-02-28 15:50:00 8.2 14.8 2019-02-28 15:49:00 12
19: 2019-03-01 07:18:00 8.2 14.8 2019-03-01 07:18:00 2
20: 2019-03-01 07:32:00 8.2 14.8 2019-03-01 07:32:00 12
21: 2019-03-01 08:30:00 8.2 14.8 2019-03-01 08:30:00 2
22: 2019-03-01 13:28:00 8.2 14.8 2019-03-01 13:27:00 10
>
structure(list(date.time = structure(c(1551107760, 1551171420,
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320,
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140,
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520,
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
class = c("8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8"), reference.x = structure(c(1551107760,
1551171420, 1551181500, 1551181500, 1551182340, 1551188100,
1551194700, 1551251400, 1551251400, 1551251400, 1551251400,
1551251400, 1551338760, 1551343140, 1551343320, 1551349800,
1551351540, 1551368940, 1551424680, 1551425520, 1551429000,
1551446820), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L,
4L, 12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), sorted = "date.time", class = c("data.table",
"data.frame"), row.names = c(NA, -22L), .internal.selfref = <pointer: 0x000001dc21061ef0>)
如有任何帮助,我们将不胜感激。
您要查找的选项是roll = -Inf
来自文档 (?data.table
):
-Inf
rolls backwards instead; i.e., next observation carried backward (NOCB).
> results <- x[y, roll=-Inf]
> head(results)
date.time class reference.x id
1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12
2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00 3
3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00 3
4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:47:00 10
5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 12:01:00 10
6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11
我想根据 之后的最近时间合并两个数据集,但我不知道如何用 data.table
.[=21= 指定它]
x 数据集为 here,'y' 数据集如下:
#y data.table
structure(list(date.time = structure(c(1551107760, 1551171420,
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320,
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140,
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520,
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L, 4L,
12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), row.names = c(NA,
-22L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001dc21061ef0>, sorted = "date.time")
如果我指定 roll=nearest
:
require(data.table)
setDT(x)
setDT(y)
x[, reference.x := date.time] #ref to check join
setkey(y, date.time)
setkey(x, date.time)
#join dataframe
results <- x[y, roll="nearest"]
您会看到该函数正常工作,但我希望连接基于 y
之后发生的最近的 x
时间。在结果中,您将看到 reference.x
的日期和时间早于 y
.
> results
date.time class reference.x id
1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12 #on or after
2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00 3 #on or after
3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00 3 #on or after
4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:45:00 10 #before (incorrect)
5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 11:59:00 10
6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11
7: 2019-02-26 15:25:00 8.2 14.8 2019-02-26 15:25:00 4
8: 2019-02-27 10:12:00 8.2 14.8 2019-02-27 07:10:00 10
9: 2019-02-27 10:13:00 8.2 14.8 2019-02-27 07:10:00 3
10: 2019-02-27 10:47:00 8.2 14.8 2019-02-27 07:10:00 11
11: 2019-02-27 11:46:00 8.2 14.8 2019-02-27 07:10:00 4
12: 2019-02-27 13:00:00 8.2 14.8 2019-02-27 07:10:00 12
13: 2019-02-28 07:23:00 8.2 14.8 2019-02-28 07:26:00 2
14: 2019-02-28 08:39:00 8.2 14.8 2019-02-28 08:39:00 10
15: 2019-02-28 08:43:00 8.2 14.8 2019-02-28 08:42:00 12
16: 2019-02-28 10:30:00 8.2 14.8 2019-02-28 10:30:00 12
17: 2019-02-28 11:00:00 8.2 14.8 2019-02-28 10:59:00 12
18: 2019-02-28 15:50:00 8.2 14.8 2019-02-28 15:49:00 12
19: 2019-03-01 07:18:00 8.2 14.8 2019-03-01 07:18:00 2
20: 2019-03-01 07:32:00 8.2 14.8 2019-03-01 07:32:00 12
21: 2019-03-01 08:30:00 8.2 14.8 2019-03-01 08:30:00 2
22: 2019-03-01 13:28:00 8.2 14.8 2019-03-01 13:27:00 10
>
structure(list(date.time = structure(c(1551107760, 1551171420,
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320,
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140,
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520,
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
class = c("8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8",
"8.2 14.8", "8.2 14.8", "8.2 14.8"), reference.x = structure(c(1551107760,
1551171420, 1551181500, 1551181500, 1551182340, 1551188100,
1551194700, 1551251400, 1551251400, 1551251400, 1551251400,
1551251400, 1551338760, 1551343140, 1551343320, 1551349800,
1551351540, 1551368940, 1551424680, 1551425520, 1551429000,
1551446820), tzone = "UTC", class = c("POSIXct", "POSIXt"
)), id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L,
4L, 12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), sorted = "date.time", class = c("data.table",
"data.frame"), row.names = c(NA, -22L), .internal.selfref = <pointer: 0x000001dc21061ef0>)
如有任何帮助,我们将不胜感激。
您要查找的选项是roll = -Inf
来自文档 (?data.table
):
-Inf
rolls backwards instead; i.e., next observation carried backward (NOCB).
> results <- x[y, roll=-Inf]
> head(results)
date.time class reference.x id
1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12
2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00 3
3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00 3
4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:47:00 10
5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 12:01:00 10
6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11