R: using data.table to cut fixed time intervals that contain 2 or more variables
I have a data frame:
df <- data.frame(time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36", "2015-09-07 01:47:45",
"2015-09-07 02:00:17", "2015-09-07 02:07:30", "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21", "2015-09-07 04:04:22"),
inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"))
> df
time inOut
1 2015-09-07 00:32:19 IN
2 2015-09-07 01:02:30 OUT
3 2015-09-07 01:31:36 IN
4 2015-09-07 01:47:45 IN
5 2015-09-07 02:00:17 IN
6 2015-09-07 02:07:30 IN
7 2015-09-07 03:39:41 IN
8 2015-09-07 04:04:21 OUT
9 2015-09-07 04:04:21 IN
10 2015-09-07 04:04:22 OUT
>
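I want to count the IN/OUT entries in each 15-minute interval.
I can do this by creating two more data frames, in_df and out_df, cutting each of them into 15-minute intervals, and then merging them to get my result. outdf is my expected result.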
in_df <- df[which(df$inOut== "IN"),]
out_df <- df[which(df$inOut== "OUT"),]
a <- data.frame(table(cut(as.POSIXct(in_df$time), breaks="15 mins")))
b <- data.frame(table(cut(as.POSIXct(out_df$time), breaks="15 mins")))
colnames(b) <- c("Time", "Out")
colnames(a) <- c("Time", "In")
outdf <- merge(a,b, all=TRUE)
outdf[is.na(outdf)] <- 0
> outdf
Time In Out
1 2015-09-07 00:32:00 1 0
2 2015-09-07 00:47:00 0 0
3 2015-09-07 01:02:00 0 1
4 2015-09-07 01:17:00 1 0
5 2015-09-07 01:32:00 0 0
6 2015-09-07 01:47:00 2 0
7 2015-09-07 02:02:00 1 0
8 2015-09-07 02:17:00 0 0
9 2015-09-07 02:32:00 0 0
10 2015-09-07 02:47:00 0 0
11 2015-09-07 03:02:00 0 0
12 2015-09-07 03:17:00 0 0
13 2015-09-07 03:32:00 1 0
14 2015-09-07 03:47:00 0 0
15 2015-09-07 04:02:00 1 2
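A side note on the base R version above: in_df and out_df are cut separately, so each set of breaks starts at the first timestamp of its own subset (00:32:00 and 01:02:00 here). They line up in this data only because the two starts happen to be 30 minutes apart. A minimal sketch that cuts the full time column once and cross-tabulates it against inOut avoids relying on that. The names bins, counts and outdf2 are made up for illustration, and tz = "UTC" is an assumption added only to keep the sketch reproducible.

```r
# Same data as in the question
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")
)

# Assumption: tz = "UTC" is fixed here only to make the sketch reproducible
bins <- cut(as.POSIXct(df$time, tz = "UTC"), breaks = "15 mins")

# Two-way table: one row per 15-minute bin (empty bins included),
# one column per distinct value of inOut
counts <- table(Time = bins, inOut = df$inOut)

# Convert to a data frame with columns IN and OUT; bin labels become row names
outdf2 <- as.data.frame.matrix(counts)
print(outdf2)
```

This should give the same 15 bins and counts as outdf, with the bin labels as row names instead of a Time column.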
My question is: how can I get the same result using data.table?
With data.table, I would do:
library(data.table)
setDT(df)
df[, timeCut := cut(as.POSIXct(time), breaks="15 mins")]
df[J(timeCut = levels(timeCut)),
as.list(table(inOut)),
on = "timeCut",
by = .EACHI]
which gives:
timeCut IN OUT
1: 2015-09-07 00:32:00 1 0
2: 2015-09-07 00:47:00 0 0
3: 2015-09-07 01:02:00 0 1
4: 2015-09-07 01:17:00 1 0
5: 2015-09-07 01:32:00 0 0
6: 2015-09-07 01:47:00 2 0
7: 2015-09-07 02:02:00 1 0
8: 2015-09-07 02:17:00 0 0
9: 2015-09-07 02:32:00 0 0
10: 2015-09-07 02:47:00 0 0
11: 2015-09-07 03:02:00 0 0
12: 2015-09-07 03:17:00 0 0
13: 2015-09-07 03:32:00 1 0
14: 2015-09-07 03:47:00 0 0
15: 2015-09-07 04:02:00 1 2
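Explanation: the last part has the form DT[i=J(x=my_x), j, on="x", by=.EACHI], which can be read as:
- Join DT to my_x on column x.
- Then evaluate j for each subset determined by my_x.
In this case, j=as.list(table(inOut)). The table must be coerced to a list to create multiple columns (one for each level of inOut).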
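To make the reading above concrete, here is a small sketch of the same pattern on toy data. The table DT, the vector my_x and the values in them are hypothetical and not taken from the question. Note that "one column per level" relies on inOut being a factor: since R 4.0, data.frame() keeps strings as character by default, in which case table() only reports the values present in each group and the groups may not yield a consistent set of columns. The sketch therefore sets the levels explicitly, which is an assumption on my part rather than something the original code does.

```r
library(data.table)

# Hypothetical toy data: "b" is deliberately missing from DT
DT <- data.table(
  x     = c("a", "a", "c"),
  # Explicit levels so that table() always reports both IN and OUT
  inOut = factor(c("IN", "OUT", "IN"), levels = c("IN", "OUT"))
)
my_x <- c("a", "b", "c")

# Join DT to my_x on column x, then evaluate j once per value of my_x
res <- DT[J(x = my_x), as.list(table(inOut)), on = "x", by = .EACHI]
print(res)
# One row per value of my_x; IN/OUT counts are 1/1 for "a", 0/0 for "b", 1/0 for "c"
```

The row for "b" is what produces the all-zero intervals in the result above: the join finds no match, the group's inOut is NA, and table() drops the NA but still counts 0 for every level.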