How to create follow groups by time
My data frame looks like this:
user time
____ ____
1 2017-09-01 00:01:01
1 2017-09-01 00:01:20
1 2017-09-01 00:03:01
1 2017-09-01 00:10:01
1 2017-09-01 00:11:01
2 2017-09-01 00:01:03
2 2017-09-01 00:01:08
2 2017-09-01 00:03:01
From this data frame I want to create a follow group for each user, like this:
user time follow_group
____ ____________________ _____________
1 2017-09-01 00:01:01 1
1 2017-09-01 00:01:20 1
1 2017-09-01 00:03:01 1
1 2017-09-01 00:10:01 2
1 2017-09-01 00:11:01 2
2 2017-09-01 00:01:03 1
2 2017-09-01 00:01:08 1
2 2017-09-01 00:03:01 1
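A user's follow group changes whenever the time difference between consecutive rows is greater than 5 minutes.
I tried computing the lag and subtracting: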
data[, previous_request_time:=c(NA, time[-.N]), by=user]
But this does not seem to work. Any help is appreciated.
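Just do a single difftime operation and check whether the difference is greater than 5 minutes. A cumulative sum of that check then gives you the group counter: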
dat[,
  follow_group := cumsum(difftime(time, shift(time, fill=-Inf), units="mins") > 5),
  by=user
]
# user time follow_group
#1: 1 2017-09-01 00:01:01 1
#2: 1 2017-09-01 00:01:20 1
#3: 1 2017-09-01 00:03:01 1
#4: 1 2017-09-01 00:10:01 2
#5: 1 2017-09-01 00:11:01 2
#6: 2 2017-09-01 00:01:03 1
#7: 2 2017-09-01 00:01:08 1
#8: 2 2017-09-01 00:03:01 1
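For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming `time` is stored as POSIXct and the rows are sorted by user and time. The `gap_mins` helper column and the `tz = "UTC"` setting are my own additions for illustration. It also uses the default `NA` fill of `shift()` and treats the first row of each user as the start of a new group, rather than relying on `fill=-Inf`.

```r
library(data.table)

# Sample data from the question; POSIXct is assumed for the time column
dat <- data.table(
  user = c(1, 1, 1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c(
    "2017-09-01 00:01:01", "2017-09-01 00:01:20", "2017-09-01 00:03:01",
    "2017-09-01 00:10:01", "2017-09-01 00:11:01",
    "2017-09-01 00:01:03", "2017-09-01 00:01:08", "2017-09-01 00:03:01"
  ), tz = "UTC")
)

# The lag comparison only makes sense on rows ordered by time within each user
setorder(dat, user, time)

# Gap to the previous row of the same user, in minutes (NA on each user's first row)
dat[, gap_mins := as.numeric(difftime(time, shift(time), units = "mins")), by = user]

# A new group starts on the first row of a user or after a gap of more than 5 minutes
dat[, follow_group := cumsum(is.na(gap_mins) | gap_mins > 5), by = user]

print(dat)
```

On the sample data this should reproduce the follow_group column shown above.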
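If you don't want to spell out the units, you can also just use diff. On date-times, diff picks its units automatically, so convert to numeric first (assuming time is POSIXct) to make sure the differences are in seconds:
dat[, flwgrp := cumsum(c(Inf, diff(as.numeric(time))) > 5*60), by=user]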
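The same idea can also be sketched in base R without data.table. This is only an illustration, not part of the original answer. It assumes `time` is POSIXct, and the names `df` and `secs` are hypothetical. Converting to numeric seconds first sidesteps any question about which units `diff` returns.

```r
# Sample data from the question, as a plain data.frame
df <- data.frame(
  user = c(1, 1, 1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c(
    "2017-09-01 00:01:01", "2017-09-01 00:01:20", "2017-09-01 00:03:01",
    "2017-09-01 00:10:01", "2017-09-01 00:11:01",
    "2017-09-01 00:01:03", "2017-09-01 00:01:08", "2017-09-01 00:03:01"
  ), tz = "UTC")
)

# Sort so that consecutive rows within a user are consecutive in time
df <- df[order(df$user, df$time), ]

# Seconds since the epoch, so every difference below is in seconds
secs <- as.numeric(df$time)

# Per user: the first row opens group 1, each gap over 300 seconds opens a new group
df$follow_group <- ave(secs, df$user, FUN = function(s) {
  cumsum(c(TRUE, diff(s) > 5 * 60))
})

print(df)
```

Note that `ave` returns a numeric vector here, so wrap the result in `as.integer()` if an integer column is needed.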