Calculating Time Difference Between Messages in a Conversation using dplyr
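I have some data containing the messages in conversations, and I need to calculate the response time for someone replying to a message. I have unique user IDs for the two participants. However, the code below only calculates the difference between each consecutive message in a conversation. I need a way to calculate the total difference between the initial message and the response. (That is, if someone sends several initial messages without getting a response, I need the time between the first message and the first response.)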
convonlinetest <- convonline %>%
  arrange(conversation_id, created_at) %>%
  group_by(conversation_id) %>%
  filter(n() > 1) %>%
  mutate(timediff = created_at - lag(created_at))
This is my first question on Stack Overflow, so thank you very much in advance for the help!
Edit: some sample data
structure(list(conversation_id = c(20000004844375, 20000004844378,
20000004913095, 20000004837800, 20000004808210, 20000004808210,
20000004837799, 20000004844377, 20000004808210, 20000004846076
), user_id = c(-33135869739921264, -33135869739921264, 57394627930234816,
-33135869739921264, -33135869739921264, -70893327136775872,
-33135869739921264, -33135869739921264, -33135869739921264,
-33135869739921264),
created_at = c("2016-05-31 16:46:27.614", "2016-05-31 16:46:28.387",
"2016-07-11 20:20:06.589", "2016-05-27 16:31:05.716",
"2016-05-13 12:48:25.125", "2016-05-10 18:58:30.396",
"2016-05-27 16:31:05.451", "2016-05-31 16:46:27.981",
"2016-05-19 18:43:02.859", "2016-06-01 13:16:26.753"),
course_name = c("acct-2020-30i", "acct-2020-30i", "acct-2020-30i",
"acct-2020-30i", "acct-2020-30i", "acct-2020-30i", "acct-2020-30i",
"acct-2020-30i", "acct-2020-30i", "acct-2020-30i")),
row.names = c(NA, 10L), class = "data.frame")
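Edit: found a solution
I'm kicking myself for not remembering the aggregate function, but it worked out well. I thought I'd share it for anyone who needs it in the future.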
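# Earliest row values per conversation and user.
# FUN must be an argument of aggregate(), not an element of the `by` list.
new <- aggregate(convonline,
                 by = list(convonline$conversation_id, convonline$user_id),
                 FUN = min)
final <- new %>%
  # Note: as.Date() keeps only the calendar date, so diff is in whole days
  mutate(created_at = as.Date(created_at)) %>%
  arrange(conversation_id, created_at) %>%
  group_by(conversation_id) %>%
  mutate(diff = created_at - lag(created_at))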
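One thing to watch in the solution above is that as.Date() discards the time of day, so the differences come out in whole days. The following is a minimal base R sketch of that effect, using two timestamps taken from the sample data; the variable names are only for illustration, and the UTC time zone is an assumption, since the data does not state one.

```r
# Two timestamps from conversation 20000004808210 in the sample data
stamps <- c("2016-05-10 18:58:30.396", "2016-05-13 12:48:25.125")

# as.Date() keeps only the calendar date, so the gap is a whole number of days
gap_dates <- diff(as.Date(stamps))

# as.POSIXct() keeps the time of day (time zone assumed to be UTC here)
times <- as.POSIXct(stamps, tz = "UTC")
gap_times <- difftime(times[2], times[1], units = "days")

print(gap_dates)  # whole days only
print(gap_times)  # fractional days
```

If sub-day response times matter, converting with as.POSIXct() (or lubridate's as_datetime()) instead of as.Date() is probably the safer choice.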
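When I run your code with one added line that converts the created_at column from a character column to a datetime column, I get what I believe is the expected result.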
library(dplyr)
library(lubridate) # great package for handling dates
data %>%
  mutate(created_at = as_datetime(created_at)) %>% # NEW ROW OF CODE
  arrange(conversation_id, created_at) %>%
  group_by(conversation_id) %>%
  filter(n() > 1) %>%
  mutate(timediff = created_at - lag(created_at))
# A tibble: 3 x 5
# Groups: conversation_id [1]
conversation_id user_id created_at course_name timediff
<dbl> <dbl> <dttm> <chr> <time>
1 20000004808210 -7.09e16 2016-05-10 18:58:30 acct-2020-30i " NA days"
2 20000004808210 -3.31e16 2016-05-13 12:48:25 acct-2020-30i 2.742995 days
3 20000004808210 -3.31e16 2016-05-19 18:43:02 acct-2020-30i 6.246270 days
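The output above still gives the gap between consecutive messages. For the case described in the question (several initial messages, then a reply), one possible approach is to take the first message in each conversation and the earliest message written by a different user. This is only a sketch under assumptions: the toy data frame msgs and the column names first_message, first_response and response_time are made up for illustration, and "response" is taken to mean the first message from anyone other than the conversation starter.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: in conversation 1, user 10 writes twice before
# user 20 replies; conversation 2 never gets a reply.
msgs <- data.frame(
  conversation_id = c(1, 1, 1, 1, 2),
  user_id         = c(10, 10, 20, 10, 10),
  created_at      = c("2016-05-10 18:58:30", "2016-05-10 19:10:00",
                      "2016-05-10 20:58:30", "2016-05-11 09:00:00",
                      "2016-05-12 08:00:00"),
  stringsAsFactors = FALSE
)

response_times <- msgs %>%
  mutate(created_at = as_datetime(created_at)) %>%
  arrange(conversation_id, created_at) %>%
  group_by(conversation_id) %>%
  summarise(
    # rows are sorted, so the first row is the initial message
    first_message = first(created_at),
    # first message by anyone other than the starter; NA if nobody replied
    first_response = created_at[user_id != first(user_id)][1],
    .groups = "drop"
  ) %>%
  mutate(response_time = difftime(first_response, first_message,
                                  units = "mins"))

print(response_times)
```

Conversations without a reply get NA rather than being dropped, which keeps them visible; adding filter(!is.na(response_time)) would remove them if that is preferred.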