How to group data by a datetime condition (first or last record) and count events over time
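I have two data frames:
Data frame A: purchase information from a clothing store, with the variables customer name, purchase date, agent, and product purchased during period t.
NAME | PRODUCT | AGENT | DATE_PURCHASE |
---|---|---|---|
Karen | M_14 | X_1 | 8-25-20021 18:21:28 |
Jean | M_78 | X_3 | 8-26-20021 18:11:06 |
Jean | M_71 | X_4 | 8-26-20021 18:21:01 |
Jean | M_64 | X_4 | 8-27-20021 20:21:59 |
Keith | M_57 | X_4 | 8-27-20021 20:21:02 |
Alba | M_50 | X_1 | 8-28-20021 20:21:03 |
Alba | M_43 | X_3 | 8-29-20021 20:21:04 |
Alex | M_36 | X_2 | 8-25-20021 20:21:05 |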
Data frame B: information on customers who called the company's CX SERVICE line during period t, with the variables name, date of the call, and type of call.
NAME | TYPE | DATE_OF_CALL | DATE_PURCHASE |
---|---|---|---|
Karen | COMPLAIN | 8-26-20021 18:21:28 | 8-25-20021 18:21:28 |
Jean | CX_SERVICE | 8-27-20021 18:11:06 | 8-26-20021 18:11:06 |
Jean | COMPLAIN | 8-28-20021 18:21:01 | 8-26-20021 18:21:01 |
Jean | CX_SERVICE | 8-29-20021 20:21:59 | 8-27-20021 20:21:59 |
Keith | CX_SERVICE | 8-29-20021 20:21:02 | 8-27-20021 20:21:02 |
Alba | COMPLAIN | 8-30-20021 20:21:03 | 8-28-20021 20:21:03 |
Alex | CX_SERVICE | 8-25-20021 21:21:05 | 8-29-20021 20:21:04 |
I have to build a table that shows, for each NAME, the last product the customer purchased before their last call to the customer service line. It should include the variables NAME, LAST_PRODUCT_PURCHASED, AGENT, DATE_PURCHASE, TYPE, DATE_OF_CALL.
The table should look like this:
Result
NAME | LAST_PRODUCT_PURCHASED | AGENT | DATE_PURCHASE | TYPE | DATE_OF_CALL |
---|---|---|---|---|---|
Karen | M_14 | X_1 | 8-25-20021 18:21:28 | COMPLAIN | 8-26-20021 18:21:28 |
Jean | M_64 | X_4 | 8-27-20021 20:21:59 | CX_SERVICE | 8-29-20021 20:21:59 |
Keith | M_57 | X_4 | 8-27-20021 20:21:02 | CX_SERVICE | 8-29-20021 20:21:02 |
Alba | M_43 | X_3 | 8-29-20021 20:21:04 | COMPLAIN | 8-30-20021 20:21:03 |
Alex | M_36 | X_2 | 8-25-20021 20:21:05 | CX_SERVICE | 8-25-20021 21:21:05 |
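For example, the second row shows the expected result: the last product Jean purchased was M_64, and her last call was TYPE = CX_SERVICE on 8-29-20021 20:21:59.
I have been thinking about grouping by NAME and the dates, or doing a join, but I can't see how to express the "last" product and "last" call date conditions to pick out the rows.
PS: What if we also wanted to add a column that counts how many calls the customer (NAME) made before their most recent call?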
This should work, but there is certainly room for improvement:
library(tidyverse)
df1 <- tibble::tribble(
~NAME, ~PRODUCT, ~AGENT, ~DATE_PURCHASE,
"Karen", "M_14", "X_1", "8-25-20021 18:21:28",
"Jean", "M_78", "X_3", "8-26-20021 18:11:06",
"Jean", "M_71", "X_4", "8-26-20021 18:21:01",
"Jean", "M_64", "X_4", "8-27-20021 20:21:59",
"Keith", "M_57", "X_4", "8-27-20021 20:21:02",
"Alba", "M_50", "X_1", "8-28-20021 20:21:03",
"Alba", "M_43", "X_3", "8-29-20021 20:21:04",
"Alex", "M_36", "X_2", "8-25-20021 20:21:05"
)
df2 <- tibble::tribble(
~NAME, ~TYPE, ~DATE_OF_CALL, ~DATE_PURCHASE,
"Karen", "COMPLAIN", "8-26-20021 18:21:28", "8-25-20021 18:21:28",
"Jean", "CX_SERVICE", "8-27-20021 18:11:06", "8-26-20021 18:11:06",
"Jean", "COMPLAIN", "8-28-20021 18:21:01", "8-26-20021 18:21:01",
"Jean", "CX_SERVICE", "8-29-20021 20:21:59", "8-27-20021 20:21:59",
"Keith", "CX_SERVICE", "8-29-20021 20:21:02", "8-27-20021 20:21:02",
"Alba", "COMPLAIN", "8-30-20021 20:21:03", "8-28-20021 20:21:03",
"Alex", "CX_SERVICE", "8-25-20021 21:21:05", "8-29-20021 20:21:04"
)
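# Pair every purchase with every call made by the same customer.
joined_df <- dplyr::full_join(df1, df2, by = "NAME")
solution <- joined_df %>%
  group_by(NAME) %>%
  # Drop the purchase date carried over from df2; df1's copy is kept.
  select(-c(DATE_PURCHASE.y)) %>%
  # Keep the latest purchase, then the latest call, per customer.
  # The dates are character strings, so this is a string comparison; it works
  # here because every timestamp has the same month and the same width.
  top_n(n = 1, wt = DATE_PURCHASE.x) %>%
  top_n(n = 1, wt = DATE_OF_CALL) %>%
  rename("LAST_PRODUCT_PURCHASED" = "PRODUCT",
         "DATE_PURCHASE" = "DATE_PURCHASE.x")
solution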
# A tibble: 5 x 6
# Groups: NAME [5]
# NAME LAST_PRODUCT_PURCHASED AGENT DATE_PURCHASE TYPE DATE_OF_CALL
# <chr> <chr> <chr> <chr> <chr> <chr>
#1 Karen M_14 X_1 8-25-20021 18:21:28 COMPLAIN 8-26-20021 18:21:28
#2 Jean M_64 X_4 8-27-20021 20:21:59 CX_SERVICE 8-29-20021 20:21:59
#3 Keith M_57 X_4 8-27-20021 20:21:02 CX_SERVICE 8-29-20021 20:21:02
#4 Alba M_43 X_3 8-29-20021 20:21:04 COMPLAIN 8-30-20021 20:21:03
#5 Alex M_36 X_2 8-25-20021 20:21:05 CX_SERVICE 8-25-20021 21:21:05
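The PS in the question (counting the calls made before the most recent one) is not covered by the code above. Here is a minimal sketch of one way it might be done, which also keeps only purchases made at or before the last call. It rests on a few assumptions that are not in the original post: the year "20021" is treated as a typo for 2021 so the strings can be parsed as real datetimes, the column name `CALLS_BEFORE_LAST` and the objects `calls`, `purchases`, `call_summary` and `result` are made up for illustration, and only a small subset of the rows is used.

```r
library(dplyr)

# Assumed timestamp layout, with the year written as 2021 (not 20021).
fmt <- "%m-%d-%Y %H:%M:%S"

# Hypothetical subset of data frame B (calls).
calls <- tibble::tribble(
  ~NAME,   ~TYPE,        ~DATE_OF_CALL,
  "Karen", "COMPLAIN",   "8-26-2021 18:21:28",
  "Jean",  "CX_SERVICE", "8-27-2021 18:11:06",
  "Jean",  "COMPLAIN",   "8-28-2021 18:21:01",
  "Jean",  "CX_SERVICE", "8-29-2021 20:21:59"
)

# Hypothetical subset of data frame A (purchases).
purchases <- tibble::tribble(
  ~NAME,   ~PRODUCT, ~AGENT, ~DATE_PURCHASE,
  "Karen", "M_14",   "X_1",  "8-25-2021 18:21:28",
  "Jean",  "M_78",   "X_3",  "8-26-2021 18:11:06",
  "Jean",  "M_71",   "X_4",  "8-26-2021 18:21:01",
  "Jean",  "M_64",   "X_4",  "8-27-2021 20:21:59"
)

# One row per customer: the latest call plus the number of earlier calls.
call_summary <- calls %>%
  mutate(DATE_OF_CALL = as.POSIXct(DATE_OF_CALL, format = fmt, tz = "UTC")) %>%
  group_by(NAME) %>%
  mutate(CALLS_BEFORE_LAST = n() - 1L) %>%  # every call except the latest
  slice_max(DATE_OF_CALL, n = 1, with_ties = FALSE) %>%
  ungroup()

# Latest purchase made at or before that last call.
result <- purchases %>%
  mutate(DATE_PURCHASE = as.POSIXct(DATE_PURCHASE, format = fmt, tz = "UTC")) %>%
  inner_join(call_summary, by = "NAME") %>%
  filter(DATE_PURCHASE <= DATE_OF_CALL) %>%
  group_by(NAME) %>%
  slice_max(DATE_PURCHASE, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  rename(LAST_PRODUCT_PURCHASED = PRODUCT)

result
```

Parsing the timestamps first means the comparison is on actual datetimes instead of strings, so it should keep working if the data spans several months. `slice_max()` needs dplyr 1.0.0 or later; `with_ties = FALSE` guarantees a single row per customer even when two records share a timestamp.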