How to group data conditionally on datetimes (first or last record) and count events over time

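I have two data frames:

Data frame A: contains purchase information from a clothing store, with the variables customer name, product purchased during period t, agent, and date of purchase.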

NAME PRODUCT AGENT DATE_PURCHASE
Karen M_14 X_1 8-25-20021 18:21:28
Jean M_78 X_3 8-26-20021 18:11:06
Jean M_71 X_4 8-26-20021 18:21:01
Jean M_64 X_4 8-27-20021 20:21:59
Keith M_57 X_4 8-27-20021 20:21:02
Alba M_50 X_1 8-28-20021 20:21:03
Alba M_43 X_3 8-29-20021 20:21:04
Alex M_36 X_2 8-25-20021 20:21:05

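Data frame B: contains information on customers who called the company's CX SERVICE line during period t, with the variables name, type of call, and date of the call.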

NAME TYPE DATE_OF_CALL DATE_PURCHASE
Karen COMPLAIN 8-26-20021 18:21:28 8-25-20021 18:21:28
Jean CX_SERVICE 8-27-20021 18:11:06 8-26-20021 18:11:06
Jean COMPLAIN 8-28-20021 18:21:01 8-26-20021 18:21:01
Jean CX_SERVICE 8-29-20021 20:21:59 8-27-20021 20:21:59
Keith CX_SERVICE 8-29-20021 20:21:02 8-27-20021 20:21:02
Alba COMPLAIN 8-30-20021 20:21:03 8-28-20021 20:21:03
Alex CX_SERVICE 8-25-20021 21:21:05 8-29-20021 20:21:04

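I have to build a table that shows, by NAME, which product each customer last purchased before their last call to the customer service line. It should include the variables NAME, LAST_PRODUCT_PURCHASED, AGENT, DATE_PURCHASE, TYPE, and DATE_OF_CALL. The table should look like this:

Result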

NAME LAST_PRODUCT_PURCHASED AGENT DATE_PURCHASE TYPE DATE_OF_CALL
Karen M_14 X_1 8-25-20021 18:21:28 COMPLAIN 8-26-20021 18:21:28
Jean M_64 X_4 8-27-20021 20:21:59 CX_SERVICE 8-29-20021 20:21:59
Keith M_57 X_4 8-27-20021 20:21:02 CX_SERVICE 8-29-20021 20:21:02
Alba M_43 X_3 8-29-20021 20:21:04 COMPLAIN 8-30-20021 20:21:03
Alex M_36 X_2 8-25-20021 20:21:05 CX_SERVICE 8-25-20021 21:21:05

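For example, the second row shows the expected result because the last product Jean purchased was M_64 and her last call was TYPE = CX_SERVICE on 8-29-20021 20:21:59.

I have been thinking about grouping by NAME and the dates, or about joining, but I can't find a way to handle the "last" product and "last" call date conditions.

PS: What if we also tried to add a column that counts the number of calls each customer (NAME) made before their most recent call?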

This should work, but there is certainly room for improvement:

library(tidyverse)

df1 <- tibble::tribble(
  ~NAME, ~PRODUCT, ~AGENT,        ~DATE_PURCHASE,
  "Karen",   "M_14",  "X_1", "8-25-20021 18:21:28",
  "Jean",   "M_78",  "X_3", "8-26-20021 18:11:06",
  "Jean",   "M_71",  "X_4", "8-26-20021 18:21:01",
  "Jean",   "M_64",  "X_4", "8-27-20021 20:21:59",
  "Keith",   "M_57",  "X_4", "8-27-20021 20:21:02",
  "Alba",   "M_50",  "X_1", "8-28-20021 20:21:03",
  "Alba",   "M_43",  "X_3", "8-29-20021 20:21:04",
  "Alex",   "M_36",  "X_2", "8-25-20021 20:21:05"
)

df2 <- tibble::tribble(
  ~NAME,        ~TYPE,         ~DATE_OF_CALL,        ~DATE_PURCHASE,
  "Karen",   "COMPLAIN", "8-26-20021 18:21:28", "8-25-20021 18:21:28",
  "Jean", "CX_SERVICE", "8-27-20021 18:11:06", "8-26-20021 18:11:06",
  "Jean",   "COMPLAIN", "8-28-20021 18:21:01", "8-26-20021 18:21:01",
  "Jean", "CX_SERVICE", "8-29-20021 20:21:59", "8-27-20021 20:21:59",
  "Keith", "CX_SERVICE", "8-29-20021 20:21:02", "8-27-20021 20:21:02",
  "Alba",   "COMPLAIN", "8-30-20021 20:21:03", "8-28-20021 20:21:03",
  "Alex", "CX_SERVICE", "8-25-20021 21:21:05", "8-29-20021 20:21:04"
)

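# Pair every purchase with every call of the same customer. This is a
# many-to-many join, which recent dplyr versions may warn about.
joined_df <- dplyr::full_join(df1, df2, by = "NAME")

solution <- joined_df %>% 
  group_by(NAME) %>% 
  # Drop the purchase date carried over from df2; only df1's is needed.
  select(-c(DATE_PURCHASE.y)) %>%
  # Keep the latest purchase, then the latest call, per customer.
  # Caveat: the dates are character strings, so they are compared as text.
  top_n(n = 1, wt = DATE_PURCHASE.x) %>% 
  top_n(n = 1, wt = DATE_OF_CALL) %>% 
  rename("LAST_PRODUCT_PURCHASED" = "PRODUCT",
         "DATE_PURCHASE" = "DATE_PURCHASE.x")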

solution
# A tibble: 5 x 6
# Groups:   NAME [5]
#  NAME  LAST_PRODUCT_PURCHASED AGENT DATE_PURCHASE     TYPE       DATE_OF_CALL       
#  <chr> <chr>                  <chr> <chr>               <chr>      <chr>              
#1 Karen M_14                   X_1   8-25-20021 18:21:28 COMPLAIN   8-26-20021 18:21:28
#2 Jean  M_64                   X_4   8-27-20021 20:21:59 CX_SERVICE 8-29-20021 20:21:59
#3 Keith M_57                   X_4   8-27-20021 20:21:02 CX_SERVICE 8-29-20021 20:21:02
#4 Alba  M_43                   X_3   8-29-20021 20:21:04 COMPLAIN   8-30-20021 20:21:03
#5 Alex  M_36                   X_2   8-25-20021 20:21:05 CX_SERVICE 8-25-20021 21:21:05
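
One possible refinement, offered as a sketch rather than a definitive answer: the code above ranks the dates as text and never checks that the purchase happened before the last call. The variant below parses the dates first, keeps only purchases made on or before each customer's last call, and adds the call count asked about in the PS. It rests on assumptions that are not in the original post: that the year "20021" is a typo for "2021" (it is repaired before parsing, since a five-digit year does not parse with `%Y`), that the times are in UTC, and the names `purchases`, `calls`, `parse_dt`, `last_calls`, `result` and `PREVIOUS_CALLS` are made up for illustration. The `DATE_PURCHASE` column of data frame B is left out because it is not needed here.

```r
library(dplyr)

# Same sample data as in the question (column DATE_PURCHASE of B omitted).
purchases <- tibble::tribble(
  ~NAME,   ~PRODUCT, ~AGENT, ~DATE_PURCHASE,
  "Karen", "M_14",   "X_1",  "8-25-20021 18:21:28",
  "Jean",  "M_78",   "X_3",  "8-26-20021 18:11:06",
  "Jean",  "M_71",   "X_4",  "8-26-20021 18:21:01",
  "Jean",  "M_64",   "X_4",  "8-27-20021 20:21:59",
  "Keith", "M_57",   "X_4",  "8-27-20021 20:21:02",
  "Alba",  "M_50",   "X_1",  "8-28-20021 20:21:03",
  "Alba",  "M_43",   "X_3",  "8-29-20021 20:21:04",
  "Alex",  "M_36",   "X_2",  "8-25-20021 20:21:05"
)

calls <- tibble::tribble(
  ~NAME,   ~TYPE,        ~DATE_OF_CALL,
  "Karen", "COMPLAIN",   "8-26-20021 18:21:28",
  "Jean",  "CX_SERVICE", "8-27-20021 18:11:06",
  "Jean",  "COMPLAIN",   "8-28-20021 18:21:01",
  "Jean",  "CX_SERVICE", "8-29-20021 20:21:59",
  "Keith", "CX_SERVICE", "8-29-20021 20:21:02",
  "Alba",  "COMPLAIN",   "8-30-20021 20:21:03",
  "Alex",  "CX_SERVICE", "8-25-20021 21:21:05"
)

# ASSUMPTION: "20021" is a typo for "2021"; repair it, then parse as UTC.
parse_dt <- function(x) {
  as.POSIXct(sub("20021", "2021", x, fixed = TRUE),
             format = "%m-%d-%Y %H:%M:%S", tz = "UTC")
}

# One row per customer: the most recent call, plus the number of calls
# that happened strictly before it (the column asked about in the PS).
last_calls <- calls %>%
  mutate(DATE_OF_CALL = parse_dt(DATE_OF_CALL)) %>%
  group_by(NAME) %>%
  mutate(PREVIOUS_CALLS = sum(DATE_OF_CALL < max(DATE_OF_CALL))) %>%
  slice_max(DATE_OF_CALL, n = 1, with_ties = FALSE) %>%
  ungroup()

# Attach the last call to every purchase (many-to-one), keep purchases made
# on or before that call, then keep the latest such purchase per customer.
result <- purchases %>%
  mutate(DATE_PURCHASE = parse_dt(DATE_PURCHASE)) %>%
  inner_join(last_calls, by = "NAME") %>%
  filter(DATE_PURCHASE <= DATE_OF_CALL) %>%
  group_by(NAME) %>%
  slice_max(DATE_PURCHASE, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(NAME, LAST_PRODUCT_PURCHASED = PRODUCT, AGENT,
         DATE_PURCHASE, TYPE, DATE_OF_CALL, PREVIOUS_CALLS)

result
```

On the sample data this gives one row per customer with the same products as the expected table, and PREVIOUS_CALLS is 2 for Jean and 0 for everyone else. Reducing the calls to one row per customer before joining keeps the join many-to-one, which avoids pairing every purchase with every call.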