Merging by group between dates with conditional repetition

Question

I am looking for help with a specific merge problem using data.table.

The example data is as follows:

library(data.table)


# Create example dataset
DT_A = data.table(
  Store = "A",
  Date = as.Date(sprintf("10-%02d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
    '%m-%d-%y')
)

DT_B = data.table(
  Store = "B",
  Date = as.Date(sprintf("10-%02d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
                 '%m-%d-%y')
)

DT <- rbindlist(list(DT_A, DT_B))


DT
    Store       Date
 1:     A 2001-10-22
 2:     A 2001-10-23
 3:     A 2001-10-24
 4:     A 2001-10-25
 5:     A 2002-10-26
 6:     A 2002-10-27
 7:     A 2002-10-28
 8:     B 2001-10-22
 9:     B 2001-10-23
10:     B 2001-10-24
11:     B 2001-10-25
12:     B 2002-10-26
13:     B 2002-10-27
14:     B 2002-10-28

So DT holds observations for stores A and B on multiple dates.

I have another dataset, say manager_DT, that holds each manager's start and end dates:


manager_DT <- data.table(Manager = c("John", "David", "Steve"),
                         Store = c("A", "A","B"),
                         min_date = c(as.Date("2001-10-22"),
                                      as.Date("2001-10-26"),
                                      as.Date("2001-10-22")),
                         max_date = c(as.Date("2001-10-27"),
                                      as.Date("2001-10-28"),
                                      as.Date("2002-10-28")))

 manager_DT

   Manager Store   min_date   max_date
1:    John     A 2001-10-22 2001-10-27
2:   David     A 2001-10-26 2001-10-28
3:   Steve     B 2001-10-22 2002-10-28

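A store can have more than one manager at a given time. Here, John's and David's tenures at store A overlap (specifically on 2001-10-26 and 2001-10-27), while Steve is the only manager of store B.

Using a data.table approach, I would like to merge manager_DT onto DT so that the desired output is: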

DT
    Store       Date Manager
 1:     A 2001-10-22    John
 2:     A 2001-10-23    John
 3:     A 2001-10-24    John
 4:     A 2001-10-25    John
 5:     A 2002-10-26    John
 6:     A 2002-10-26   David
 7:     A 2002-10-27    John
 8:     A 2002-10-27   David
 9:     A 2002-10-28   David
10:     B 2001-10-22   Steve
11:     B 2001-10-23   Steve
12:     B 2001-10-24   Steve
13:     B 2001-10-25   Steve
14:     B 2002-10-26   Steve
15:     B 2002-10-27   Steve
16:     B 2002-10-28   Steve

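Note that there is only one Manager column, and a row is repeated whenever tenures overlap on its date (here two dates are repeated, 2001-10-26 and 2001-10-27, when both John and David manage store A).

The idea is that I want unique observations at the date x store x manager level.

Thanks!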

Answer 1

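A possible solution, starting from the long table and combining the managers who share a store and date into a single string: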

library(dplyr)

DT <- structure(list(Store = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B"), Date = c("2001-10-22", 
"2001-10-23", "2001-10-24", "2001-10-25", "2002-10-26", "2002-10-26", 
"2002-10-27", "2002-10-27", "2002-10-28", "2001-10-22", "2001-10-23", 
"2001-10-24", "2001-10-25", "2002-10-26", "2002-10-27", "2002-10-28"
), Manager = c("John", "John", "John", "John", "John", "David", 
"John", "David", "David", "Steve", "Steve", "Steve", "Steve", 
"Steve", "Steve", "Steve")), row.names = c(NA, -16L), class = "data.frame")

DT %>%
  group_by(Store,Date) %>% 
  mutate(Manager = paste(Manager, collapse = " and "))
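
Since the question asks for a data.table approach, the same collapse could be sketched in data.table roughly as follows. The table `long` and its four rows are made up for illustration and are not the question's data. Note that this version returns one row per Store and Date, whereas the `mutate()` call above keeps every original row.

```r
library(data.table)

# Hypothetical long-format sample: one row per Store x Date x Manager
long <- data.table(
  Store   = c("A", "A", "A", "B"),
  Date    = as.Date(c("2001-10-25", "2001-10-26", "2001-10-26", "2001-10-26")),
  Manager = c("John", "John", "David", "Steve")
)

# Paste together the managers that share a Store and Date
collapsed <- long[, .(Manager = paste(Manager, collapse = " and ")),
                  by = .(Store, Date)]
collapsed
```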

Answer 2

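If you are interested in building the merged data frame, one solution is to expand each manager's date range into one row per day: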

library(tidyverse)
library(lubridate)
library(data.table)

manager_DT <- structure(list(Manager = c("John", "David", "Steve"), Store = c("A", 
"A", "B"), min_date = c("2001-10-22", "2001-10-26", "2001-10-22"
), max_date = c("2001-10-27", "2001-10-28", "2002-10-28")), row.names = c(NA, 
-3L), class = "data.frame")

n <- nrow(manager_DT)

map_dfr(1:n, ~ data.table(
  Store=manager_DT$Store[.x],
  Date=seq(ymd(manager_DT$min_date[.x]),ymd(manager_DT$max_date[.x]),by="days"),
  Manager = manager_DT$Manager[.x]
  ))
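
For the data.table approach the question asks for, a non-equi join may be worth trying; it matches each store-date row to every manager whose range covers that date, so overlapping tenures repeat the row. The sketch below is not taken from either answer. It assumes all observation dates fall in 2001 (as the question's prose describes, although its sample code places the 26th to the 28th in 2002), and it shortens Steve's end date to 2001-10-28 to match; the name `res` is chosen here for illustration.

```r
library(data.table)

# Assumed sample data: seven consecutive days in 2001 for each store
dates <- seq(as.Date("2001-10-22"), as.Date("2001-10-28"), by = "day")
DT <- data.table(Store = rep(c("A", "B"), each = 7), Date = rep(dates, times = 2))

manager_DT <- data.table(
  Manager  = c("John", "David", "Steve"),
  Store    = c("A", "A", "B"),
  min_date = as.Date(c("2001-10-22", "2001-10-26", "2001-10-22")),
  max_date = as.Date(c("2001-10-27", "2001-10-28", "2001-10-28"))  # Steve's end date assumed
)

# Non-equi join: for each row of DT, keep every manager of the same store
# whose [min_date, max_date] range contains the observation date.
# i.Date refers to the Date column of DT (the table passed as i).
res <- manager_DT[DT,
                  on = .(Store, min_date <= Date, max_date >= Date),
                  .(Store, Date = i.Date, Manager)]
res
```

Dates with two managers (here 2001-10-26 and 2001-10-27 for store A) appear twice, once per manager, and a date with no matching manager would show up with `NA` in `Manager` rather than being dropped.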
