Add a new row by group based on a condition using data.table?

This is my current data table:

ID  Stage Month
200 A     2020-11
200 B     2020-11
200 C     2020-11
201 A     2020-11
201 B     2020-11
...

I am trying to add a row to each ID/Month group, but only when stages A, B and C are all present in that group.

This would be my desired output:

ID  Stage Month
200 A     2020-11
200 B     2020-11
200 C     2020-11
200 All   2020-11
201 A     2020-11
201 B     2020-11
...

I'm new to data.table and R, so any guidance is greatly appreciated!

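We can group by 'ID' and 'Month', check whether all of 'A', 'B', 'C' are found in 'Stage' with %in%, and if so concatenate 'Stage' with 'All' (c(Stage, 'All')); otherwise return 'Stage' unchanged. The last step restores the original column order, because the grouping columns come first in the grouped result.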

library(data.table)
setDT(df1)[, .(Stage = if(all(c('A', 'B', 'C') %in% Stage)) c(Stage, 'All') 
              else Stage),  by = .(ID, Month)][, names(df1), with = FALSE]

-output

#    ID Stage   Month
#1: 200     A 2020-11
#2: 200     B 2020-11
#3: 200     C 2020-11
#4: 200   All 2020-11
#5: 201     A 2020-11
#6: 201     B 2020-11
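
If it helps to see the condition separated from the row building, here is a minimal sketch of a variant that creates the 'All' rows on their own and then appends them. The names `dt`, `required`, `extra` and `out` are just illustrative (not from the question), and it assumes the same sample data as in the data section of this answer.

```r
library(data.table)

# Sample data, same values as the data section of this answer
df1 <- data.frame(ID = c(200L, 200L, 200L, 201L, 201L),
                  Stage = c("A", "B", "C", "A", "B"),
                  Month = rep("2020-11", 5))

dt <- as.data.table(df1)       # a copy, so df1 itself is left untouched
required <- c("A", "B", "C")   # illustrative name: stages that must all be present

# One 'All' row per ID/Month group that contains every required stage;
# groups that fail the check return NULL and are dropped
extra <- dt[, if (all(required %in% Stage)) .(Stage = "All"), by = .(ID, Month)]

# Append the new rows (matched by column name), then sort by group so that
# each 'All' row ends up after the existing rows of its group
out <- rbind(dt, extra, use.names = TRUE)
setorder(out, ID, Month)
out
```

This keeps the column order of the input without the extra `names(df1)` step, at the cost of a sort at the end.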

Or with similar logic in the tidyverse:

library(dplyr)
df1 %>% 
    group_by(ID, Month) %>% 
    summarise(Stage = if(all(c('A', 'B', 'C') %in% Stage)) c(Stage, 'All') 
           else Stage, .groups = 'drop') %>% 
    select(names(df1))
# A tibble: 6 x 3
#     ID Stage Month  
#  <int> <chr> <chr>  
#1   200 A     2020-11
#2   200 B     2020-11
#3   200 C     2020-11
#4   200 All   2020-11
#5   201 A     2020-11
#6   201 B     2020-11
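
As a side note, dplyr 1.1.0 and later deprecate returning more than one row per group from summarise() and offer reframe() for that case. A minimal sketch of the same logic with reframe(), assuming dplyr >= 1.1.0 is installed and using the same sample data (the name `out` is just illustrative):

```r
library(dplyr)

# Sample data, same values as the data section of this answer
df1 <- data.frame(ID = c(200L, 200L, 200L, 201L, 201L),
                  Stage = c("A", "B", "C", "A", "B"),
                  Month = rep("2020-11", 5))

# reframe() may return any number of rows per group and always
# returns an ungrouped result, so no .groups argument is needed
out <- df1 %>%
    group_by(ID, Month) %>%
    reframe(Stage = if (all(c("A", "B", "C") %in% Stage)) c(Stage, "All")
                    else Stage) %>%
    select(all_of(names(df1)))   # restore the original column order
out
```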

data

df1 <- structure(list(ID = c(200L, 200L, 200L, 201L, 201L), Stage = c("A", 
"B", "C", "A", "B"), Month = c("2020-11", "2020-11", "2020-11", 
"2020-11", "2020-11")), class = "data.frame", row.names = c(NA, 
-5L))