Add a new row by group based on a condition using data.table?
This is my current data table:
ID Stage Month
200 A 2020-11
200 B 2020-11
200 C 2020-11
201 A 2020-11
201 B 2020-11
...
I am trying to add a row to each ID/Month group only when stages A, B and C are all present.
This would be my desired output:
ID Stage Month
200 A 2020-11
200 B 2020-11
200 C 2020-11
200 All 2020-11
201 A 2020-11
201 B 2020-11
...
I am new to data.table and R, so any guidance is greatly appreciated!
We can group by 'ID' and 'Month', check with if whether all of 'A', 'B', 'C' are found %in% 'Stage', and in that case concatenate 'Stage' with 'All' (c(Stage, 'All')), or else return 'Stage' unchanged.
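library(data.table)
# Per (ID, Month) group: append 'All' only when stages A, B and C are all present;
# the trailing [, names(df1), with = FALSE] restores the original column order
setDT(df1)[, .(Stage = if (all(c('A', 'B', 'C') %in% Stage)) c(Stage, 'All')
               else Stage), by = .(ID, Month)][, names(df1), with = FALSE]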
-output
# ID Stage Month
#1: 200 A 2020-11
#2: 200 B 2020-11
#3: 200 C 2020-11
#4: 200 All 2020-11
#5: 201 A 2020-11
#6: 201 B 2020-11
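The group-level condition can also be tried in isolation on plain vectors. The following is a minimal base R sketch, not part of the original answer; the names stage_200, stage_201, required and add_all are hypothetical and only stand in for the 'Stage' values of the two groups.

```r
# Hypothetical stand-ins for the 'Stage' values of groups 200 and 201
stage_200 <- c("A", "B", "C")
stage_201 <- c("A", "B")
required  <- c("A", "B", "C")

# %in% gives one logical per required stage; all() collapses them to one value
required %in% stage_201        # TRUE TRUE FALSE
all(required %in% stage_200)   # TRUE
all(required %in% stage_201)   # FALSE

# Same if/else as in the grouped expression, wrapped in a helper for illustration
add_all <- function(stage) {
  if (all(required %in% stage)) c(stage, "All") else stage
}

add_all(stage_200)  # "A" "B" "C" "All"
add_all(stage_201)  # "A" "B"
```

Because the if/else returns a vector that is one element longer only for complete groups, the grouped call ends up with an extra row for those groups and leaves the others as they were.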
Or with similar logic in the tidyverse
library(dplyr)
df1 %>%
  group_by(ID, Month) %>%
  summarise(Stage = if (all(c('A', 'B', 'C') %in% Stage)) c(Stage, 'All')
                    else Stage, .groups = 'drop') %>%
  select(names(df1))
# A tibble: 6 x 3
# ID Stage Month
# <int> <chr> <chr>
#1 200 A 2020-11
#2 200 B 2020-11
#3 200 C 2020-11
#4 200 All 2020-11
#5 201 A 2020-11
#6 201 B 2020-11
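In dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated in favour of reframe(). The sketch below assumes such a dplyr version is installed and applies the same condition with reframe(); the object name out is hypothetical, and the data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same example data as in the answer, rebuilt here so the block is self-contained
df1 <- data.frame(ID = c(200L, 200L, 200L, 201L, 201L),
                  Stage = c("A", "B", "C", "A", "B"),
                  Month = rep("2020-11", 5))

out <- df1 %>%
  group_by(ID, Month) %>%
  # reframe() allows any number of rows per group and returns an ungrouped result
  reframe(Stage = if (all(c("A", "B", "C") %in% Stage)) c(Stage, "All")
                  else Stage) %>%
  # put the columns back in the original order
  select(all_of(names(df1)))

out
```

reframe() has no .groups argument because its result is always ungrouped, so the 'drop' setting used with summarise() is not needed here.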
data
df1 <- structure(list(ID = c(200L, 200L, 200L, 201L, 201L), Stage = c("A",
"B", "C", "A", "B"), Month = c("2020-11", "2020-11", "2020-11",
"2020-11", "2020-11")), class = "data.frame", row.names = c(NA,
-5L))