Create a per-group column that increments when another column changes
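I have the following sample dataset, in which I want to create a NUMBER_OF_RENEWALS column.
Basically, whenever the SUB_ID value changes within an ID, I want to add 1 to NUMBER_OF_RENEWALS.
Below is a sample dataset with the desired result (the NUMBER_OF_RENEWALS column).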
ID <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D", "D")
SUB_ID <- c("250", "252", "252", "252", "200", "201", "202", "220", "220", "220", "250", "250", "251", "252", "252")
NUMBER_OF_RENEWALS <- c(0, 1, 1, 1, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 2)
sample_df <- data.frame(ID, SUB_ID, NUMBER_OF_RENEWALS)
   ID SUB_ID NUMBER_OF_RENEWALS
1   A    250                  0
2   A    252                  1
3   A    252                  1
4   A    252                  1
5   B    200                  0
6   B    201                  1
7   B    202                  2
8   C    220                  0
9   C    220                  0
10  C    220                  0
11  D    250                  0
12  D    250                  0
13  D    251                  1
14  D    252                  2
15  D    252                  2
After grouping by ID, you can use cumsum to increment the renewal count each time SUB_ID changes:
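library(tidyverse)

sample_df %>%
  group_by(ID) %>%
  # TRUE wherever SUB_ID differs from the previous row of the same ID;
  # the first row is compared with itself, so each group starts at 0
  mutate(NUMBER_OF_RENEWALS = cumsum(SUB_ID != lag(SUB_ID, default = first(SUB_ID))))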
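To see why the cumsum expression works, it may help to run its pieces on a single group. The sketch below uses the SUB_ID values of ID "D" from the sample data; the variable names sub_id, prev, changed and renewals are only illustrative and do not appear in the answer.

```r
library(dplyr)

# SUB_ID values for ID "D" in the sample data
sub_id <- c("250", "250", "251", "252", "252")

# Value from the previous row; the first row is padded with its own value
prev <- lag(sub_id, default = first(sub_id))

# TRUE marks a row whose SUB_ID differs from the row before it
changed <- sub_id != prev

# Running total of changes gives the renewal count
renewals <- cumsum(changed)
renewals
# [1] 0 0 1 2 2
```

Inside group_by(ID), mutate evaluates the same expression once per ID, which is why the count restarts at 0 for every group.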
Alternatively, with rleid from data.table you can do:
sample_df %>%
  group_by(ID) %>%
  mutate(NUMBER_OF_RENEWALS = data.table::rleid(SUB_ID) - 1)
Output:
   ID    SUB_ID NUMBER_OF_RENEWALS
   <chr> <chr>               <int>
 1 A     250                     0
 2 A     252                     1
 3 A     252                     1
 4 A     252                     1
 5 B     200                     0
 6 B     201                     1
 7 B     202                     2
 8 C     220                     0
 9 C     220                     0
10 C     220                     0
11 D     250                     0
12 D     250                     0
13 D     251                     1
14 D     252                     2
15 D     252                     2
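rleid gives each run of consecutive equal values its own id, starting at 1, which is why the answer subtracts 1. If data.table is not available, the same run ids can be sketched in base R with rle. This is an illustrative equivalent for a single group, not part of the original answer; sub_id, run_id and renewals are made-up names.

```r
# SUB_ID values for ID "D" in the sample data
sub_id <- c("250", "250", "251", "252", "252")

# rle() returns the length of each run; repeating the run index by its
# length yields one run id per row, like rleid() would
run_id <- with(rle(sub_id), rep(seq_along(lengths), lengths))
run_id
# [1] 1 1 2 3 3

# Shift so that the first run counts as zero renewals
renewals <- run_id - 1L
renewals
# [1] 0 0 1 2 2
```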
Here is a base R attempt:
transform(sample_df, NUMBER_OF_RENEWALS =
  as.integer(ave(SUB_ID, ID, FUN = function(x) match(x, unique(x)))) - 1)
#   ID SUB_ID NUMBER_OF_RENEWALS
#1   A    250                  0
#2   A    252                  1
#3   A    252                  1
#4   A    252                  1
#5   B    200                  0
#6   B    201                  1
#7   B    202                  2
#8   C    220                  0
#9   C    220                  0
#10  C    220                  0
#11  D    250                  0
#12  D    250                  0
#13  D    251                  1
#14  D    252                  2
#15  D    252                  2
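One difference worth knowing: match(x, unique(x)) numbers the distinct values in order of first appearance, so it agrees with the change count only when a SUB_ID never comes back after changing, as is the case in the sample data. The sketch below uses a hypothetical ID whose SUB_ID returns to an earlier value to show where the two diverge; df, count_changes, BY_MATCH and BY_CHANGE are illustrative names, not from the thread.

```r
# Hypothetical data: SUB_ID goes 250 -> 252 -> back to 250
df <- data.frame(ID = c("A", "A", "A"),
                 SUB_ID = c("250", "252", "250"),
                 stringsAsFactors = FALSE)

# Counts every change from the previous row within a group
count_changes <- function(x) cumsum(c(FALSE, x[-1] != x[-length(x)]))

# Distinct-value numbering, as in the base R answer
df$BY_MATCH <- as.integer(ave(df$SUB_ID, df$ID,
                              FUN = function(x) match(x, unique(x)))) - 1

# Change counting, matching the cumsum and rleid answers
df$BY_CHANGE <- as.integer(ave(df$SUB_ID, df$ID, FUN = count_changes))

df
#   ID SUB_ID BY_MATCH BY_CHANGE
# 1  A    250        0         0
# 2  A    252        1         1
# 3  A    250        0         2
```

Which column is right depends on whether going back to an earlier SUB_ID should count as a new renewal; the question's wording ("whenever SUB_ID changes") suggests BY_CHANGE.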