Create new columns within a loop or apply
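The dataset I'm working with is billing data broken down by customer and month. In the end, I want to build a data frame with customer IDs as rows and months as column names, just like the original dataset. However, I want this new dataset to contain dummy variables indicating whether the customer was "gained" that month, meaning they had never been billed before and that month was the first time they were billed.
Here is a reproducible example, along with the loop I have written so far: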
set.seed(24)
example.data <- data.frame(
ID = sample(11:20),
Jan = sample(0:5, 10, replace = TRUE),
Feb = sample(0:5, 10, replace = TRUE),
Mar = sample(0:5, 10, replace = TRUE),
Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(example.data$ID)
## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0
gained.df.ex$Jan <- rep(0, length(example.data$ID))
## here's the loop that isn't working
for(i in 3:5){
new.month.dummy <- for (x in 1:length(gained.df.ex$example.data.ID)){
ifelse(example.data[x,i] == 0, new.month.dummy[x] <- 0, ifelse(sum(example.data[x,2:(i-1)]} == 0, new.month.dummy[x] <-1, new.month.dummy <- 0))
}
I'm sure there is a way to do this with apply, but I'm not sure how.
The expected output looks like this:
> example.data
Jan Feb Mar Apr
15 0 3 4 3
19 1 3 0 5
20 4 2 5 1
12 2 1 3 0
14 0 0 2 1
17 5 5 4 4
11 3 4 1 5
18 1 0 0 2
13 3 2 5 3
16 2 5 1 2
> gained.df.ex
Jan Feb Mar Apr
15 0 1 0 0
19 0 0 0 0
20 0 0 0 0
12 0 0 0 0
14 0 0 1 0
17 0 0 0 0
11 0 0 0 0
18 0 0 0 0
13 0 0 0 0
16 0 0 0 0
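We can try the following. Within each row, cumsum(x) == 0 is true only for the leading months with no billing, so the month right after the last of them is the first billed month. The ID column is dropped before calling apply; otherwise the cumulative sum would never be 0.
months <- names(example.data)[-1]  # month columns only, without ID
gained.df.ex[months] <- t(apply(example.data[months], 1, function(x) {
  i1 <- tail(which(cumsum(x) == 0), 1)  # last leading month with no billing
  x1 <- rep(0, length(x))
  # flag the following month; skip rows that were never billed at all
  if (length(i1) > 0 && i1 < length(x)) replace(x1, i1 + 1, 1) else x1
}))
gained.df.ex[months]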
# Jan Feb Mar Apr
#1 0 1 0 0
#2 0 0 0 0
#3 0 0 0 0
#4 0 0 0 0
#5 0 0 1 0
#6 0 0 0 0
#7 0 0 0 0
#8 0 0 0 0
#9 0 0 0 0
#10 0 0 0 0
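Since the question also asks about a loop, here is a minimal sketch of a column-by-column version. It is an alternative to the apply call above, not the code from that answer. The data frame billing and its values are made up for illustration, and the names months and gained are hypothetical.

```r
# Toy billing data (hypothetical values chosen to cover each case)
billing <- data.frame(
  ID  = c(15, 19, 14, 18),
  Jan = c(0, 1, 0, 0),
  Feb = c(3, 3, 0, 0),
  Mar = c(4, 0, 2, 0),
  Apr = c(3, 5, 1, 0)
)

months <- names(billing)[-1]  # month columns only, without ID
gained <- billing
gained[months] <- 0           # start with no gains; the first month stays 0

# Loop over months (not customers), starting from the second month
for (i in seq_along(months)[-1]) {
  billed_now    <- billing[[months[i]]] > 0
  billed_before <- rowSums(billing[months[seq_len(i - 1)]]) > 0
  # gained = billed this month and never billed in any earlier month
  gained[[months[i]]] <- as.integer(billed_now & !billed_before)
}

gained
# Feb is flagged for ID 15 and Mar for ID 14; IDs 19 and 18 stay all 0
```

Looping over months rather than customers keeps each step vectorized across rows, so no inner loop or nested ifelse calls are needed.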