Create new columns within loop or apply

Question

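The dataset I'm working with is billing data broken down by customer and month. Ultimately, I want to build a data frame with customer IDs as the rows and months as the column names, just like the original dataset. However, I want this new dataset to contain dummy variables indicating whether the customer was "gained" that month, meaning they had never been billed before and that month is the first time they were billed.

Here's a reproducible example, along with the loop I've written so far: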

set.seed(24)
example.data <- data.frame(
   ID = sample(11:20),
   Jan = sample(0:5, 10, replace = TRUE),
   Feb = sample(0:5, 10, replace = TRUE),
   Mar = sample(0:5, 10, replace = TRUE),
   Apr = sample(0:5, 10, replace = TRUE)
)
gained.df.ex <- data.frame(example.data$ID)

## customers can't be gained in the first month
## there's no previous data to verify that this is the first time they've been billed, so all values are 0

gained.df.ex$Jan <- rep(0, length(example.data$ID))

## here's the loop that isn't working

for(i in 3:5){
   new.month.dummy <- for (x in 1:length(gained.df.ex$example.data.ID)){
      ifelse(example.data[x,i] == 0, new.month.dummy[x] <- 0, ifelse(sum(example.data[x,2:(i-1)]) == 0, new.month.dummy[x] <- 1, new.month.dummy <- 0))
   }
}

I'm sure there's a way to do this with apply, but I'm not sure how.

The expected output looks like this:

> example.data
   Jan Feb Mar Apr
15   0   3   4   3
19   1   3   0   5
20   4   2   5   1
12   2   1   3   0
14   0   0   2   1
17   5   5   4   4
11   3   4   1   5
18   1   0   0   2
13   3   2   5   3
16   2   5   1   2

> gained.df.ex
   Jan Feb Mar Apr
15   0   1   0   0
19   0   0   0   0
20   0   0   0   0
12   0   0   0   0
14   0   0   1   0
17   0   0   0   0
11   0   0   0   0
18   0   0   0   0
13   0   0   0   0
16   0   0   0   0

Answer 1

We can try

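# Work on the month columns only; ID is dropped so cumsum() runs over billing amounts
month.cols <- names(example.data)[-1]
gained.df.ex[month.cols] <- t(apply(example.data[month.cols], 1, function(x) {
            # last month in which the running total is still zero
            i1 <- tail(which(cumsum(x) == 0), 1)
            x1 <- rep(0, length(x))
            # flag the following month, unless the customer was never billed at all
            if (length(i1) > 0 && i1 < length(x)) replace(x1, i1 + 1, 1) else x1}))
gained.df.ex[month.cols]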
#   Jan Feb Mar Apr
#1    0   1   0   0
#2    0   0   0   0
#3    0   0   0   0
#4    0   0   0   0
#5    0   0   1   0
#6    0   0   0   0
#7    0   0   0   0
#8    0   0   0   0
#9    0   0   0   0
#10   0   0   0   0
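
For comparison, the loop idea from the question can also be made to work by looping over months only and letting R handle the customers in a vectorized way. The following is just a minimal sketch, assuming billing amounts are never negative (so a row sum of zero means "never billed"). The names `billing`, `amounts`, `gained` and `gained.df` are made up for illustration, and the data is copied by hand from the table printed in the question so that it does not depend on `set.seed`.

```r
# Billing data copied from the printed example.data table in the question
billing <- data.frame(
  ID  = c(15, 19, 20, 12, 14, 17, 11, 18, 13, 16),
  Jan = c(0, 1, 4, 2, 0, 5, 3, 1, 3, 2),
  Feb = c(3, 3, 2, 1, 0, 5, 4, 0, 2, 5),
  Mar = c(4, 0, 5, 3, 2, 4, 1, 0, 5, 1),
  Apr = c(3, 5, 1, 0, 1, 4, 5, 2, 3, 2)
)

month.cols <- names(billing)[-1]            # month columns, ID excluded
amounts <- as.matrix(billing[month.cols])

# Start with all zeros: nobody can be "gained" in the first month
gained <- matrix(0, nrow = nrow(amounts), ncol = ncol(amounts),
                 dimnames = list(NULL, month.cols))

# For each later month, flag customers billed now but never before
for (j in seq_along(month.cols)[-1]) {
  billed.now    <- amounts[, j] > 0
  billed.before <- rowSums(amounts[, 1:(j - 1), drop = FALSE]) > 0
  gained[, j]   <- as.integer(billed.now & !billed.before)
}

gained.df <- data.frame(ID = billing$ID, gained)
gained.df
```

On this data the only flags set are Feb for customer 15 and Mar for customer 14, which matches the expected output in the question. The `drop = FALSE` keeps the single-column case (`j = 2`) a matrix so that `rowSums` still works.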


loops

for-loop

r

apply