R - Apply function - Replacing Observations with 0 Depending on an Observation's Variable to Determine which Column to Start

Question

I have the following data frame:

Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 40)
Forecast.2017.06.30 <- c(150, 300, 40)
Forecast.2017.09.30 <- c(100, 100, 60)
Forecast.2017.12.31 <- c(150, 500, 100)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)

df <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)

Sequence <- c("2017.03.31","2017.06.30", "2017.09.30", "2017.12.31")

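If the "Disable" variable is TRUE, I want to replace all of that observation's forecasts with 0, unless the "Start" variable gives a date, in which case only the forecasts from that date onward should be replaced. That would give me the following table: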

Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 0)
Forecast.2017.06.30 <- c(150, 0, 0)
Forecast.2017.09.30 <- c(100, 0, 0)
Forecast.2017.12.31 <- c(150, 0, 0)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA) 

df2 <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)

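I am using the following code to replace all forecasts in rows where "Disable" is TRUE. However, it does not take into account the date from which to start replacing forecasts with 0.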

df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",min(Sequence)),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2, 
    function(x) { replace(x,df$Disable == TRUE,0)})

To take the start date into account, I tried replacing the min(Sequence) part with ifelse(!is.na(df$Start),df$Start,min(Sequence)), so that it looks like this:

df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))] <- apply(df[,grep(paste0("Forecast.",ifelse(!is.na(df$Start),df$Start,min(Sequence))),colnames(df)):grep(paste0("Forecast.",max(Sequence)),colnames(df))], 2, 
    function(x) { replace(x,df$Disable == TRUE,0)})

But I get the following error:

"argument 'pattern' has length > 1 and only the first element will be used"

I am not sure how to change the code so that it refers to the Start "date" when one exists.

Any help is appreciated.

Answer 1

Here is one approach. We create a function that replaces values with 0, i.e.

Fun1 <- function(df, var, n) {
  ind1 <- grep('Forecast.', names(df))
  replace(df[n,], var[n]:max(ind1), 0)
  }


#create a new column which indicates when to start replacing with 0 based on Start variable
df$new <- sapply(df$Start, function(i) match(i, sub('^Forecast.', '', names(df))))

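#Positions of the forecast columns (ind1 inside Fun1 is local to the function, so define it here as well)
ind1 <- grep('Forecast.', names(df))

#Handle the NA in column "new": disabled rows without a Start date begin at the first forecast column
df$new[is.na(df$new) & df$Disable == TRUE] <- min(ind1)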

#Identify rows to change the values
ind2 <- which(!is.na(df$new))

#Apply the function
df[ind2,] <- as.data.frame(t(sapply(ind2, function(i) unlist(Fun1(df, df$new, i)))), stringsAsFactors = FALSE)

#unlist() coerced every value in the changed rows to character, so use ind1 to convert the forecast columns back to integers
df[ind1] <- lapply(df[ind1], as.integer)


#  Farm YearlyVolume Forecast.2017.03.31 Forecast.2017.06.30 Forecast.2017.09.30 Forecast.2017.12.31 Disable      Start  new
#1  ABC          500                 100                 150                 100                 150    <NA>       <NA> <NA>
#2  DEF         1000                 200                   0                   0                   0    TRUE 2017.06.30    4
#3  XYZ          200                   0                   0                   0                   0    TRUE       <NA>    3
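
As for the message in the question: grep() takes a single pattern, while ifelse() is vectorised and returns one value per row, so only the first of the three patterns is used. A minimal sketch of the difference, assuming the column names and Start values from the question (the names cols, patterns and start_pos are only for illustration):

```r
Sequence <- c("2017.03.31", "2017.06.30", "2017.09.30", "2017.12.31")
Start <- c(NA, "2017.06.30", NA)
cols <- paste0("Forecast.", Sequence)

# ifelse() returns one value per row, so this is three patterns, not one
patterns <- paste0("Forecast.", ifelse(!is.na(Start), Start, min(Sequence)))
length(patterns)

# match() is vectorised over its first argument: one column position per row
start_pos <- match(patterns, cols)
start_pos
```

This is why the answer looks up the starting column with match() row by row instead of passing everything to a single grep() call.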

Note

I built your data frame with stringsAsFactors = FALSE, i.e.

df <- data.frame(Farm, YearlyVolume, 
                  Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, 
                  Forecast.2017.12.31, Disable, Start, stringsAsFactors = FALSE)
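
If the character conversion in the approach above is a concern, a logical mask may be a tidier alternative. The following is only a sketch, assuming the same data and that a missing Start means "from the first forecast column"; the names fc, dates, first, disabled, mask and vals are made up for this example.

```r
df <- data.frame(
  Farm = c("ABC", "DEF", "XYZ"),
  YearlyVolume = c(500, 1000, 200),
  Forecast.2017.03.31 = c(100, 200, 40),
  Forecast.2017.06.30 = c(150, 300, 40),
  Forecast.2017.09.30 = c(100, 100, 60),
  Forecast.2017.12.31 = c(150, 500, 100),
  Disable = c(NA, TRUE, TRUE),
  Start = c(NA, "2017.06.30", NA),
  stringsAsFactors = FALSE
)

# Positions of the forecast columns and the dates embedded in their names
fc <- grep("^Forecast\\.", names(df))
dates <- sub("^Forecast\\.", "", names(df)[fc])

# First forecast column to zero in each row (assumption: no Start means the first one)
first <- match(df$Start, dates)
first[is.na(first)] <- 1L

# Rows to change; NA in Disable is treated as "not disabled"
disabled <- !is.na(df$Disable) & df$Disable

# mask[i, j] is TRUE when row i is disabled and forecast column j is at or after its start
mask <- outer(first, seq_along(fc), "<=") & disabled

# Zero the masked cells and write the forecast columns back
vals <- as.matrix(df[fc])
vals[mask] <- 0
df[fc] <- as.data.frame(vals)
df
```

Because only the numeric forecast columns pass through the matrix, the other columns keep their original types and no helper column is added to df.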
