R - apply function - Replacing observations with 0, using each row's Start variable to determine which column to start from
I have the following data frame:
Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 40)
Forecast.2017.06.30 <- c(150, 300, 40)
Forecast.2017.09.30 <- c(100, 100, 60)
Forecast.2017.12.31 <- c(150, 500, 100)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)
df <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)
Sequence <- c("2017.03.31","2017.06.30", "2017.09.30", "2017.12.31")
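If the "Disable" variable is TRUE, I want to replace all of that observation's forecasts with 0, unless the "Start" variable gives the date from which to start replacing them. The result should be the following table: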
Farm <- c("ABC","DEF","XYZ")
YearlyVolume <- c(500, 1000, 200)
Forecast.2017.03.31 <- c(100, 200, 0)
Forecast.2017.06.30 <- c(150, 0, 0)
Forecast.2017.09.30 <- c(100, 0, 0)
Forecast.2017.12.31 <- c(150, 0, 0)
Disable <- c(NA,TRUE,TRUE)
Start <- c(NA,"2017.06.30",NA)
df2 <- data.frame(Farm, YearlyVolume, Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30, Forecast.2017.12.31, Disable, Start)
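I am using the following code to replace the forecasts of every row where Disable is TRUE. However, it does not take into account the date from which to start replacing forecasts with 0.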
cols <- grep(paste0("Forecast.", min(Sequence)), colnames(df)):grep(paste0("Forecast.", max(Sequence)), colnames(df))
df[, cols] <- apply(df[, cols], 2, function(x) { replace(x, df$Disable == TRUE, 0) })
To take the start date into account, I tried replacing the min(Sequence) part with ifelse(!is.na(df$Start), df$Start, min(Sequence)), so that it looks like this:
cols <- grep(paste0("Forecast.", ifelse(!is.na(df$Start), df$Start, min(Sequence))), colnames(df)):grep(paste0("Forecast.", max(Sequence)), colnames(df))
df[, cols] <- apply(df[, cols], 2, function(x) { replace(x, df$Disable == TRUE, 0) })
But I get the following warning:
"argument 'pattern' has length > 1 and only the first element will be used"
I'm not sure how to change the code so that it uses the Start "date" when one is present.
Any help is appreciated.
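The warning comes from grep(): its pattern argument takes a single regular expression, but ifelse() returns one pattern per row, so grep() uses only the first one (row ABC's, which falls back to min(Sequence)). A minimal sketch of the difference, using only the column names from the question; match() is vectorized over its first argument and returns one start column per row:

```r
Sequence <- c("2017.03.31", "2017.06.30", "2017.09.30", "2017.12.31")
Start <- c(NA, "2017.06.30", NA)
cols <- c("Farm", "YearlyVolume", paste0("Forecast.", Sequence), "Disable", "Start")

# One pattern per row: rows without a Start date fall back to min(Sequence)
patterns <- paste0("Forecast.", ifelse(!is.na(Start), Start, min(Sequence)))

# grep() with a length-3 pattern warns and then behaves like this call:
# only row 1's pattern is used, so every row would start at column 3
first_only <- grep(patterns[1], cols)

# match() looks up every pattern: one start column per row
start_col <- match(patterns, cols)
print(first_only)  # [1] 3
print(start_col)   # [1] 3 4 3
```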
Here is one approach. We create a function that replaces a row's values with 0 from a given column onward, i.e.
# Positions of the Forecast columns (3:6); needed at top level, not just inside Fun1
ind1 <- grep('Forecast.', names(df))

# Replace row n of df with 0 from column var[n] through the last Forecast column
Fun1 <- function(df, var, n) {
  ind1 <- grep('Forecast.', names(df))
  replace(df[n, ], var[n]:max(ind1), 0)
}

# Create a new column giving the column index at which to start replacing with 0, based on Start
df$new <- sapply(df$Start, function(i) match(i, sub('^Forecast.', '', names(df))))
# Rows with Disable == TRUE but no Start date are zeroed from the first Forecast column
df$new[is.na(df$new) & df$Disable == TRUE] <- min(ind1)
# Identify the rows whose values should change
ind2 <- which(!is.na(df$new))
# Apply the function row by row; unlist() turns every value of the row into character
df[ind2, ] <- as.data.frame(t(sapply(ind2, function(i) unlist(Fun1(df, df$new, i)))), stringsAsFactors = FALSE)
# Use ind1 to convert the Forecast columns back to integers
df[ind1] <- lapply(df[ind1], as.integer)
df
#  Farm YearlyVolume Forecast.2017.03.31 Forecast.2017.06.30 Forecast.2017.09.30 Forecast.2017.12.31 Disable      Start  new
#1  ABC          500                 100                 150                 100                 150    <NA>       <NA> <NA>
#2  DEF         1000                 200                   0                   0                   0    TRUE 2017.06.30    4
#3  XYZ          200                   0                   0                   0                   0    TRUE       <NA>    3
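For comparison, a vectorized sketch that uses a different technique from the answer above (it is not the answerer's code): it builds a logical mask over the Forecast columns instead of looping over rows with unlist(). It assumes the data frame from the question, rebuilt here with stringsAsFactors = FALSE, and it assumes an NA in Disable means "not disabled". The names fc, dates, start, disable, m and mask are illustrative only.

```r
# Rebuild the question's data (stringsAsFactors = FALSE keeps Start as character)
df <- data.frame(Farm = c("ABC", "DEF", "XYZ"),
                 YearlyVolume = c(500, 1000, 200),
                 Forecast.2017.03.31 = c(100, 200, 40),
                 Forecast.2017.06.30 = c(150, 300, 40),
                 Forecast.2017.09.30 = c(100, 100, 60),
                 Forecast.2017.12.31 = c(150, 500, 100),
                 Disable = c(NA, TRUE, TRUE),
                 Start = c(NA, "2017.06.30", NA),
                 stringsAsFactors = FALSE)

fc <- grep("^Forecast\\.", names(df))            # positions of the Forecast columns
dates <- sub("^Forecast\\.", "", names(df)[fc])   # "2017.03.31", "2017.06.30", ...

# Per row: position (within the Forecast columns) where zeroing starts;
# rows without a Start date start at the first Forecast column
start <- match(df$Start, dates)
start[is.na(start)] <- 1L

# Assumption: NA in Disable is treated as FALSE
disable <- df$Disable %in% TRUE

m <- as.matrix(df[fc])
# col(m) holds each cell's column number; start and disable are recycled
# down the columns, so cell [i, j] is compared with start[i] and disable[i]
mask <- (col(m) >= start) & disable
m[mask] <- 0
df[fc] <- as.data.frame(m)
df
```

Because only the Forecast columns go through the matrix and back, YearlyVolume stays numeric and Disable stays logical, unlike in the row-wise version.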
Note: I created your data frame with stringsAsFactors = FALSE, i.e.
df <- data.frame(Farm, YearlyVolume,
Forecast.2017.03.31, Forecast.2017.06.30, Forecast.2017.09.30,
Forecast.2017.12.31, Disable, Start, stringsAsFactors = FALSE)
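Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the argument above matters mainly on older R versions. If df already has factor columns, here is a minimal sketch of converting them back to character before running the code above. The helper name to_char is hypothetical.

```r
# Hypothetical helper: convert every factor column of a data frame to character
to_char <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], as.character)
  d
}

# Example: a data frame deliberately built with factors
df <- data.frame(Farm = c("ABC", "DEF", "XYZ"),
                 Start = c(NA, "2017.06.30", NA),
                 stringsAsFactors = TRUE)
df <- to_char(df)
str(df)
```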