Calculate median or mean depending on the value of a column
I am trying to calculate either the median or the mean, depending on the value of a column.
Imagine the following DF:
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
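I want to fill the "median_mean" column with either the median or the mean of the 3 samples in each row, depending on the Frequence column. If Frequence is greater than or equal to 10, use the median; otherwise use the mean.
Keep in mind that there will not always be 3 samples, so I cannot hard-code columns 2:4. Their names will always be sample_X, though.
Can anyone help me?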
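This works, using grep to get the indices of the sample columns:
cols <- grep("sample_", names(DF))
for (i in seq_len(nrow(DF))) {
  # values of the sample columns in row i, as a plain numeric vector
  vals <- as.numeric(DF[i, cols])
  if (DF$Frequence[i] >= 10) {
    DF$median_mean[i] <- median(vals)
  } else {
    DF$median_mean[i] <- mean(vals)
  }
}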
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
cols <- grep("sample_", names(DF))
# rows with Frequence >= 10 get the median, all other rows get the mean
high <- DF$Frequence >= 10
DF$median_mean[high] <- apply(DF[high, cols, drop = FALSE], 1, median)
DF$median_mean[!high] <- rowMeans(DF[!high, cols, drop = FALSE])
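Loop over the rows and pick the matching function (match.fun) based on the Frequence column:
# logical index of the sample_ columns
ix <- grepl("sample_", colnames(DF), fixed = TRUE)
DF$median_mean <- apply(DF, 1, function(i) {
  # apply() passes each row as a character vector here, so convert before comparing
  myFun <- match.fun(if (as.numeric(i["Frequence"]) >= 10) "median" else "mean")
  myFun(as.numeric(i[ix]))
})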
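A possible variation, offered as a sketch rather than as part of the answer above: iterating over row indices with vapply avoids sending the whole data frame through apply(), which converts it to a character matrix whenever a character column such as name is present. The toy data frame and the helper name pick_stat are made up for illustration.

```r
# Hypothetical toy data: three sample_ columns plus Frequence
DF <- data.frame(
  name      = c("a", "b", "c"),
  sample_1  = c(1, 2, 3),
  sample_2  = c(3, 3, 3),
  sample_3  = c(8, 10, 30),
  Frequence = c(8, 10, 12)
)

# positions of the sample_ columns, however many there are
sample_cols <- grep("sample_", names(DF), fixed = TRUE)

# pick_stat is an illustrative helper, not part of any package:
# it returns the median of row r if Frequence >= 10, otherwise the mean
pick_stat <- function(r) {
  f <- if (DF$Frequence[r] >= 10) median else mean
  f(unlist(DF[r, sample_cols], use.names = FALSE))
}

DF$median_mean <- vapply(seq_len(nrow(DF)), pick_stat, numeric(1))
```

Because the numeric columns are never turned into strings, the comparison against 10 stays a numeric comparison.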
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
DF$median_mean = ifelse(DF$Frequence>=10, apply(DF[grep("sample_", names(DF))], 1L, median), apply(DF[grep("sample_", names(DF))], 1L, mean))
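Explanation
We apply median and mean to the relevant columns, using:
apply(DF[grep("sample_", names(DF))], 1L, median)
and
apply(DF[grep("sample_", names(DF))], 1L, mean)
but for each row we return only the value we want, using ifelse, the vectorized form of the ternary operator.
The code also works for any number of columns named sample_X, because the column selection is generalized: grep("sample_", names(DF)) finds them by name.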
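To check the claim about an arbitrary number of sample columns, here is a minimal sketch on a made-up data frame with four of them. The data values are assumptions chosen so the mean and median differ, and rowMeans is swapped in for apply(..., mean) as an equivalent shortcut.

```r
# Hypothetical data with four sample_ columns instead of three
DF <- data.frame(
  name      = c("a", "b", "c"),
  sample_1  = c(1, 2, 10),
  sample_2  = c(2, 4, 20),
  sample_3  = c(3, 6, 30),
  sample_4  = c(10, 100, 1000),
  Frequence = c(9, 10, 11)
)

# select the sample columns by name, not by position
samples <- DF[grep("sample_", names(DF), fixed = TRUE)]

# both statistics are computed for every row; ifelse keeps one per row
DF$median_mean <- ifelse(DF$Frequence >= 10,
                         apply(samples, 1L, median),
                         rowMeans(samples))

print(DF$median_mean)
```

Row "a" (Frequence 9) gets the mean of its four samples, while rows "b" and "c" get the median. Both statistics are computed for all rows, which is wasteful on large data but keeps the code short.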