Calculate median or mean depending on the value of a column

I'm trying to calculate the median or the mean depending on the value of a column.

Imagine the following DF:

DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

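I want to fill the "median_mean" column with the median or the mean of the 3 samples in each row, depending on the Frequence column: if Frequence is greater than or equal to 10, use the median; otherwise, use the mean.

Keep in mind that there will not always be 3 samples, so I can't use columns 2:4. But their names will always be sample_X.

Can anyone help me?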
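A minimal sketch of selecting the sample columns by name instead of by position (2:4). The `^` anchor in the pattern is an addition: it assumes that only the sample columns have names starting with `sample_`.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# Positions of every column whose name starts with "sample_"
cols <- grep("^sample_", names(DF))
print(cols)  # 2 3 4

# A hypothetical fourth sample column is picked up automatically
DF$sample_4 <- 0
cols <- grep("^sample_", names(DF))
print(cols)  # 2 3 4 7
```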

This works, using grep to find the sample columns:

cols <- grep("sample_", names(DF))  # sample_ columns, however many there are

for (i in seq_len(nrow(DF))) {
  x <- unlist(DF[i, cols])
  if (DF$Frequence[i] >= 10) {
    DF$median_mean[i] <- median(x)  # Frequence >= 10: median
  } else {
    DF$median_mean[i] <- mean(x)    # otherwise: mean
  }
}
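To check a loop like this against the rule in the question (median when Frequence >= 10, mean otherwise), the first rows can be worked out by hand: row 1 has samples (1, 3, 2) and Frequence 8, so it takes the mean, 2; row 3 has (3, 3, 4) and Frequence 10, so it takes the median, 3. The sketch below recomputes rows 1 to 5 with a loop that follows that rule and compares them with the hand-computed values.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

cols <- grep("sample_", names(DF))
for (i in seq_len(nrow(DF))) {
  x <- unlist(DF[i, cols])
  # median when Frequence >= 10, mean otherwise
  DF$median_mean[i] <- if (DF$Frequence[i] >= 10) median(x) else mean(x)
}

# Hand-computed values for rows 1-5:
# (1,3,2) F=8 -> mean 2;     (2,3,3) F=9 -> mean 8/3;
# (3,3,4) F=10 -> median 3;  (4,3,5) F=11 -> median 4;  (5,3,2) F=12 -> median 3
expected <- c(2, 8/3, 3, 4, 3)
print(all.equal(DF$median_mean[1:5], expected))  # TRUE
```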
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

cols <- grep("sample_", names(DF))
hi <- DF$Frequence >= 10  # rows that take the median; the rest take the mean
DF$median_mean[hi]  <- apply(DF[hi, cols, drop = FALSE], 1, median)
DF$median_mean[!hi] <- rowMeans(DF[!hi, cols, drop = FALSE])
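The boundary matters here: with strict `> 10` and `< 10` comparisons, rows whose Frequence is exactly 10 satisfy neither condition, so they keep their initial value of 0. A quick count on the example data:

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# Rows matched by neither strict comparison (Frequence == 10)
skipped <- !(DF$Frequence > 10 | DF$Frequence < 10)
print(which(skipped))  # 3 8 13 18
```

Using `>= 10` for one branch and its negation for the other covers every row exactly once.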

Loop over the rows and, depending on Frequence, choose the function to call with match.fun:
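# logical index of the sample_ columns
ix <- grepl("sample_", colnames(DF), fixed = TRUE)

# Loop over row indices rather than apply(DF, 1, ...), so the values stay numeric
DF$median_mean <- sapply(seq_len(nrow(DF)), function(r) {
  myFun <- match.fun(if (DF$Frequence[r] >= 10) "median" else "mean")
  myFun(unlist(DF[r, ix]))
})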
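Why the row values should not be taken from `apply(DF, 1, ...)` on the whole data frame: `apply()` first converts it with `as.matrix()`, and because the `name` column is character, the entire matrix becomes character. Comparing a string with a number is then a string comparison, so, for example, `"9" >= 10` is `TRUE`. A short demonstration:

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# The matrix that apply() works on is character, not numeric
m <- as.matrix(DF)
print(is.character(m))  # TRUE

# 10 is coerced to "10", and "9" sorts after "10" as a string
print("9" >= 10)        # TRUE

# Comparing the numeric column directly behaves as expected
print(DF$Frequence[2] >= 10)  # FALSE (Frequence is 9)
```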
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

DF$median_mean <- ifelse(DF$Frequence >= 10,
                         apply(DF[grep("sample_", names(DF))], 1L, median),
                         apply(DF[grep("sample_", names(DF))], 1L, mean))

Explanation

We apply median and mean to the relevant columns, using:

  • apply(DF[grep("sample_", names(DF))], 1L, median)

  • apply(DF[grep("sample_", names(DF))], 1L, mean)

Then ifelse, the vectorised form of the ternary (if/else) operator, returns only the value we want for each row.
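A small illustration of the two pieces on a hypothetical two-row matrix: `apply(X, 1L, FUN)` calls `FUN` once per row, and `ifelse(test, yes, no)` takes element i from `yes` where `test[i]` is `TRUE` and from `no` otherwise. The matrix and the Frequence values are made up for the example.

```r
# Two hypothetical rows: (1, 3, 8) and (2, 3, 3)
m <- matrix(c(1, 3, 8,
              2, 3, 3), nrow = 2, byrow = TRUE)
freq <- c(8, 10)  # hypothetical Frequence values for those rows

# apply(..., 1L, FUN) calls FUN once per row
row_medians <- apply(m, 1L, median)  # 3 and 3
row_means   <- apply(m, 1L, mean)    # 4 and 8/3

# Element-wise choice: mean for row 1 (8 < 10), median for row 2
res <- ifelse(freq >= 10, row_medians, row_means)
print(res)  # 4 and 3
```

Because both vectors are computed before `ifelse()` chooses, every row gets both a median and a mean; for data of this size that extra work is negligible.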

The code also works for any number of columns named sample_X, because the column selection is generalised: grep("sample_", names(DF)) finds them by name.
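To check that claim, the sketch below adds a hypothetical fourth column `sample_4` (all zeros, invented for illustration) and reruns the same `ifelse()` approach. Row 1, (1, 3, 2, 0) with Frequence 8, now gets the mean 1.5; row 3, (3, 3, 4, 0) with Frequence 10, gets the median 3.

```r
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")
DF$sample_4 <- 0  # hypothetical extra sample column

cols <- grep("sample_", names(DF))  # now columns 2, 3, 4 and 7
DF$median_mean <- ifelse(DF$Frequence >= 10,
                         apply(DF[cols], 1L, median),
                         apply(DF[cols], 1L, mean))
print(DF$median_mean[c(1, 3)])  # 1.5 and 3
```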