Sum up multiple rows by condition in R

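I know there are many similar questions about how to sum columns by condition in R, but I was not able to apply either the aggregate function or dplyr::group_by(df) %>% summarise(variable = sum(variable)) to my data. The question "Combine rows and sum their values" did not help me either. But maybe you can? I want to merge rows of a data.frame in R and sum their values:
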
df <- data.frame(file=c('sample1','sample1','sample2','sample3','sample2'),gene1=c(34,365,76,0,4),gene2=c(34,0,0,456,0))
> df
     file gene1 gene2
1 sample1    34    34
2 sample1   365     0
3 sample2    76     0
4 sample3     0   456
5 sample2     4     0

The output should look like this:

     file gene1 gene2
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456

In base R, you can use rowsum to sum up rows by group.

rowsum(df[-1], df[,1])
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456
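Note that rowsum keeps the group labels as row names rather than as a file column. If you need the exact layout from the question, a minimal sketch could look like the following (the names sums and res are just illustrative, and df is rebuilt here so the snippet runs on its own):

```r
# Same data as in the question
df <- data.frame(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# rowsum() sums every column within each group;
# the group labels end up as row names of the result
sums <- rowsum(df[-1], group = df$file)

# Move the row names into a regular "file" column and reset the row names
res <- data.frame(file = rownames(sums), sums, row.names = NULL)
res
```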

Or using aggregate:

aggregate(.~file, df, sum)
#     file gene1 gene2
#1 sample1   399    34
#2 sample2    80     0
#3 sample3     0   456
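One thing to watch with the formula interface of aggregate: its default na.action is na.omit, so any row containing an NA is dropped before summing. Your example has no NAs, so this is only a sketch on made-up data (df_na is hypothetical) of how the two behaviours differ:

```r
# Hypothetical variant of the data with one missing value in gene2
df_na <- data.frame(
  file  = c("sample1", "sample1", "sample2"),
  gene1 = c(34, 365, 76),
  gene2 = c(34, NA, 0)
)

# Default na.action = na.omit: the whole second row is discarded,
# so its gene1 value (365) is lost as well
default_res <- aggregate(. ~ file, df_na, sum)

# Keep every row (na.pass) and let sum() skip the NAs itself
kept_res <- aggregate(. ~ file, df_na, sum, na.rm = TRUE, na.action = na.pass)

default_res
kept_res
```

Whether dropping or keeping those rows is right depends on what an NA means in your data.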

Or using by:

do.call(rbind, by(df[-1], df[,1], colSums))
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456

A dplyr approach would be:

library(dplyr)

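# Sum every non-grouping column per file, ignoring NAs.
# In dplyr >= 1.0.0 summarise_all() is superseded; the equivalent is
# summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
df %>% group_by(file) %>% summarise_all(.funs = sum, na.rm = TRUE)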

Output:

# A tibble: 3 x 3
  file    gene1 gene2
  <fct>   <dbl> <dbl>
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456

You can try using dplyr:

df %>% 
  group_by(file) %>% 
  summarise(gene1 = sum(gene1), gene2 = sum(gene2))

With data.table:

library(data.table)

# setDT() converts df to a data.table by reference
setDT(df)[, .(gene1 = sum(gene1), gene2 = sum(gene2)), by = .(file)]
      file gene1 gene2
1: sample1   399    34
2: sample2    80     0
3: sample3     0   456
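If you have many gene columns, listing each one by hand gets tedious. A possible sketch that sums every non-grouping column through .SD, assuming all remaining columns are numeric (dt and res are names made up here, and the data is rebuilt so the snippet runs on its own):

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  file  = c("sample1", "sample1", "sample2", "sample3", "sample2"),
  gene1 = c(34, 365, 76, 0, 4),
  gene2 = c(34, 0, 0, 456, 0)
)

# .SD holds all columns except the grouping column;
# lapply() applies sum() to each of them within every group
res <- dt[, lapply(.SD, sum), by = file]
res
```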