Sum up multiple rows by condition in R
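I know there are many similar questions about how to sum columns by condition in R, but I can't get aggregate or dplyr::group_by(df) %>% summarise(variable = sum(variable)) to work on my data. The question "Combine rows and sum their values" didn't help me either. But maybe you can?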
I want to combine rows of a data.frame in R and sum them up:
df <- data.frame(file=c('sample1','sample1','sample2','sample3','sample2'),gene1=c(34,365,76,0,4),gene2=c(34,0,0,456,0))
> df
     file gene1 gene2
1 sample1    34    34
2 sample1   365     0
3 sample2    76     0
4 sample3     0   456
5 sample2     4     0
The output should look like this:
     file gene1 gene2
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456
In base R, you can use rowsum to sum rows by group:
rowsum(df[-1], df[,1])
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456
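rowsum() keeps the group labels as row names rather than as a file column. If you need the exact layout asked for in the question, a minimal sketch (assuming a plain data.frame like df above; the names res and out are only illustrative) moves the row names back into a column:

```r
df <- data.frame(file = c('sample1','sample1','sample2','sample3','sample2'),
                 gene1 = c(34, 365, 76, 0, 4),
                 gene2 = c(34, 0, 0, 456, 0))

# Sum every column except `file` within each file group.
# Groups come back sorted, with the labels stored as row names.
res <- as.data.frame(rowsum(df[-1], df$file))

# Turn the row names into a regular `file` column and reset row names to 1..n.
out <- data.frame(file = rownames(res), res, row.names = NULL)
print(out)
```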
Or use aggregate:
aggregate(.~file, df, sum)
#     file gene1 gene2
#1 sample1   399    34
#2 sample2    80     0
#3 sample3     0   456
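aggregate() also has a data-frame interface that avoids the formula. The sketch below should return the same three rows. One difference worth checking on real data: the formula method defaults to na.action = na.omit, so a row with an NA in any column is dropped before summing, whereas the data-frame method has no such step and simply passes extra arguments such as na.rm = TRUE on to FUN.

```r
df <- data.frame(file = c('sample1','sample1','sample2','sample3','sample2'),
                 gene1 = c(34, 365, 76, 0, 4),
                 gene2 = c(34, 0, 0, 456, 0))

# Data-frame method: `x` holds the columns to summarise, `by` is a named
# list of grouping vectors, and extra arguments (here na.rm) go to FUN.
res <- aggregate(df[-1], by = list(file = df$file), FUN = sum, na.rm = TRUE)
print(res)
```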
Or use by:
do.call(rbind, by(df[-1], df[,1], colSums))
#        gene1 gene2
#sample1   399    34
#sample2    80     0
#sample3     0   456
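For this use, by() behaves much like splitting the data frame by group and applying a function to each piece. Spelling those steps out with split() and lapply() (an equivalent sketch, not a description of what by() does internally) may make the do.call(rbind, ...) step easier to follow:

```r
df <- data.frame(file = c('sample1','sample1','sample2','sample3','sample2'),
                 gene1 = c(34, 365, 76, 0, 4),
                 gene2 = c(34, 0, 0, 456, 0))

# 1. Split the numeric columns into one data frame per file value.
parts <- split(df[-1], df$file)
# 2. Sum each column within each piece: a named list of numeric vectors.
sums <- lapply(parts, colSums)
# 3. Stack the per-group vectors as rows; the list names become row names.
res <- do.call(rbind, sums)
print(res)
```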
The dplyr approach is:
library(dplyr)
df %>% group_by(file) %>% summarise_all(.funs = sum, na.rm = TRUE)
Output:
# A tibble: 3 x 3
  file    gene1 gene2
  <fct>   <dbl> <dbl>
1 sample1   399    34
2 sample2    80     0
3 sample3     0   456
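summarise_all() still works, but since dplyr 1.0.0 it is superseded by across(). A sketch of the same summary with across(), assuming dplyr 1.0.0 or later is installed. Inside summarise(), everything() does not pick up the grouping column file, so only the gene columns are summed:

```r
suppressPackageStartupMessages(library(dplyr))

df <- data.frame(file = c('sample1','sample1','sample2','sample3','sample2'),
                 gene1 = c(34, 365, 76, 0, 4),
                 gene2 = c(34, 0, 0, 456, 0))

res <- df %>%
  group_by(file) %>%
  # Apply sum() to every non-grouping column, ignoring NAs.
  summarise(across(everything(), function(x) sum(x, na.rm = TRUE)))
print(res)
```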
You can try dplyr:
library(dplyr)
df %>%
  group_by(file) %>%
  summarise(gene1 = sum(gene1), gene2 = sum(gene2))
Or data.table:
library(data.table)
setDT(df)[, .(gene1 = sum(gene1), gene2 = sum(gene2)), by = .(file)]
      file gene1 gene2
1: sample1   399    34
2: sample2    80     0
3: sample3     0   456
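With many gene columns, naming each one in the j expression quickly becomes tedious. A sketch using data.table's .SD (the non-grouping columns of the current group), assuming data.table is installed. as.data.table() is used here instead of setDT() because setDT() converts df in place:

```r
library(data.table)

df <- data.frame(file = c('sample1','sample1','sample2','sample3','sample2'),
                 gene1 = c(34, 365, 76, 0, 4),
                 gene2 = c(34, 0, 0, 456, 0))

dt <- as.data.table(df)
# .SD holds every non-grouping column for the current group;
# lapply() sums each of them. Groups appear in order of first appearance.
res <- dt[, lapply(.SD, sum), by = file]
print(res)
```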