Calculate percent from data frame with several species, treatments and variables using dplyr

Goal

Create a new column that contains the percentage.

Data

 set.seed(42)  # added so that sample() returns the same values on every run
 df <- data.frame(
     species   = c("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
     number    = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
     treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
     variable  = c("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
     value     = sample(1:16)
 )

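Problem

I want to calculate the percentage within each species for a given number and treatment. That is, the percentages of variables x and y in one such combination (e.g. the first two rows) should sum to 100%.

I tried dplyr: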

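library(dplyr)

result <- df %>%
    group_by(variable) %>%
    mutate(percent = value * 100 / sum(value))

test <- subset(result, variable == "x")
sum(test$percent) # sums to 100%

The result in "test" is wrong, because each percentage is taken relative to all x values across both species and both treatments.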

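To see why the grouping matters, one can count how many rows share a denominator under each grouping. This is a minimal sketch: it rebuilds the data with `rep()` and adds `set.seed(1)` for reproducibility (both are conveniences, not part of the original post), and the names `by_variable` and `by_combo` are made up for illustration.

```r
library(dplyr)

# Same layout as the data above; set.seed() is an addition for reproducibility.
set.seed(1)
df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Grouping by variable: 2 groups, each spanning every species and treatment
by_variable <- df %>% count(variable)

# Grouping by the combination: 8 groups, each holding one x row and one y row
by_combo <- df %>% count(species, number, treatment)

by_variable
by_combo
```

With `group_by(variable)` every percentage is divided by the sum of 8 values, whereas the desired denominator is the sum of just the x and y rows of one combination.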
Expected output

 species number treatment variable value    percent
    A      1         0        x     40         40
    A      1         0        y     60         60
    A      2         0        x      1         10
    A      2         0        y      9         90

Is this what you are looking for? I am using the data.table package:

library(data.table)
DT <- as.data.table(df)

# Sum of value per (species, number, treatment, variable)
DT_output <- DT[, list(value = sum(value)), by = c('species', 'number', 'treatment', 'variable')]
# Total per (species, number, treatment), used as the denominator
DT_temp <- DT[, list(sum = sum(value)), by = c('species', 'number', 'treatment')]

# Attach the group total to every row
DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))

DT_output[, percent := 100 * value / sum]

setorder(DT_output, species, treatment, number, variable)
DT_output
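As a possible shortcut, the aggregate-and-merge steps can be folded into a single grouped assignment, since `:=` combined with `by` evaluates `sum(value)` within each group. This is a sketch under the assumption that every (species, number, treatment, variable) combination occurs only once, as in the example data; the `set.seed(1)` call and the `rep()` construction are additions for reproducibility.

```r
library(data.table)

# Same layout as the question's data; set.seed() is an addition.
set.seed(1)
DT <- data.table(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Add the percent column by reference; sum(value) is computed per group
DT[, percent := 100 * value / sum(value), by = .(species, number, treatment)]

setorder(DT, species, treatment, number, variable)
DT
```

If a combination could appear more than once, the explicit aggregation step would still be needed before computing the percentages.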

Here is an answer using tidyr:
library(tidyr)
library(dplyr)

# One row per (species, number, treatment), with x and y as columns
df %>% spread(variable, value) %>%
        mutate(percent.x = 100 * x / (x + y),
               percent.y = 100 * y / (x + y))
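In current tidyr releases `spread()` is superseded by `pivot_wider()`, so the same reshaping could be sketched as follows. The data construction, the `set.seed(1)` call and the name `wide` are illustrative additions, and the fractions are scaled by 100 so that the columns read as percentages.

```r
library(dplyr)
library(tidyr)

# Same layout as the question's data; set.seed() is an addition.
set.seed(1)
df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Spread x and y into their own columns, then compute each share
wide <- df %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))

wide
```

The wide layout gives one row per combination, which is convenient for side-by-side comparison but does not match the long format of the expected output.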

Here is also a dplyr-only solution:

df %>% group_by(number, treatment, species) %>% 
        mutate(percent = 100 * value / sum(value)) 

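Your problem is that you passed the wrong variables to group_by(). You want the percentages to be defined within each (number, treatment, species) combination and to vary across your variable, so you should group_by() the former, not the latter.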
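One way to check the corrected grouping is to sum the percentages within each combination and confirm that every total is 100. This is a minimal sketch; the data construction, the `set.seed(1)` call and the names `result` and `check` are illustrative additions.

```r
library(dplyr)

# Same layout as the question's data; set.seed() is an addition.
set.seed(1)
df <- data.frame(
  species   = rep(c("A", "B"), each = 4, times = 2),
  number    = rep(c(1, 2), each = 2, times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Percent of each value within its (number, treatment, species) combination
result <- df %>%
  group_by(number, treatment, species) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

# Every combination should add up to 100
check <- result %>%
  group_by(number, treatment, species) %>%
  summarise(total = sum(percent)) %>%
  ungroup()

check
```

The trailing `ungroup()` is a design choice: it keeps later operations on `result` from silently running per group.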