Calculate percent from data frame with several species, treatments and variables using dplyr
Problem
Create a new column containing percentages
Data
df <- data.frame(
  species   = c("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
  number    = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
  treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
  variable  = c("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
  value     = sample(1:16)  # random permutation of 1..16
)
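Question
I want to calculate percentages within each species for a given number and treatment, i.e. the values of variables x and y (the first two rows) should sum to 100%.
I tried dplyr: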
library(dplyr)
result <- df %>%
  group_by(variable) %>%
  mutate(percent = value * 100 / sum(value))
test <- subset(result, variable == "x")
sum(test$percent) # sums to 100%
"test" is wrong because it expresses each x as a percentage of all x values across both species and both treatments.
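To make the mismatch concrete, here is a minimal sketch. It assumes a deterministic stand-in for the data (value = 1:16 instead of sample(1:16)), and the names by_variable, totals_by_variable and totals_by_cell are made up for illustration. Grouping by variable yields percentages that sum to 100 over all x rows (and over all y rows), but not within a single species/number/treatment combination.

```r
library(dplyr)

# Same layout as the question's df, with fixed values so the result is reproducible
df <- data.frame(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(rep(c(1, 2), each = 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = 1:16
)

# The grouping attempted in the question
by_variable <- df %>%
  group_by(variable) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

# Totals per variable: each is 100
totals_by_variable <- by_variable %>%
  group_by(variable) %>%
  summarise(total = sum(percent), .groups = "drop")

# Totals per species/number/treatment cell: none of them is 100
totals_by_cell <- by_variable %>%
  group_by(species, number, treatment) %>%
  summarise(total = sum(percent), .groups = "drop")
```

Printing totals_by_cell shows that the x and y percentages of one cell do not add up to 100, which is the symptom described above.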
Desired output
species  number  treatment  variable  value  percent
A        1       0          x         40     40
A        1       0          y         60     60
A        2       0          x         1      10
A        2       0          y         9      90
Is this what you are looking for? I'm using the data.table package:
library(data.table)
DT <- as.data.table(df)
# value per (species, number, treatment, variable)
DT_output <- DT[, list(value = sum(value)), by = c('species', 'number', 'treatment', 'variable')]
# total per (species, number, treatment)
DT_temp <- DT[, list(sum = sum(value)), by = c('species', 'number', 'treatment')]
# attach the group total to every row
DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))
DT_output[, percent := 100 * value / sum]
setorder(DT_output, species, treatment, number, variable)
DT_output
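As a possible shortcut, the percentage can also be computed in one grouped assignment without the intermediate tables and the merge. This is only a sketch: it assumes every (species, number, treatment, variable) combination occurs once, as in the question's data, it uses fixed values (1:16) instead of sample(1:16), and DT_alt is a hypothetical name.

```r
library(data.table)

# Same layout as the question's df, with fixed values for reproducibility
DT_alt <- data.table(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(rep(c(1, 2), each = 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = 1:16
)

# Add percent by reference; sum(value) is evaluated within each group
DT_alt[, percent := 100 * value / sum(value), by = .(species, number, treatment)]

setorder(DT_alt, species, treatment, number, variable)
DT_alt
```

If a combination could occur more than once, the aggregation step from the answer above would still be needed first.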
Here is an answer using tidyr:
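library(tidyr)
library(dplyr)
# spread() still works but is superseded by pivot_wider() in current tidyr
df %>% spread(variable, value) %>%
  mutate(percent.x = 100 * x / (x + y),  # scaled by 100 to give percent rather than a proportion
         percent.y = 100 * y / (x + y))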
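For newer versions of tidyr, the same reshaping can be sketched with pivot_wider(), which the tidyr documentation recommends over spread() for new code. The example assumes fixed values (1:16) in place of sample(1:16), scales the ratios by 100 so they read as percentages, and uses the made-up name wide for the result.

```r
library(dplyr)
library(tidyr)

# Same layout as the question's df, with fixed values for reproducibility
df <- data.frame(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(rep(c(1, 2), each = 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = 1:16
)

# One row per species/number/treatment, with x and y as separate columns
wide <- df %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))

wide
```

Note that this gives a wide result (one row per combination), not the long layout shown in the desired output.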
Here is also a dplyr-only solution:
df %>%
  group_by(number, treatment, species) %>%
  mutate(percent = 100 * value / sum(value))
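Your problem was that you passed entirely the wrong variables to group_by(). Since you want the percentages defined within a particular (number, treatment, species) combination, but varying across your variable, you should group_by() the former, not the latter.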
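As a quick check of that grouping, here is a small sketch that uses the four rows from the question's desired output as input (small and result are hypothetical names):

```r
library(dplyr)

# The four example rows from the desired output in the question
small <- data.frame(
  species   = "A",
  number    = c(1, 1, 2, 2),
  treatment = 0,
  variable  = c("x", "y", "x", "y"),
  value     = c(40, 60, 1, 9)
)

# Group by everything except `variable`, so x and y share one total
result <- small %>%
  group_by(species, number, treatment) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

result$percent
# 40 60 10 90
```

Within each group, x and y now sum to 100, matching the percent column the question asks for.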