Calculating standardized effect size for ecological data without loops
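I would like to calculate the standardized effect size of soil nitrogen between two soil water levels (optimal, reduced) at two diversity levels (high vs. low), with 5 replicates per level. I use the formula ES = (Soil_N optimal - Soil_N reduced) / (Soil_N optimal + Soil_N reduced), applied separately to each replicate and diversity level (high and low), so that I can plot the results. Is it possible to do this calculation without loops? I am looking for a dplyr/tidyverse solution and would appreciate a simple answer.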
df <- data.frame(Soilwater = c("optimal", "optimal", "optimal", "optimal", "optimal",
                               "reduced", "reduced", "reduced", "reduced", "reduced",
                               "optimal", "optimal", "optimal", "optimal", "optimal",
                               "reduced", "reduced", "reduced", "reduced", "reduced"),
                 Diversity = c("High", "High", "High", "High", "High", "High", "High", "High", "High", "High",
                               "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low", "Low"),
                 Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, 78, 77, 89, 89, 87, 88, 89),
                 Replicate = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5))
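For example, for high diversity, the effect size can be calculated as (50-79)/(50+79) = -0.2248, (45-78)/(45+78) = -0.2683, (49-79)/(49+79) = -0.2344, (48-78)/(48+78) = -0.2381, (49-77)/(49+77) = -0.2222. The mean of these 5 replicates is -0.2376. The same should be done for the low diversity level. I used the following code, but I only get a value for 1 replicate rather than for all 5 replicates of each treatment. Any help would be much appreciated!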
df %>%
  rowwise() %>%
  dplyr::mutate(Replicate = row_number()) %>%
  dplyr::group_by(Soilwater, Diversity, Replicate) %>%
  dplyr::summarise(Soil_N = mean(Soil_N)) %>%
  tidyr::spread(key = (Soilwater), value = Soil_N) %>%
  dplyr::mutate(ES = (Optimal-Reduced)/(Optimal+Reduced))
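If I understand your question, I think this is just a case of re-ordering your functions, with a few notes:
- Your manual calculation seems to subtract low-diversity-optimal from high-diversity-optimal (50-79; so it compares diversity levels within each Soilwater level), but your question seems to want to group by Diversity? I have done the second of these, but it can be changed easily.
- I have used pivot_wider instead of spread. It does the same thing, but I find it easier to understand!
- Your rowwise() %>% mutate(Replicate = row_number()) part just assigns a Replicate value of 1 to every row, so all rows within a treatment are grouped into one observation. I don't think that was your intention, so I have removed it.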
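As a rough check of the last note, here is a minimal sketch of what the rowwise() step does to the question's data. The data frame is rebuilt with rep() for brevity, and the names overwritten and groups_left are only illustrative.

```r
library(dplyr)

# The question's data, rebuilt compactly with rep()
df <- data.frame(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("High", "Low"), each = 10),
  Soil_N    = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67,
                79, 78, 79, 78, 77, 89, 89, 87, 88, 89),
  Replicate = rep(1:5, times = 4)
)

# Under rowwise() every row is its own group, so row_number() restarts at 1
overwritten <- df %>%
  rowwise() %>%
  mutate(Replicate = row_number()) %>%
  ungroup()

unique(overwritten$Replicate)
#> [1] 1

# Grouping on the overwritten column leaves one group per treatment,
# each holding all 5 original replicates
groups_left <- overwritten %>%
  count(Soilwater, Diversity, Replicate)
groups_left
```

Because the original Replicate column already identifies the replicates, dropping the rowwise() and mutate() steps keeps the 5 replicates per treatment distinct.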
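Edit to handle missing values
One option to account for NAs is simply to calculate the mean with mean(ES, na.rm = TRUE), which effectively drops the missing rows from the calculation:
a)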
library(dplyr)
library(tidyr)

df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("high", "low"), each = 10),
  Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, NA_integer_, 77, 89, 89, 87, 88, 89),
  Replicate = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
)

df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  mutate(ES = (optimal - reduced) / (optimal + reduced)) %>%
  group_by(Diversity) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175
#> 2 low       -0.0615
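Since the question asks for an ES value per replicate (for plotting), it may help to keep the intermediate table before summarising. The sketch below does that, assuming the same data as in a). The names es_by_replicate, es_summary and n_used are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in a), with one missing optimal value in the low-diversity group
df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("high", "low"), each = 10),
  Soil_N    = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67,
                79, 78, 79, NA_real_, 77, 89, 89, 87, 88, 89),
  Replicate = rep(1:5, times = 4)
)

# One row per Diversity x Replicate, with ES computed for each row
es_by_replicate <- df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  mutate(ES = (optimal - reduced) / (optimal + reduced))

# The replicate whose ES is NA is the one that na.rm = TRUE drops
es_by_replicate %>%
  filter(is.na(ES))

# Number of replicates that actually contribute to each mean
es_summary <- es_by_replicate %>%
  group_by(Diversity) %>%
  summarise(n_used = sum(!is.na(ES)), mean_ES = mean(ES, na.rm = TRUE))
es_summary
```

The es_by_replicate table is the per-replicate result that could be passed to a plotting function. The n_used column shows that the low-diversity mean is based on 4 replicates, not 5.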
Another option is to impute missing values from the mean of the combined Soilwater/Diversity group, for each of the optimal and reduced measurements:
b)
df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  group_by(Diversity) %>%
  mutate(across(c(optimal, reduced), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    ES = (optimal - reduced) / (optimal + reduced)
  ) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175
#> 2 low       -0.0609
Created on 2022-03-17 by the reprex package (v2.0.1)
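The first note says the grouping can be changed easily. If the comparison you actually want is the one in your hand calculation (high vs. low diversity within each soil water level), a minimal sketch would be to swap the roles of Soilwater and Diversity in the pivot. This uses the question's complete data with its "High"/"Low" labels. The names es_within_water and mean_by_water are illustrative only.

```r
library(dplyr)
library(tidyr)

# The question's complete data (no missing values)
df <- data.frame(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("High", "Low"), each = 10),
  Soil_N    = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67,
                79, 78, 79, 78, 77, 89, 89, 87, 88, 89),
  Replicate = rep(1:5, times = 4)
)

# Spread Diversity into columns so High and Low sit side by side
# for each Soilwater x Replicate combination
es_within_water <- df %>%
  pivot_wider(
    id_cols = c(Soilwater, Replicate),
    names_from = Diversity,
    values_from = Soil_N
  ) %>%
  mutate(ES = (High - Low) / (High + Low))

# Mean ES per soil water level
mean_by_water <- es_within_water %>%
  group_by(Soilwater) %>%
  summarise(mean_ES = mean(ES))
mean_by_water
```

For the optimal level this reproduces the hand-calculated mean of about -0.2376 from the question.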