Calculating standardized effect size for ecological data without loops

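I would like to calculate the standardized effect size of soil nitrogen between two soil water levels (optimal, reduced) and two diversity levels (high vs. low), with 5 replicates per level. I am using the formula ES = (Soil_N optimal - Soil_N reduced) / (Soil_N optimal + Soil_N reduced), applied to each replicate and diversity level (high and low), so that I can plot the results. Is it possible to do this calculation without using loops? I am looking for a dplyr/tidyverse solution. Looking forward to a simple answer.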

df<- data.frame(Soilwater = c("optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced", 
   "optimal", "optimal", "optimal", "optimal", "optimal", 
   "reduced", "reduced", "reduced", "reduced", "reduced"), 
  Diversity = c("High","High","High","High","High","High","High","High","High","High",   
   "Low", "Low", "Low","Low","Low","Low","Low","Low","Low","Low"),
 Soil_N = c(50,45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, 78, 77, 89, 89, 87, 88, 89), 
 Replicate = c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))

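For example, for high diversity the effect sizes can be calculated as (50-79)/(50+79) = -0.2248, (45-78)/(45+78) = -0.2683, (49-79)/(49+79) = -0.2344, (48-78)/(48+78) = -0.2381, (49-77)/(49+77) = -0.2222. The mean of these 5 replicates is -0.2376. The same should be done for the low diversity level. I used the following code, but I only get a value for 1 replicate rather than for all 5 replicates of each treatment. Any help would be appreciated!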

df%>%  
rowwise()%>%    
dplyr::mutate(Replicate = row_number())%>%    
dplyr::group_by(Soilwater, Diversity, Replicate)%>%     
dplyr::summarise(Soil_N = mean(Soil_N))%>% 
tidyr::spread(key = (Soilwater), value = Soil_N)%>%    
dplyr::mutate(ES = (Optimal-Reduced)/(Optimal+Reduced))

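If I understand your question, I think this is just a case of reordering your functions, with a few notes:

  • Your manual calculations seem to subtract low-diversity-optimal from high-diversity-optimal (50-79; so grouping by Soilwater), but your question seems to want to group by Diversity? I have gone for the second of these, but it can easily be changed.
  • I have used pivot_wider instead of spread. It does the same thing, but I find it easier to understand!
  • Your rowwise() %>% mutate(Replicate = row_number()) section just assigns a Replicate value of 1 to every row, so they all get grouped as a single observation. I didn't think that was your aim, so I removed it.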
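If the comparison in the manual calculation is the one that is wanted (high versus low diversity within each Soilwater level), the same pattern should work with the roles of the two columns swapped. This is only a minimal sketch, assuming the question's original data (with Diversity coded as "High"/"Low" and no missing values); the name es_by_water is purely illustrative.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt compactly (assumed identical to the original df)
df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("High", "Low"), each = 10),
  Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67,
             79, 78, 79, 78, 77, 89, 89, 87, 88, 89),
  Replicate = rep(1:5, times = 4)
)

# Pivot on Diversity instead of Soilwater, so each row pairs the
# High and Low measurements of one replicate within one water level
es_by_water <- df %>%
  pivot_wider(
    id_cols = c(Soilwater, Replicate),
    names_from = Diversity,
    values_from = Soil_N
  ) %>%
  mutate(ES = (High - Low) / (High + Low)) %>%
  group_by(Soilwater) %>%
  summarise(mean_ES = mean(ES))

es_by_water
```

For the optimal water level this should give a mean of about -0.238, in line with the hand calculation in the question.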

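Edited to handle missing values

One option for handling NAs is simply to compute the mean with mean(ES, na.rm = TRUE), which effectively drops the rows with missing values from the calculation:

a)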

library(dplyr)
library(tidyr)


df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("high", "low"), each = 10),
  Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67, 79, 78, 79, NA_integer_, 77, 89, 89, 87, 88, 89),
  Replicate = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
)

df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  mutate(ES = (optimal - reduced) / (optimal + reduced)) %>%
  group_by(Diversity) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))

#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175 
#> 2 low       -0.0615
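Since the question asks for one value per replicate to plot, it may help to stop before the summarise() step. A minimal sketch, assuming the same data with one missing value as above; es_long is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same data as in option a), with replicate 4 of low/optimal missing
df <- tibble(
  Soilwater = rep(rep(c("optimal", "reduced"), each = 5), times = 2),
  Diversity = rep(c("high", "low"), each = 10),
  Soil_N = c(50, 45, 49, 48, 49, 69, 68, 69, 70, 67,
             79, 78, 79, NA, 77, 89, 89, 87, 88, 89),
  Replicate = rep(1:5, times = 4)
)

# Keep one row per Diversity/Replicate pair instead of collapsing to a mean
es_long <- df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  mutate(ES = (optimal - reduced) / (optimal + reduced))

es_long
```

The result has ten rows (five replicates for each diversity level), and the ES for the replicate with the missing measurement is NA, so a plotting function would need to drop or otherwise handle that row.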

Another option is to impute the missing values from the mean of the combined Soilwater/Diversity group, for each measurement of optimal or reduced:

b)

df %>%
  pivot_wider(
    id_cols = c(Diversity, Replicate),
    names_from = Soilwater,
    values_from = Soil_N
  ) %>%
  group_by(Diversity) %>%
  mutate(across(c(optimal, reduced), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    ES = (optimal - reduced) / (optimal + reduced)
  ) %>%
  summarise(mean_ES = mean(ES, na.rm = TRUE))

#> # A tibble: 2 x 2
#>   Diversity mean_ES
#>   <chr>       <dbl>
#> 1 high      -0.175 
#> 2 low       -0.0609
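To see what the imputation in option b) does to the one missing measurement, the replacement can be tried on that group's values in isolation. A small sketch, assuming the low-diversity optimal values from the data above; optimal_low and imputed are illustrative names.

```r
library(tidyr)

# Low-diversity, optimal-water measurements with replicate 4 missing
optimal_low <- c(79, 78, 79, NA, 77)

# replace_na() swaps each NA for the supplied value,
# here the mean of the observed values in the group
imputed <- replace_na(optimal_low, mean(optimal_low, na.rm = TRUE))
imputed
#> [1] 79.00 78.00 79.00 78.25 77.00
```

With the imputed value, replicate 4 contributes (78.25 - 88) / (78.25 + 88), roughly -0.059, which is why the low-diversity mean differs slightly from the one in option a).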

Created on 2022-03-17 by the reprex package (v2.0.1)