How to calculate means for nested groups and count the number of observations in R

Question

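Hi all (R users), thanks in advance. I have a dataset that contains students' scores from multiple states. Each state has several schools (10 schools in this example), each school is either 'public' or 'private', and there are test scores for three items. I need to calculate the mean of each item for each school, show the school's type, and then save the results to Excel files for export.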

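The expected result in the Excel file would include:

a state name column,
a school name column (10 schools per state),
a school type column (indicating 'public' or 'private'),
the number of students in each school,
the mean of item1,
the mean of item2, and
the mean of item3.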

library(randomNames)

# Example data to demonstrate the general concept:
ID = 1:50
states = rep(c("TS", "NE", "AR", "MO", "WA"),times = c(10, 10, 10, 10, 10))
schools = randomNames::randomNames(50) ## 50 random names in "Last, First" form, used here as school names
type = rep(c("private", "public"),times = c(20,30))
item1 = rnorm(50, mean=25, sd=5)
item2 = rnorm(50, mean=30, sd=5)
item3 = rnorm(50, mean=15, sd=5)
df = data.frame(ID, states, schools, type, item1, item2, item3)

Then I need to save it to Excel files, exporting each state separately, using the following code:

# The code below works fine; I'm only including it to explain the full concept.

list_data <- split(df, df$states)
Map(openxlsx::write.xlsx, list_data, paste0(names(list_data), '.xlsx'))

Thanks very much.

Answer 1

You can do this using the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

df %>% 
  dplyr::group_by(states, schools, type) %>% 
  dplyr::summarize(across(tidyr::starts_with("item"), ~ mean(.)),
                   students = n()) %>%
  dplyr::ungroup()

   states schools             type   item1 item2 item3 students
   <chr>  <chr>               <chr>  <dbl> <dbl> <dbl>    <int>
 1 AR     al-Hosein, Zubaida  public  23.4  35.1 15.4         1
 2 AR     al-Mohamed, Raadiya public  24.5  30.8 13.5         1
 3 AR     Bluford, Sage       public  29.9  32.4  9.49        1
 4 AR     Covarrubias, Julio  public  19.8  27.8 15.2         1
 5 AR     el-Gad, Naaila      public  27.0  33.5 19.5         1
 6 AR     el-Mansour, Fawzia  public  34.4  25.4 17.9         1
 7 AR     el-Sadri,  Sakeena  public  24.7  30.5 13.9         1
 8 AR     Ewers, Benjamin     public  18.3  33.6 13.5         1
 9 AR     Rivas, Joel         public  16.8  25.1 20.5         1
10 AR     Wilson, Reneisha    public  28.9  28.5 18.5         1
# ... with 40 more rows

If you have other column names that start with item, you can change the line across(tidyr::starts_with(.... to item1 = mean(item1), and so on.
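As a minimal sketch of that explicit-column variant, the snippet below averages only item1, item2 and item3 by name. The toy data frame and its extra item_notes column are hypothetical, invented only to show a column that starts with "item" but should not be averaged; .groups = "drop" is used here in place of a separate ungroup() call.

```r
library(dplyr)

# Hypothetical toy data: three schools, plus an extra column "item_notes"
# that starts with "item" but must not be averaged.
toy <- data.frame(
  states     = c("AR", "AR", "AR", "MO"),
  schools    = c("School A", "School A", "School B", "School C"),
  type       = c("public", "public", "private", "public"),
  item1      = c(20, 30, 25, 10),
  item2      = c(30, 40, 35, 20),
  item3      = c(10, 20, 15, 5),
  item_notes = c("a", "b", "c", "d")
)

# Name each score column explicitly instead of selecting by prefix.
by_school <- toy %>%
  group_by(states, schools, type) %>%
  summarize(
    item1    = mean(item1),
    item2    = mean(item2),
    item3    = mean(item3),
    students = n(),          # one row per student is assumed
    .groups  = "drop"        # return an ungrouped result
  )

print(by_school)
```

Naming the columns one by one is more verbose than across(), but it makes clear which columns are summarized and leaves any other "item..." columns out of the result.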

The student count assumes that each row within a school and state is one student, and that type does not vary within a given school.
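One way to check the second assumption before summarizing is to count the distinct type values per school. This is only a sketch on hypothetical toy data (the school names and the deliberately inconsistent "School A" are made up for illustration); on the real data you would pass df instead of toy.

```r
library(dplyr)

# Hypothetical toy data: "School A" is deliberately listed with two types.
toy <- data.frame(
  states  = c("AR", "AR", "AR", "MO"),
  schools = c("School A", "School A", "School B", "School C"),
  type    = c("public", "private", "private", "public")
)

# Schools recorded with more than one type would be split into several
# rows if type is used as a grouping column, so list them first.
inconsistent <- toy %>%
  group_by(states, schools) %>%
  summarize(n_types = n_distinct(type), .groups = "drop") %>%
  filter(n_types > 1)

print(inconsistent)
```

If inconsistent has zero rows, grouping by states, schools and type yields exactly one row per school, and students = n() is the per-school row count.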
