How to create a new data frame of means based on subgroups given by a specific column in R

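I have been looking for a good way to solve this for a long time without success. What I want to do is average certain rows of a data frame by ID and then create a separate data frame from the result. Suppose I have a data frame like the following: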

Data

structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A1   S.coli   Indv1     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5
A2   S.coli   Indv2     1  2  3  4   5

Now I want a second data frame containing the mean of each x variable for each ID code, while keeping Name and Location. The data frame of means:

ID   Name     Location x1 x2 x3 x4 x5
A1   S.coli   Indv1    1  2  3  4   5
A2   S.coli   Indv2    1  2  3  4   5

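I have many ID codes, so subsetting and then joining the tables would be almost like doing it by hand. I would like to know whether there is a more efficient way to do this. Thanks in advance!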

We can group by the identifying columns and summarise the remaining ones with across():

library(dplyr)
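df %>%
   # group by every column from ID through Location
   group_by(across(ID:Location)) %>%
   # average all remaining columns; the lambda form avoids passing
   # extra arguments through across(), which dplyr >= 1.1.0 deprecates
   summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))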
# A tibble: 2 x 8
# Groups:   ID, Name [2]
#  ID    Name   Location    x1    x2    x3    x4    x5
#  <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A1    S.coli Indv1        1     2     3     4     5
#2 A2    S.coli Indv2        1     2     3     4     5
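
The output above is still a tibble grouped by ID and Name. If a plain, ungrouped data frame is wanted, a minimal sketch could look like the following. It assumes dplyr 1.0.0 or later; the object name means_df and the use of where(is.numeric) to pick the columns to average are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same example data as in the question, written compactly
df <- data.frame(
  ID       = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  Name     = "S.coli",
  Location = c("Indv1", "Indv1", "Indv1", "Indv1", "Indv2", "Indv2", "Indv2"),
  x1 = 1L, x2 = 2L, x3 = 3L, x4 = 4L, x5 = 5L
)

# means_df is a hypothetical name for the result
means_df <- df %>%
  group_by(ID, Name, Location) %>%
  # average only the numeric columns, ignoring missing values
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%   # drop all grouping from the result
  as.data.frame()                   # convert the tibble to a base data frame

means_df
```

Using .groups = "drop" also silences the message summarise() prints about the grouping of its output.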

Data

df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli", 
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1", 
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))

The same logic as @Akrun's answer, written for older dplyr versions:

library(dplyr)
df %>% 
  group_by(ID, Name, Location) %>% 
  summarise_at(vars(x1:x5), mean, na.rm = TRUE)
# Groups:   ID, Name [2]
#   ID    Name   Location    x1    x2    x3    x4    x5
# <chr> <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A1    S.coli Indv1        1     2     3     4     5
# 2 A2    S.coli Indv2        1     2     3     4     5
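
For comparison, the same grouping can be sketched without dplyr using base R's aggregate(). This is a different technique from the answers above, and the object name means_base is hypothetical. One difference to keep in mind: with the formula interface, rows containing an NA in any of the listed columns are dropped before averaging (the default na.action), which is not the same as na.rm = TRUE applied column by column.

```r
# Same example data as in the question, written compactly
df <- data.frame(
  ID       = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  Name     = "S.coli",
  Location = c("Indv1", "Indv1", "Indv1", "Indv1", "Indv2", "Indv2", "Indv2"),
  x1 = 1L, x2 = 2L, x3 = 3L, x4 = 4L, x5 = 5L
)

# Left of ~ : columns to average; right of ~ : grouping columns
means_base <- aggregate(
  cbind(x1, x2, x3, x4, x5) ~ ID + Name + Location,
  data = df,
  FUN  = mean
)

means_base
```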