How to create a new dataframe extracting means based on subgroups given by a specific column in R
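I haven't been able to find a good way to solve this for a while. What I want to do is average some rows of a dataframe according to their ID and put the results into a separate dataframe. Suppose I have a dataframe like this:
Data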
structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli",
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1",
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L,
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L,
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L,
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-7L))
ID Name Location x1 x2 x3 x4 x5
A1 S.coli Indv1 1 2 3 4 5
A1 S.coli Indv1 1 2 3 4 5
A1 S.coli Indv1 1 2 3 4 5
A1 S.coli Indv1 1 2 3 4 5
A2 S.coli Indv2 1 2 3 4 5
A2 S.coli Indv2 1 2 3 4 5
A2 S.coli Indv2 1 2 3 4 5
Now I want a second dataframe that holds, for each ID code, the mean of every x variable, while also keeping Name and Location.
Dataframe of means:
ID Name Location x1 x2 x3 x4 x5
A1 S.coli Indv1 1 2 3 4 5
A2 S.coli Indv2 1 2 3 4 5
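I have many ID codes, so subsetting each one and then joining the tables is almost like doing it by hand. Is there a more efficient way to do this?
Thanks in advance!!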
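We can use
library(dplyr)
# pick() needs dplyr >= 1.1.0; on dplyr 1.0.x use group_by(across(ID:Location))
df %>%
  group_by(pick(ID:Location)) %>%
  # everything() skips the grouping columns, so only x1:x5 are averaged;
  # .groups = "drop" returns an ungrouped tibble
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
# A tibble: 2 x 8
# ID Name Location x1 x2 x3 x4 x5
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A1 S.coli Indv1 1 2 3 4 5
#2 A2 S.coli Indv2 1 2 3 4 5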
Data
df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"
), Name = c("S.coli", "S.coli", "S.coli", "S.coli", "S.coli",
"S.coli", "S.coli"), Location = c("Indv1", "Indv1", "Indv1",
"Indv1", "Indv2", "Indv2", "Indv2"), x1 = c(1L, 1L, 1L, 1L, 1L,
1L, 1L), x2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), x3 = c(3L, 3L, 3L,
3L, 3L, 3L, 3L), x4 = c(4L, 4L, 4L, 4L, 4L, 4L, 4L), x5 = c(5L,
5L, 5L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-7L))
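For comparison, base R's aggregate() can build the same table without any package. This is a sketch added here, not part of the answer above; it assumes the same df (redefined so the block runs on its own). In the formula, the `.` on the left-hand side stands for every column not named on the right, i.e. x1 to x5. One caveat: the formula interface defaults to na.action = na.omit, so a row with an NA in any x column is dropped entirely instead of being ignored column by column.

```r
# Base R sketch (assumption: same data as the question; no packages needed)
df <- structure(list(ID = c("A1", "A1", "A1", "A1", "A2", "A2", "A2"),
  Name = rep("S.coli", 7), Location = c(rep("Indv1", 4), rep("Indv2", 3)),
  x1 = rep(1L, 7), x2 = rep(2L, 7), x3 = rep(3L, 7), x4 = rep(4L, 7),
  x5 = rep(5L, 7)), class = "data.frame", row.names = c(NA, -7L))

# `.` on the left means "all columns not used on the right-hand side",
# so this averages x1..x5 within each ID/Name/Location combination
means <- aggregate(. ~ ID + Name + Location, data = df, FUN = mean)
means
```

The result is a plain data.frame with one row per existing ID/Name/Location combination, so it can be used directly as the "dataframe of means".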
Same logic as @Akrun's answer, for older dplyr versions:
library(dplyr)
df %>%
  group_by(ID, Name, Location) %>%
  summarise_at(vars(x1:x5), mean, na.rm = TRUE)
# Groups: ID, Name [2]
# ID Name Location x1 x2 x3 x4 x5
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A1 S.coli Indv1 1 2 3 4 5
# 2 A2 S.coli Indv2 1 2 3 4 5
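Because every x column in the sample data is constant, the outputs above cannot show that a mean is really being taken, or what na.rm = TRUE changes. The toy data below is hypothetical (not from the question): x1 varies and x2 has a missing value. It uses the data-frame method of aggregate(), which passes na.rm = TRUE through to mean() for each column separately, mirroring what the dplyr versions do.

```r
# Hypothetical toy data (not from the question): x1 varies, x2 has an NA
toy <- data.frame(
  ID       = c("A1", "A1", "A2", "A2"),
  Name     = "S.coli",
  Location = c("Indv1", "Indv1", "Indv2", "Indv2"),
  x1       = c(1, 3, 10, 20),
  x2       = c(2, NA, 4, 6)
)

# Data-frame interface: columns to average, grouping columns, then extra
# arguments (na.rm = TRUE) forwarded to FUN
res <- aggregate(toy[c("x1", "x2")],
                 by  = toy[c("ID", "Name", "Location")],
                 FUN = mean, na.rm = TRUE)
res
# A1: x1 = 2,  x2 = 2 (the NA is skipped, not propagated)
# A2: x1 = 15, x2 = 5
```

With the formula interface instead, the default na.omit would drop the second A1 row completely, so A1's x1 mean would become 1 rather than 2.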