How to sum the values of multiple variables by the same group in R
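I need to sum the values of about 40 variables by the same groups.
Here is an example dataset. I want to sum the values of score1-score5 by region and department.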
region <- rep(c("south", "east", "west", "north"), times = 10)
department <- rep(c("A", "B", "C", "D", "E"), times = 8)
score1 <- rnorm(n = 40, mean = 0, sd = 1)
score2 <- rnorm(n = 40, mean = 3, sd = 1.5)
score3 <- rnorm(n = 40, mean = 2, sd = 1)
score4 <- rnorm(n = 40, mean = 1, sd = 1.5)
score5 <- rnorm(n = 40, mean = 5, sd = 1.5)
df <- data.frame(region, department, score1, score2, score3, score4, score5)
This is the code that gives me the result I want, but is there an easier way to do it:
df %>%
  group_by(region, department) %>%
  summarise(score1 = sum(score1),
            score2 = sum(score2),
            score3 = sum(score3),
            score4 = sum(score4),
            score5 = sum(score5))
I tried using a loop, but it didn't work:
vlist <- c("score1", "score2", "score3", "score4", "score5")
for (var in vlist) {
  df <- df %>% group_by(region, department) %>%
    summarise(var = sum(.[[var]]))
}
Is there another way, or what is wrong with my loop?
Thanks!
Use across to loop over the columns that starts_with 'score' and get the sum of each:
library(dplyr)
out1 <- df %>%
  group_by(region, department) %>%
  summarise(across(starts_with('score'), sum), .groups = 'drop')
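The across() call is not tied to the five example columns: its first argument is a tidyselect specification, so the same line covers the roughly 40 score columns in the real data as long as they share the 'score' prefix. Below is a minimal sketch that checks this against the hand-written summarise() from the question. It assumes the question's data, regenerated with set.seed(1) (an addition, only so the sketch is reproducible), and also shows two other selection styles, a column range and where(is.numeric).

```r
library(dplyr)

# Rebuild the example data from the question; set.seed() is added only so
# the sketch is reproducible
set.seed(1)
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40, mean = 0, sd = 1),
  score2 = rnorm(40, mean = 3, sd = 1.5),
  score3 = rnorm(40, mean = 2, sd = 1),
  score4 = rnorm(40, mean = 1, sd = 1.5),
  score5 = rnorm(40, mean = 5, sd = 1.5)
)

# One across() call sums every column whose name starts with "score"
by_across <- df %>%
  group_by(region, department) %>%
  summarise(across(starts_with("score"), sum), .groups = "drop")

# The hand-written version from the question, for comparison
by_hand <- df %>%
  group_by(region, department) %>%
  summarise(score1 = sum(score1), score2 = sum(score2), score3 = sum(score3),
            score4 = sum(score4), score5 = sum(score5), .groups = "drop")

# Other tidyselect specifications pick the same columns here:
# a positional range, or every numeric non-grouping column
by_range <- df %>%
  group_by(region, department) %>%
  summarise(across(score1:score5, sum), .groups = "drop")
by_numeric <- df %>%
  group_by(region, department) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")
```

starts_with() relies on the shared prefix. If the real column names differ, a range such as score1:score40 selects by position, and where(is.numeric) selects by type, but it will also pick up any other numeric column that is not a grouping variable.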
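In the for loop, the problem is that df is overwritten in every iteration (df <- ...), and summarise returns only the grouping columns plus the summarised output. So after the first iteration, 'df' no longer has any 'score' columns at all. In addition, summarise(var = ...) creates a column literally named var, not one named after the current score column. If we want to use a for loop, collect the outputs in a list and then join them with reduce: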
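library(purrr)
# One summarised tibble per score column, stored under that column's name
out_list <- vector('list', length(vlist))
names(out_list) <- vlist
for (var in vlist) {
  out_list[[var]] <- df %>%
    group_by(region, department) %>%
    # !!var := names the new column after the string held in var;
    # .data[[var]] looks that column up within each group
    summarise(!!var := sum(.data[[var]]), .groups = 'drop')
}
# Join the per-column results back together on the grouping columns
out2 <- reduce(out_list, full_join, by = c('region', 'department'))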
Check the output:
> identical(out1, out2)
[1] TRUE
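Running only the first iteration of the original loop shows the failure directly. The sketch below uses a cut-down copy of the question's data (two score columns, fixed seed; both are simplifications for brevity) and a loop variable renamed to v. After one pass the result has a column named var and no score columns left. Also, because `.` inside the piped summarise() is the whole data frame passed in, not the current group, .[[v]] sums the entire column, so every group gets the same grand total.

```r
library(dplyr)

# Cut-down copy of the question's data (two score columns, fixed seed)
set.seed(1)
df <- data.frame(
  region     = rep(c("south", "east", "west", "north"), times = 10),
  department = rep(c("A", "B", "C", "D", "E"), times = 8),
  score1 = rnorm(40),
  score2 = rnorm(40)
)

# Body of the original loop for its first element; v plays the role of var.
# .groups = "drop" is added only to silence the regrouping message
v <- "score1"
step1 <- df %>%
  group_by(region, department) %>%
  summarise(var = sum(.[[v]]), .groups = "drop")

# Only region, department and a column literally called "var" remain,
# so a second iteration has no score2 column left to sum
print(names(step1))
print("score2" %in% names(step1))

# `.` is the whole piped-in data frame, so each group holds the grand total
print(all(step1$var == sum(df$score1)))
```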