如何按组减少此数据框?
How do I reduce this data frame by groups?
我有以下
t <- structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice", "Bob",
"Jane Doe", "John Doe"), class = "factor"), school = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice School",
"Bob School", "Someother School", "Someschool College"), class = "factor"),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
question = structure(c(2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L,
4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L,
2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L), .Label = c("q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8"), class = "factor"), mark = c(0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L,
1L), subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("C", "M"), class = "factor")), .Names = c("name",
"school", "group", "question", "mark", "subject"), row.names = c(7L,
15L, 23L, 31L, 3L, 11L, 19L, 27L, 8L, 16L, 24L, 32L, 4L, 12L,
20L, 28L, 6L, 14L, 22L, 30L, 2L, 10L, 18L, 26L, 5L, 13L, 21L,
29L, 1L, 9L, 17L, 25L), class = "data.frame")
我需要生成一个数据框,其中每个学生的每个科目都有一个组合分数。组合只是每个问题的分数总和。因此,例如,Jane Doe 在主题 C 上有 3 个,在主题 M 上有 2 个。我已经用 Reduce
和其他方法思考了足够长的时间。我可能会以一种非常程序化的方式解决这个问题,但如果我能用单线(或近似)来解决这个问题,我会更开心。我相信它可以完成...
你在问题中说了;你想 group_by
学生和科目并计算总和
library(tidyverse)
asdf %>%
group_by(name, subject) %>%
summarise(score = sum(mark))
这里有一个data.table
解决方案:
library(data.table)
setDT(t)[, sum(mark), by = list(name, subject)]
为了完整起见,基数 R:
aggregate(mark ~ name + subject, data=t, sum)
这表示 "aggregate the response variable mark
by the grouping variables name
and subject
, using sum
as the aggregation function"。
我有以下
t <- structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice", "Bob",
"Jane Doe", "John Doe"), class = "factor"), school = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Alice School",
"Bob School", "Someother School", "Someschool College"), class = "factor"),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
question = structure(c(2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L,
4L, 6L, 8L, 1L, 3L, 5L, 7L, 2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L,
2L, 4L, 6L, 8L, 1L, 3L, 5L, 7L), .Label = c("q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8"), class = "factor"), mark = c(0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L,
1L), subject = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("C", "M"), class = "factor")), .Names = c("name",
"school", "group", "question", "mark", "subject"), row.names = c(7L,
15L, 23L, 31L, 3L, 11L, 19L, 27L, 8L, 16L, 24L, 32L, 4L, 12L,
20L, 28L, 6L, 14L, 22L, 30L, 2L, 10L, 18L, 26L, 5L, 13L, 21L,
29L, 1L, 9L, 17L, 25L), class = "data.frame")
我需要生成一个数据框,其中每个学生的每个科目都有一个组合分数。组合只是每个问题的分数总和。因此,例如,Jane Doe 在主题 C 上有 3 个,在主题 M 上有 2 个。我已经用 Reduce
和其他方法思考了足够长的时间。我可能会以一种非常程序化的方式解决这个问题,但如果我能用单线(或近似)来解决这个问题,我会更开心。我相信它可以完成...
你在问题中说了;你想 group_by
学生和科目并计算总和
library(tidyverse)
asdf %>%
group_by(name, subject) %>%
summarise(score = sum(mark))
这里有一个data.table
解决方案:
library(data.table)
setDT(t)[, sum(mark), by = list(name, subject)]
为了完整起见,基数 R:
aggregate(mark ~ name + subject, data=t, sum)
这表示 "aggregate the response variable mark
by the grouping variables name
and subject
, using sum
as the aggregation function"。