Refer to a list of variables within group calculation

Question

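I would like to calculate the maximum value of each variable (20 in total) within a group. Is there an easier way to do this than listing every column by hand in a dplyr summarise call combined with group_by? Sample data is below: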

Name    Year    test1   test2   test3   test4   test5   test6   test7   test8   test9   test10  test11  test12  test13  test14  test15  test16  test17  test18  test19  test20
John    2008    1   0   0   0   0   1   0   0   0   0   0   1   0   0   1   0   0   1   0   0
John    2008    1   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0
John    2009    0   1   1   0   0   0   1   0   1   0   0   1   0   0   0   1   0   0   0   0
John    2010    0   0   0   1   0   1   1   0   0   0   1   0   1   0   0   0   1   0   0   1
John    2010    0   0   0   0   0   0   0   1   1   0   0   0   0   1   0   0   1   0   1   1
John    2010    0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0
John    2011    0   0   0   1   1   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0
John    2011    0   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
John    2012    0   0   0   1   0   0   1   0   1   0   0   1   0   0   0   0   0   0   0   0
John    2012    0   0   0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0
John    2012    0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   1   1   0   0   1
John    2013    0   0   1   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   0
Mary    2009    0   0   1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Mary    2010    0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
Mary    2010    0   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
Mary    2011    0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   1   0   0   1
Mary    2011    0   0   0   0   0   0   0   1   0   0   1   0   0   0   0   1   0   0   0   0
Mary    2011    0   0   1   1   0   0   1   0   0   0   0   0   1   0   1   0   0   0   0   0
Mary    2011    0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
Mary    2012    0   0   0   0   0   1   0   1   0   0   1   0   1   0   0   0   0   0   0   0
Mary    2012    0   0   0   0   1   0   0   0   0   1   0   1   0   0   0   0   0   0   0   0
Mary    2013    0   0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
Mary    2013    0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
Jack    2010    0   0   0   0   1   0   0   0   0   0   1   0   0   1   1   0   0   0   0   0
Jack    2010    0   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0
Jack    2011    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
Jack    2011    0   0   0   0   1   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0
Jack    2011    0   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0   0
Jack    2011    0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0   0   0   0   0
Jack    2012    0   0   1   1   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0
Jack    2012    0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
Jack    2013    1   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0   0   0
Jack    2013    0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
Jack    2014    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Jack    2015    0   0   0   1   0   1   1   0   0   0   1   0   1   0   0   0   1   0   0   1
Jack    2015    0   0   0   0   0   0   0   1   1   0   0   0   0   1   0   0   1   0   1   1
Jack    2015    0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0

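test1 to test20 represent different types of exams; 1 means the person took the exam and 0 means he/she didn't. A person can take as many exams as they like. I want a person-year level aggregation that shows whether the person took each exam in that year. As mentioned above, is there a simple way to calculate the max at the person-year level for all 20 tests? I was thinking of using ddply, but I am still wondering whether there is a better way.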

Thanks in advance!
Annie

Answer 1

Adding tidyr helps:

# select and copy the data above, then read it from the clipboard
dat <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

dat %>%
  gather(test, tookit, -Name, -Year) %>%
  group_by(Name, Year, test) %>%
  summarize(times = sum(tookit)) %>%
  ungroup()
# # A tibble: 340 × 4
#     Name  Year   test times
#    <chr> <int>  <chr> <int>
# 1   Jack  2010  test1     0
# 2   Jack  2010 test10     1
# 3   Jack  2010 test11     1
# 4   Jack  2010 test12     0
# 5   Jack  2010 test13     0
# 6   Jack  2010 test14     1
# 7   Jack  2010 test15     2
# 8   Jack  2010 test16     0
# 9   Jack  2010 test17     0
# 10  Jack  2010 test18     0
# # ... with 330 more rows

This tells you how many times they took each exam in each year.
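Since the question asks for a 0/1 indicator rather than a count, the same long-format pipeline can use max instead of sum. The following is only a minimal sketch, assuming tidyr 1.0 or later (where pivot_longer() supersedes gather()) and dplyr 1.0 or later (for the .groups argument). The small data frame toy and the column name took_any are made up for illustration and are not part of the original post.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as the question's table
toy <- data.frame(
  Name  = c("John", "John", "Mary"),
  Year  = c(2008L, 2008L, 2009L),
  test1 = c(1L, 1L, 0L),
  test2 = c(0L, 0L, 0L),
  test3 = c(0L, 1L, 1L)
)

long_max <- toy %>%
  # reshape every test column into (test, tookit) pairs
  pivot_longer(starts_with("test"), names_to = "test", values_to = "tookit") %>%
  group_by(Name, Year, test) %>%
  # max() of a 0/1 column is 1 if the exam was taken at least once that year
  summarise(took_any = max(tookit), .groups = "drop")

print(long_max)
```

With sum, John's two 2008 rows would give a count of 2 for test1; with max the value stays at 1, which is the "took it at all" flag the question describes.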

Another approach (without tidyr):

dat %>%
  group_by(Name, Year) %>%
  summarize_at(vars(starts_with("test")), sum) %>%
  ungroup()
# A tibble: 17 × 22
#     Name  Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10
#    <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int>
# 1   Jack  2010     0     0     0     0     1     0     0     0     0      1
# 2   Jack  2011     0     1     0     0     1     1     0     0     1      1
# 3   Jack  2012     0     0     1     1     0     0     0     0     1      1
# 4   Jack  2013     1     0     0     0     0     1     0     0     0      0
# 5   Jack  2014     0     0     0     0     0     0     0     0     0      0
# 6   Jack  2015     0     0     0     1     0     1     1     1     1      0
# 7   John  2008     2     0     1     0     0     1     0     0     0      1
# 8   John  2009     0     1     1     0     0     0     1     0     1      0
# 9   John  2010     0     0     0     1     0     1     1     1     1      0
# 10  John  2011     0     0     0     1     2     0     1     1     0      1
# 11  John  2012     0     0     1     1     0     0     2     1     1      0
# 12  John  2013     0     0     1     0     0     0     0     0     0      0
# 13  Mary  2009     0     0     1     0     1     0     0     0     0      0
# 14  Mary  2010     0     0     0     0     1     0     1     0     0      1
# 15  Mary  2011     0     1     1     1     0     0     1     1     1      1
# 16  Mary  2012     0     0     0     0     1     1     0     1     0      1
# 17  Mary  2013     0     0     0     1     0     0     1     1     0      0
# # ... with 10 more variables: test11 <int>, test12 <int>, test13 <int>,
# #   test14 <int>, test15 <int>, test16 <int>, test17 <int>, test18 <int>,
# #   test19 <int>, test20 <int>
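The wide result above holds counts. To get the person-year maximum the question asks for, the aggregating function can be swapped for max. The sketch below assumes dplyr 1.0 or later, where across() is the recommended replacement for summarize_at(); the data frame toy is a made-up miniature of the question's table, and the base R aggregate() call is shown only as a possible alternative for comparison.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the question's table
toy <- data.frame(
  Name  = c("John", "John", "Mary"),
  Year  = c(2008L, 2008L, 2009L),
  test1 = c(1L, 1L, 0L),
  test2 = c(0L, 0L, 0L),
  test3 = c(0L, 1L, 1L)
)

# One row per person-year; each test column becomes 1 if taken at least once
wide_max <- toy %>%
  group_by(Name, Year) %>%
  summarise(across(starts_with("test"), max), .groups = "drop")

print(wide_max)

# Base R alternative: "." on the left-hand side means all non-grouping columns
base_max <- aggregate(. ~ Name + Year, data = toy, FUN = max)
print(base_max)
```

Because starts_with("test") selects the columns by name, none of the 20 test columns has to be listed individually.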

variables

group-by

r

plyr

dplyr