Refer to a list of variables within group calculation
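I'd like to calculate the max of each variable (20 in total) within a group, and I wonder whether there is an easier way to do this than listing every variable explicitly in dplyr with group_by. Sample data: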
Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12 test13 test14 test15 test16 test17 test18 test19 test20
John 2008 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0
John 2008 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
John 2009 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0
John 2010 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
John 2010 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
John 2010 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
John 2011 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
John 2011 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
John 2012 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0
John 2012 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0
John 2012 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
John 2013 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
Mary 2009 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2011 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
Mary 2011 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
Mary 2011 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
Mary 2011 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2010 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
Jack 2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
Jack 2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Jack 2011 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
Jack 2011 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Jack 2011 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
Jack 2012 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0
Jack 2012 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Jack 2013 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Jack 2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Jack 2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2015 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
Jack 2015 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
Jack 2015 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
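test1 to test20 represent different types of tests: 1 means the person took that test and 0 means he/she didn't. A person can take as many tests as they like. I want an aggregation at the person-year level that shows whether the person took each test in that year. Is there an easy way to calculate the max at the person-year level for all 20 tests, as described above? I was thinking of using ddply, but I'm still wondering whether there is a better way.
Thanks in advance!
Annie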
Adding tidyr helps:
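# after highlighting and copying your data above
# ("clipboard" works on Windows; on macOS use read.table(pipe("pbpaste"), ...) instead)
dat <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)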
library(dplyr)
library(tidyr)
dat %>%
gather(test, tookit, -Name, -Year) %>%
group_by(Name, Year, test) %>%
summarize(times = sum(tookit)) %>%
ungroup()
# # A tibble: 340 × 4
# Name Year test times
# <chr> <int> <chr> <int>
# 1 Jack 2010 test1 0
# 2 Jack 2010 test10 1
# 3 Jack 2010 test11 1
# 4 Jack 2010 test12 0
# 5 Jack 2010 test13 0
# 6 Jack 2010 test14 1
# 7 Jack 2010 test15 2
# 8 Jack 2010 test16 0
# 9 Jack 2010 test17 0
# 10 Jack 2010 test18 0
# # ... with 330 more rows
This tells you how many times each person took each test in each year.
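The question asks whether each test was taken (a max), not how often. Below is a minimal sketch of the same long-format pipeline that keeps both values. It assumes tidyr >= 1.0 so that pivot_longer(), the newer replacement for gather(), is available, and it uses only the two Jack 2010 rows from the sample data:

```r
library(dplyr)
library(tidyr)

# The two Jack 2010 rows from the sample data above
dat <- read.table(text = "
Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12 test13 test14 test15 test16 test17 test18 test19 test20
Jack 2010 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
Jack 2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
", header = TRUE, stringsAsFactors = FALSE)

res <- dat %>%
  # reshape to one row per person, year and test
  pivot_longer(starts_with("test"), names_to = "test", values_to = "tookit") %>%
  group_by(Name, Year, test) %>%
  summarize(
    times = sum(tookit),  # how many times the test was taken that year
    took  = max(tookit),  # 1 if taken at least once that year, else 0
    .groups = "drop"
  )

res[res$test == "test15", ]
```

For test15, Jack appears twice in 2010, so times is 2 while took is 1.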
Another approach (without tidyr):
dat %>%
group_by(Name, Year) %>%
summarize_at(vars(starts_with("test")), sum) %>%
ungroup()
# A tibble: 17 × 22
# Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 Jack 2010 0 0 0 0 1 0 0 0 0 1
# 2 Jack 2011 0 1 0 0 1 1 0 0 1 1
# 3 Jack 2012 0 0 1 1 0 0 0 0 1 1
# 4 Jack 2013 1 0 0 0 0 1 0 0 0 0
# 5 Jack 2014 0 0 0 0 0 0 0 0 0 0
# 6 Jack 2015 0 0 0 1 0 1 1 1 1 0
# 7 John 2008 2 0 1 0 0 1 0 0 0 1
# 8 John 2009 0 1 1 0 0 0 1 0 1 0
# 9 John 2010 0 0 0 1 0 1 1 1 1 0
# 10 John 2011 0 0 0 1 2 0 1 1 0 1
# 11 John 2012 0 0 1 1 0 0 2 1 1 0
# 12 John 2013 0 0 1 0 0 0 0 0 0 0
# 13 Mary 2009 0 0 1 0 1 0 0 0 0 0
# 14 Mary 2010 0 0 0 0 1 0 1 0 0 1
# 15 Mary 2011 0 1 1 1 0 0 1 1 1 1
# 16 Mary 2012 0 0 0 0 1 1 0 1 0 1
# 17 Mary 2013 0 0 0 1 0 0 1 1 0 0
# # ... with 10 more variables: test11 <int>, test12 <int>, test13 <int>,
# # test14 <int>, test15 <int>, test16 <int>, test17 <int>, test18 <int>,
# # test19 <int>, test20 <int>
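From dplyr 1.0 onward, summarize_at() is superseded by across(). Replacing sum with max then gives the 0/1 person-year indicator the question asks for directly. Below is a minimal sketch, assuming dplyr >= 1.0 and using a few rows copied from the sample data:

```r
library(dplyr)

# A few rows copied from the sample data above (all 20 test columns kept)
dat <- read.table(text = "
Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12 test13 test14 test15 test16 test17 test18 test19 test20
John 2008 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0
John 2008 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
John 2009 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0
Mary 2009 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
", header = TRUE, stringsAsFactors = FALSE)

# One row per person-year; each test column becomes 1 if the person
# took that test at least once in that year, otherwise 0.
took_any <- dat %>%
  group_by(Name, Year) %>%
  summarize(across(starts_with("test"), max), .groups = "drop")

took_any
```

Because every test column is 0/1, max() returns 1 whenever the test was taken at least once that year. John took test1 twice in 2008, but his value is still 1. Base R can do the same without dplyr: aggregate(. ~ Name + Year, data = dat, FUN = max).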