Transforming multivariate data in R to an aggregated table using dplyr and tidyr
I am using dplyr and tidyr to aggregate and summarize some multivariate data. How can I present the data in a table-like form, as shown below?
Data set:
year, division, group, count
2016, utensils, forks, 10
2016, utensils, spoons, 5
2016, utensils, knives, 20
2015, utensils, spoons, 4
2015, utensils, knives, 15
2015, utensils, forks, 11
2016, tools, hammer, 10
2016, tools, wrench, 5
2016, tools, awe, 20
2015, tools, hammer, 4
2015, tools, wrench, 15
2015, tools, awe, 11
I would like to present the information like this:
          2016      2015
          Utensils  Utensils
Forks     count     count
Spoons    count     count
Knives    count     count

          2016      2015
          Tools     Tools
Hammer    count     count
Wrench    count     count
Awe       count     count
This is essentially a reshaping problem, but you first need to split your data frame by the division column and then use dcast to reshape each subset:
library(reshape2)
lapply(split(df, df$division), function(s) dcast(group ~ year + division, data = s, value.var = "count"))
#$tools
# group 2015_tools 2016_tools
#1 awe 11 20
#2 hammer 4 10
#3 wrench 15 5
#$utensils
# group 2015_utensils 2016_utensils
#1 forks 11 10
#2 knives 15 20
#3 spoons 4 5
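Alternatively, since each sub data frame contains only a single division, you can drop it from the column names by leaving it out of the dcast formula, because it adds no extra information: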
lapply(split(df, df$division), function(s) dcast(group ~ year, data = s, value.var = "count"))
#$tools
# group 2015 2016
#1 awe 11 20
#2 hammer 4 10
#3 wrench 15 5
#$utensils
# group 2015 2016
#1 forks 11 10
#2 knives 15 20
#3 spoons 4 5
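Since the question asks about dplyr and tidyr, here is a minimal sketch of the same split-and-reshape idea using tidyr's pivot_wider instead of dcast. It is an alternative under stated assumptions, not the method from the answer above: the data frame df is rebuilt here from the question's data with read.csv, and the names wide and by_division are chosen only for this illustration.

```r
library(dplyr)
library(tidyr)

# Assumption: the question's data, read into a data frame named df
df <- read.csv(text = "year,division,group,count
2016,utensils,forks,10
2016,utensils,spoons,5
2016,utensils,knives,20
2015,utensils,spoons,4
2015,utensils,knives,15
2015,utensils,forks,11
2016,tools,hammer,10
2016,tools,wrench,5
2016,tools,awe,20
2015,tools,hammer,4
2015,tools,wrench,15
2015,tools,awe,11", stringsAsFactors = FALSE)

# One row per division/group pair, one column per year
wide <- df %>%
  pivot_wider(names_from = year, values_from = count)

# One table per division; the division column is dropped because it is
# constant within each piece
by_division <- split(select(wide, -division), wide$division)

by_division$utensils
by_division$tools
```

By default pivot_wider orders the new columns by first appearance in the data, so with this input the 2016 column should come before 2015, matching the layout requested in the question. Each list element keeps group as its first column.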