Get counts of categorical factors across multiple variables/columns in R
I have the following minimal example in R:
testing = data.frame(c("Once a week", "Once a week", "Rarely", "Once a month", "Once a month"), c("Once a month", "Once a month", "Once a week", "Rarely", "Rarely"))
colnames(testing) = c("one", "two")
testing
           one          two
1  Once a week Once a month
2  Once a week Once a month
3       Rarely  Once a week
4 Once a month       Rarely
5 Once a month       Rarely
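I would like the end result to be a data frame in which one column holds all possible categories and the remaining columns hold the counts for each column/variable, like this:
categories    one  two
Rarely        1    2
Once a month  2    2
Once a week   2    1
I have no restrictions on R libraries, so whatever is simplest works here (maybe plyr/dplyr?).
Thanks.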
You can use the tidyr and dplyr packages to tidy your table, then use the base table function to count the categories:
testing = data.frame(c("Once a week", "Once a week", "Rarely", "Once a month", "Once a month"), c("Once a month", "Once a month", "Once a week", "Rarely", "Rarely"))
colnames(testing) = c("one", "two")
testing
#>            one          two
#> 1  Once a week Once a month
#> 2  Once a week Once a month
#> 3       Rarely  Once a week
#> 4 Once a month       Rarely
#> 5 Once a month       Rarely
library(tidyr)
library(dplyr)
testing %>%
  gather("type", "categories") %>%
  table()
#>      categories
#> type  Once a month Once a week Rarely
#>   one            2           2      1
#>   two            2           1      2
# or reorder the columns before calling table()
testing %>%
  gather("type", "categories") %>%
  select(categories, type) %>%
  table()
#>               type
#> categories     one two
#>   Once a month   2   2
#>   Once a week    2   1
#>   Rarely         1   2
Here is another approach using tidyr::gather, tidyr::spread, and dplyr::count:
library(dplyr)
library(tidyr)
testing %>%
  gather(measure, value) %>%
  count(measure, value) %>%
  spread(measure, n)
# Source: local data frame [3 x 3]
#
#          value   one   two
#          (chr) (int) (int)
# 1 Once a month     2     2
# 2  Once a week     2     1
# 3       Rarely     1     2
Also, see the fantastic gist on this topic.
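For readers on a recent tidyr: gather and spread have been superseded by pivot_longer and pivot_wider since tidyr 1.0.0. The following is a minimal sketch of the same gather/count/spread idea with the newer verbs. The names measure, categories, and counts are illustrative choices, not part of the original answer, and values_fill = 0L is an added assumption so that a category missing from one column shows 0 rather than NA.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, with the column names set directly
testing <- data.frame(
  one = c("Once a week", "Once a week", "Rarely", "Once a month", "Once a month"),
  two = c("Once a month", "Once a month", "Once a week", "Rarely", "Rarely")
)

counts <- testing %>%
  # Stack all columns into (measure, categories) pairs
  pivot_longer(everything(), names_to = "measure", values_to = "categories") %>%
  # Count each category within each original column
  count(categories, measure) %>%
  # One column of counts per original column; absent combinations become 0
  pivot_wider(names_from = measure, values_from = n, values_fill = 0L)

counts
```

The result is a tibble with a categories column followed by one count column per original variable, which is the shape the question asks for.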
table works without any external packages:
sapply(testing, table)
#             one two
#Once a month   2   2
#Once a week    2   1
#Rarely         1   2
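One caveat worth noting: sapply only simplifies to a matrix like this when every column contains the same set of categories. If a category is missing from one column, the per-column tables have different lengths and sapply returns a list instead. A possible base R sketch that guards against this, and also returns a data frame with a categories column as the question requests, is shown below. The names all_levels, counts, and result are hypothetical, chosen for illustration.

```r
# Same example data as in the question
testing <- data.frame(
  one = c("Once a week", "Once a week", "Rarely", "Once a month", "Once a month"),
  two = c("Once a month", "Once a month", "Once a week", "Rarely", "Rarely")
)

# Collect every category that appears in any column
all_levels <- sort(unique(unlist(lapply(testing, as.character))))

# Tabulate each column against the shared levels, so every table has the
# same length and sapply can simplify the result to a matrix
counts <- sapply(testing, function(col) table(factor(col, levels = all_levels)))

# Move the row names into a regular column to get the requested data frame
result <- data.frame(categories = rownames(counts), counts, row.names = NULL)
result
```

Converting each column to a factor with shared levels makes table report a zero for categories that a column never uses, instead of dropping them.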