Table of categorical variables by a grouping variable in R
I have a dataset with some categorical variables plus a "cluster" variable. For example:
time <- c("Morning", "Evening" ,"Morning", "Morning", "Afternoon", "Evening", "Afternoon")
dollar <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10")
with_kids <- c("no", "yes", "yes", "no", "no", "yes", "yes")
cluster <- c(1,1,2,3,2,2,3)
data <- cbind(time, dollar, with_kids, cluster)
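How can I create a frequency table of all the categorical variables by "cluster"?
The desired output is the table on the right (the column % of each categorical variable within each cluster).
I know this code works for one variable. If I have more categorical variables, what is the most efficient approach?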
table(data$time, data$cluster)
time <- c("Morning", "Evening" ,"Morning", "Morning", "Afternoon", "Evening", "Afternoon")
dollar <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10")
with_kids <- c("no", "yes", "yes", "no", "no", "yes", "yes")
cluster <- c(1,1,2,3,2,2,3)
data <- data.frame(time, dollar, with_kids, cluster)
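The answer above builds the data with data.frame() instead of cbind(). The reason: cbind() on these vectors returns a character matrix (the numeric cluster values are coerced to strings), and `$` does not work on matrices, so `data$time` fails. A small base R check, assuming R >= 4.0 (where data.frame() keeps strings as character by default); the names `mat` and `dat_df` are only illustrative:

```r
time    <- c("Morning", "Evening", "Morning", "Morning", "Afternoon", "Evening", "Afternoon")
cluster <- c(1, 1, 2, 3, 2, 2, 3)

mat    <- cbind(time, cluster)       # character matrix: cluster is coerced to strings
dat_df <- data.frame(time, cluster)  # data frame: each column keeps its own type

class(mat)            # "matrix" "array"
typeof(mat)           # "character"
sapply(dat_df, class) # time: "character", cluster: "numeric"

# mat$time would error ("$ operator is invalid for atomic vectors");
# matrix columns are indexed as mat[, "time"] instead.
table(dat_df$time, dat_df$cluster)
```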
You can use the dplyr package and select as many variables as you like:
library(dplyr)
data %>%
group_by(interaction(time, cluster, dollar)) %>%
summarise(count = n())
# A tibble: 7 x 2
`interaction(time, cluster, dollar)` count
<fctr> <int>
1 Morning.1.1-5 1
2 Afternoon.2.1-5 1
3 Morning.3.1-5 1
4 Morning.2.11-15 1
5 Evening.1.6-10 1
6 Evening.2.6-10 1
7 Afternoon.3.6-10 1
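The pipeline above counts each observed combination of the chosen columns. For comparison, a base R sketch of the same count (an alternative, not part of the answer; like the dplyr version it gives joint counts, not per-variable column percentages):

```r
time    <- c("Morning", "Evening", "Morning", "Morning", "Afternoon", "Evening", "Afternoon")
dollar  <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10")
cluster <- c(1, 1, 2, 3, 2, 2, 3)
data <- data.frame(time, dollar, cluster)

# table() cross-tabulates every combination of the selected columns;
# as.data.frame() turns that into one row per combination with a Freq column
combos <- as.data.frame(table(data[c("time", "cluster", "dollar")]))

# table() also lists combinations that never occur, so drop the zero counts
combos <- combos[combos$Freq > 0, ]
combos
```

Adding more column names to the `c(...)` vector plays the same role as adding variables to `interaction()`.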
I'm not entirely sure what output you want, but here are two possibilities.
A list of tables:
myList <- lapply(dat[head(names(dat), -1)], table, dat$cluster)
myList
$time
1 2 3
Afternoon 0 1 1
Evening 1 1 0
Morning 1 1 1
$dollar
1 2 3
1-5 1 1 1
11-15 0 1 0
6-10 1 1 1
$with_kids
1 2 3
no 1 1 1
yes 1 2 1
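`head(names(dat), -1)` drops the last column name, so the line above assumes `cluster` is the final column. A sketch that instead takes every column except `cluster` by name (the helper vector `vars` is introduced here for illustration; the data matches the Data section, stored as character rather than factor):

```r
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning", "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3)
)

# every column except the grouping variable, wherever it sits in the data frame
vars <- setdiff(names(dat), "cluster")

# one contingency table (levels x clusters) per categorical variable
myList <- lapply(dat[vars], table, dat$cluster)
myList$with_kids
```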
To get a list of proportion tables, you can use prop.table as the function that lapply applies over your list of tables, and pass it margin=2:
lapply(myList, prop.table, margin=2)
$time
1 2 3
Afternoon 0.0000000 0.3333333 0.5000000
Evening 0.5000000 0.3333333 0.0000000
Morning 0.5000000 0.3333333 0.5000000
$dollar
1 2 3
1-5 0.5000000 0.3333333 0.5000000
11-15 0.0000000 0.3333333 0.0000000
6-10 0.5000000 0.3333333 0.5000000
$with_kids
1 2 3
no 0.5000000 0.3333333 0.5000000
yes 0.5000000 0.6666667 0.5000000
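With `margin = 2` each column (cluster) is normalized on its own, so every column sums to 1; `margin = 1` would normalize each row across clusters instead. A small check on the `time` variable alone, multiplying by 100 to get the column percentages the question asks for (`col_pct` and `row_pct` are illustrative names):

```r
time    <- c("Morning", "Evening", "Morning", "Morning", "Afternoon", "Evening", "Afternoon")
cluster <- c(1, 1, 2, 3, 2, 2, 3)
tab <- table(time, cluster)

col_pct <- 100 * prop.table(tab, margin = 2)  # % of each level within a cluster
row_pct <- 100 * prop.table(tab, margin = 1)  # % of each level spread across clusters

round(col_pct, 1)
colSums(col_pct)  # 100 for every cluster
rowSums(row_pct)  # 100 for every time level
```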
Binding them together into a single table:
do.call(rbind, lapply(dat[head(names(dat), -1)], table, dat$cluster))
1 2 3
Afternoon 0 1 1
Evening 1 1 0
Morning 1 1 1
1-5 1 1 1
11-15 0 1 0
6-10 1 1 1
no 1 1 1
yes 1 2 1
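Combining the proportion tables with the rbind step gives one table of column percentages per cluster, which seems closest to the output described in the question. A sketch under that assumption; the `variable: level` row-name format and the names `pct_list` and `result` are illustrative choices, not part of the answer, and the data matches the Data section:

```r
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning", "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3)
)
vars <- setdiff(names(dat), "cluster")

pct_list <- lapply(vars, function(v) {
  # column proportions: each cluster's column sums to 1 for this variable
  tab <- prop.table(table(dat[[v]], dat$cluster), margin = 2)
  pct <- 100 * unclass(tab)  # plain matrix of column percentages
  # prefix the variable name so levels stay identifiable after stacking
  rownames(pct) <- paste0(v, ": ", rownames(pct))
  pct
})

result <- round(do.call(rbind, pct_list), 1)
result
```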
Data (the same values as in the question, stored as factors in a data frame named dat):
dat <-
structure(list(time = structure(c(3L, 2L, 3L, 3L, 1L, 2L, 1L), .Label = c("Afternoon",
"Evening", "Morning"), class = "factor"), dollar = structure(c(1L,
3L, 2L, 1L, 1L, 3L, 3L), .Label = c("1-5", "11-15", "6-10"), class = "factor"),
with_kids = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("no",
"yes"), class = "factor"), cluster = c(1, 1, 2, 3, 2, 2,
3)), .Names = c("time", "dollar", "with_kids", "cluster"), row.names = c(NA,
-7L), class = "data.frame")