Summarize Multiple Factor Columns Grouped by a Character Column and Return Results as a "Nested" Table in R
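I'm stuck on this problem. In theory it is similar to existing questions, but in practice it differs enough to give me a headache.
Basically, I have the following table: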
str(X)
tibble [50,000 x 8] (S3: tbl_df/tbl/data.frame)
$ V1 : chr [1:50000] "M" "M" "M" "M" "M" "F" "F" "F" "F" "F" ...
$ V2 : Factor w/ 5 levels "***","**","*",..: 1 3 1 1 1 1 1 1 1 3 ...
$ V3 : Factor w/ 5 levels "***","**","*",..: 5 5 4 1 1 1 1 1 1 1 ...
$ V4 : Factor w/ 5 levels "***","**","*",..: 5 5 5 3 NA 5 5 5 5 5 ...
$ V5 : Factor w/ 5 levels "***","**","*",..: 1 1 1 1 1 5 4 1 1 1 ...
$ V6 : Factor w/ 5 levels "***","**","*",..: 2 1 3 1 5 2 5 5 5 5 ...
$ V7 : Factor w/ 5 levels "***","**","*",..: 1 1 1 1 1 5 1 2 2 1 ...
$ V8 : Factor w/ 5 levels "***","**","*",..: 1 1 1 NA 1 1 1 1 1 1 ...
I am trying to get a table that counts the occurrences of each factor level in each variable and returns the following table (the values are made-up numbers):
V1 V9 V1 V2 V3 V4 V5 V6 V7 V8
M *** 323 232 44 445 4455 555 555 555
M ** 5555 6446 444 444 4110 899 8 8444
M * 323 232 44 445 4455 555 555 555
M . 5555 6446 444 444 4110 899 8 8444
M ns 323 232 44 445 4455 555 555 555
F *** 5555 6446 444 444 4110 899 8 8444
F ** 323 232 44 445 4455 555 555 555
F * 5555 6446 444 444 4110 899 8 8444
F . 323 232 44 445 4455 555 555 555
F ns 5555 6446 444 444 4110 899 8 8444
As requested in the comments, here is another look at the data:
dput(head(X,5))
structure(list(V1 = c("M", "M", "M",
"M", "M"), V2 = structure(c(1L,
3L, 1L, 1L, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V3 = structure(c(5L, 5L, 4L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V4 = structure(c(5L,
5L, 5L, 3L, NA), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V6 = structure(c(2L,
1L, 3L, 1L, 5L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V7 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V8 = structure(c(1L,
1L, 1L, NA, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
One option is to reshape to 'long' format with pivot_longer, get the counts, and reshape back to 'wide' format with pivot_wider.
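library(dplyr)
library(tidyr)

X %>%
  # stack every column except V1 into (Name, V9) pairs, one row per cell
  pivot_longer(cols = -V1, names_to = "Name", values_to = "V9") %>%
  # count each V1 / column name / level combination
  count(V1, Name, V9) %>%
  # spread the column names back out; combinations with no rows become 0
  pivot_wider(names_from = Name, values_from = n, values_fill = 0)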
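The pipeline above counts NA as its own level and only lists levels that actually occur for a given V1. If the goal is one row per V1 and level, as in the desired table, a variant along these lines might help. It is only a sketch on made-up toy data: the names `lv`, `toy`, and `res` are hypothetical, and it assumes that dropping missing values is acceptable.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, much smaller than the real X
lv <- c("***", "**", "*", ".", "ns")
toy <- tibble(
  V1 = c("M", "M", "F", "F"),
  V2 = factor(c("***", "*", "***", NA), levels = lv),
  V3 = factor(c("ns", "ns", "***", "."), levels = lv)
)

res <- toy %>%
  # go long and discard missing cells instead of counting them
  pivot_longer(cols = -V1, names_to = "Name", values_to = "V9",
               values_drop_na = TRUE) %>%
  # .drop = FALSE keeps factor levels of V9 that have a count of zero
  count(V1, Name, V9, .drop = FALSE) %>%
  # back to wide: one count column per original variable
  pivot_wider(names_from = Name, values_from = n, values_fill = 0)

res
```

Here `values_drop_na = TRUE` removes the NA cells before counting, and `.drop = FALSE` asks `count()` to keep unused levels of the factor V9, so every V1 should get a row for all five levels.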
Data
X <- structure(list(V1 = c("M", "M", "M",
"M", "M"), V2 = structure(c(1L,
3L, 1L, 1L, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V3 = structure(c(5L, 5L, 4L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V4 = structure(c(5L,
5L, 5L, 3L, NA), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V6 = structure(c(2L,
1L, 3L, 1L, 5L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"),
V7 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***",
"**", "*", ".", "ns"), class = "factor"), V8 = structure(c(1L,
1L, 1L, NA, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))