Summarize Multiple Factor Columns Grouped by a Character Column and Return Results as a "Nested" Table in R

Question

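I'm stuck on this problem. In theory it seems similar to other problems, but in practice it differs enough to give me a headache.

Basically, I have the following table: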

str(X)

 tibble [50,000 x 8] (S3: tbl_df/tbl/data.frame)
 $ V1          : chr [1:50000] "M" "M" "M" "M" "M" "F" "F" "F" "F" "F" ...
 $ V2          : Factor w/ 5 levels "***","**","*",..: 1 3 1 1 1 1 1 1 1 3 ...
 $ V3          : Factor w/ 5 levels "***","**","*",..: 5 5 4 1 1 1 1 1 1 1 ...
 $ V4          : Factor w/ 5 levels "***","**","*",..: 5 5 5 3 NA 5 5 5 5 5 ...
 $ V5          : Factor w/ 5 levels "***","**","*",..: 1 1 1 1 1 5 4 1 1 1 ...
 $ V6          : Factor w/ 5 levels "***","**","*",..: 2 1 3 1 5 2 5 5 5 5 ...
 $ V7          : Factor w/ 5 levels "***","**","*",..: 1 1 1 1 1 5 1 2 2 1 ...
 $ V8          : Factor w/ 5 levels "***","**","*",..: 1 1 1 NA 1 1 1 1 1 1 ...

I am trying to get a table that counts the occurrences of each factor level in each variable and returns the following table (the values are made-up numbers):

V1    V9      V1      V2      V3      V4      V5       V6      V7      V8
M     ***     323     232     44      445     4455     555     555     555
M     **      5555    6446    444     444     4110     899     8       8444
M     *       323     232     44      445     4455     555     555     555
M     .       5555    6446    444     444     4110     899     8       8444
M     ns      323     232     44      445     4455     555     555     555
F     ***     5555    6446    444     444     4110     899     8       8444
F     **      323     232     44      445     4455     555     555     555
F     *       5555    6446    444     444     4110     899     8       8444
F     .       323     232     44      445     4455     555     555     555
F     ns      5555    6446    444     444     4110     899     8       8444

library(dplyr)
library(tidyr)
X %>%
   pivot_longer(cols = -V1, names_to = "Name", values_to = 'V9') %>%
   count(V1, Name, V9) %>%
   pivot_wider(names_from = Name, values_from = n, values_fill = 0)

As requested in the comments, here is another look at the data:

dput(head(X,5))

structure(list(V1 = c("M", "M", "M", 
"M", "M"), V2 = structure(c(1L, 
3L, 1L, 1L, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V3 = structure(c(5L, 5L, 4L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V4 = structure(c(5L, 
    5L, 5L, 3L, NA), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V6 = structure(c(2L, 
    1L, 3L, 1L, 5L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V7 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V8 = structure(c(1L, 
    1L, 1L, NA, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

One option is to reshape to 'long' format with pivot_longer, get the counts with count, and reshape back to 'wide' format with pivot_wider.

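library(dplyr)
library(tidyr)
X %>%
   # stack every column except V1: column names go to Name, levels to V9
   pivot_longer(cols = -V1, names_to = "Name", values_to = 'V9') %>%
   # count rows for each V1 / variable / level combination
   count(V1, Name, V9) %>%
   # one column per original variable; absent combinations become 0
   pivot_wider(names_from = Name, values_from = n, values_fill = 0)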
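One thing to watch for: count() only returns combinations that actually occur, and it treats NA as its own group, so the wide table can lack rows for unused levels or gain an NA row. The following is a minimal sketch, not part of the original answer, of a variant that keeps every level and skips missing values. `X_toy`, `lv`, and `res` are hypothetical names and the toy values are made up for illustration; it assumes all factor columns share the same levels, as in the question.

```r
library(dplyr)
library(tidyr)

# Toy data with the same layout as the question (values are made up)
lv <- c("***", "**", "*", ".", "ns")
X_toy <- tibble(
  V1 = c("M", "M", "F", "F"),
  V2 = factor(c("***", "*", "***", "ns"), levels = lv),
  V3 = factor(c("ns", "ns", "***", NA), levels = lv)
)

res <- X_toy %>%
  # values_drop_na = TRUE discards missing entries while reshaping
  pivot_longer(cols = -V1, names_to = "Name", values_to = "V9",
               values_drop_na = TRUE) %>%
  # .drop = FALSE keeps factor levels of V9 that have zero rows
  count(V1, Name, V9, .drop = FALSE) %>%
  pivot_wider(names_from = Name, values_from = n, values_fill = 0)

res
```

With this toy input, each value of V1 gets one row per level, so the result has ten rows even though most levels never occur.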

Data

X <- structure(list(V1 = c("M", "M", "M", 
"M", "M"), V2 = structure(c(1L, 
3L, 1L, 1L, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V3 = structure(c(5L, 5L, 4L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V4 = structure(c(5L, 
    5L, 5L, 3L, NA), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V6 = structure(c(2L, 
    1L, 3L, 1L, 5L), .Label = c("***", "**", "*", ".", "ns"), class = "factor"), 
    V7 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("***", 
    "**", "*", ".", "ns"), class = "factor"), V8 = structure(c(1L, 
    1L, 1L, NA, 1L), .Label = c("***", "**", "*", ".", "ns"), class = "factor")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))
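As a cross-check without the tidyverse, the same counts can be sketched in base R with table(), which keeps all factor levels and ignores NA by default. This is an illustrative alternative rather than the answer's method; `X_toy`, `lv`, `counts`, and `result` are hypothetical names and the toy values are made up.

```r
# Toy data with the same layout as the question (values are made up)
lv <- c("***", "**", "*", ".", "ns")
X_toy <- data.frame(
  V1 = c("M", "M", "F", "F"),
  V2 = factor(c("***", "*", "***", "ns"), levels = lv),
  V3 = factor(c("ns", "ns", "***", NA), levels = lv)
)

# One V1-by-level contingency table per factor column, in long form
factor_cols <- setdiff(names(X_toy), "V1")
counts <- lapply(X_toy[factor_cols], function(col) {
  as.data.frame(table(V1 = X_toy$V1, V9 = col), responseName = "n")
})

# Every table shares the same V1/V9 row order, so the count
# columns can be bound side by side next to the keys
result <- cbind(
  counts[[1]][c("V1", "V9")],
  sapply(counts, function(d) d$n)
)

result
```

Because table() works column by column, this sketch does not need the factor columns to be stacked first, but it does rely on them sharing identical levels so that the rows line up.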
