Split a Dataset into a Nested List of Dataframes and then Spread Using Tidyr and Purrr

Question

library(ggmosaic)
library(tidyverse)

Here is the example code:

happy2<-happy%>%
select(sex,marital,degree,health)%>%
group_by(sex,marital,degree,health)%>%
summarise(Count=n())

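The following code splits the dataset into a nested list that contains, for each category of the degree variable, one table for males and one for females (the sex variable).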

happy2 %>% 
split(.$degree) %>% 
lapply(function(x) split(x, x$sex))

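This is where I'm struggling. I'd like to reshape the data, either by using Tidyr to spread the "marital" variable or perhaps by splitting once more, so that each category of "marital" becomes a column header, with each column containing the "health" variable and the corresponding "Count". The redundant "sex" and "degree" columns can be dropped.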

Because I'm working with lists, I've been trying to stick to Tidyverse approaches. For example, I've been trying to use purrr to drop a variable:

happy2 %>% map(~ select(.x, -sex))

I was thinking I could also use purrr to do the spreading, but I haven't been able to get that to work.

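To help illustrate what I'm looking for, I've attached a picture of a possible structure. I haven't included all the categories and the counts aren't correct, since I'm only showing the structure. I suppose the "marital" categories could also be a third splitting variable, if that's easier? So what I'm hoping to end up with is a table for males and a table for females for each degree category, showing health by marital status with the corresponding counts.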

Any help would be greatly appreciated...

Answer 1

Does the approach below work? I changed the syntax of the split by sex so that I could chain the subsequent commands together:

happy2 %>% 
  ungroup() %>%  # drop the grouping left by summarise(); otherwise select() keeps sex and degree
  split(.$degree) %>% 
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>%
                    spread(health, Count)))

Edit:

This gives you a separate table for each marital status:

happy2 %>% 
  ungroup() %>%
  split(.$degree) %>% 
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>% split(.$marital)))

If you don't want the first column to show the marital status, the following version removes it:

happy2 %>% 
  ungroup() %>%
  split(.$degree) %>% 
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>% split(.$marital) %>%
                    lapply(function(x) x %>% select(-marital))))
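Since the question asks about purrr specifically, here is a minimal sketch of the same nested split written with `map()` and `map_depth()` instead of nested `lapply()` calls. The `toy` tibble is a hypothetical stand-in for `happy2` (its rows and counts are made up purely for illustration), and `map_depth()` assumes a reasonably recent purrr (0.3.0 or later).

```r
library(dplyr)
library(purrr)

# Hypothetical toy rows standing in for the summarised `happy2` table
toy <- tibble::tribble(
  ~sex,     ~marital,   ~degree,       ~health, ~Count,
  "male",   "married",  "high school", "good",  3L,
  "male",   "married",  "high school", "fair",  1L,
  "male",   "divorced", "high school", "good",  2L,
  "female", "married",  "high school", "good",  4L,
  "female", "married",  "bachelor",    "fair",  5L
)

nested <- toy %>%
  split(.$degree) %>%                        # outer level: one element per degree
  map(~ split(.x, .x$sex)) %>%               # inner level: one table per sex
  map_depth(2, ~ select(.x, -sex, -degree))  # apply select() to each table at depth 2

# Access one table: males with a high school degree
nested[["high school"]][["male"]]
```

The point of `map_depth(2, ...)` is that it reaches the data frames two levels down in the list, which is roughly what the question's `map(~ select(.x, -sex))` attempt was aiming for.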

Answer 2

How about this:

# cleaned up your code a bit
# removed the select (as it does nothing)
# consistent column names (count is lower case like the rest of the variables)
# added spacing
happy2 <- happy %>%
  group_by(sex, marital, degree, health) %>%
  summarise(count=n())

happy2 %>%
  dplyr::ungroup() %>% 
  split(list(.$degree, .$sex, .$marital)) %>% 
  lapply(. %>% select(health, count))

Or do you really want the "marital" status as column headers above the "health" column, as in the table in your picture?
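If the marital categories are indeed meant to be column headers, one possible sketch is to split by degree and sex only and then widen each table with `tidyr::pivot_wider()` (the newer replacement for `spread()`, available from tidyr 1.0.0). The `toy` tibble below is hypothetical data standing in for `happy2`, and filling missing combinations with 0 is an assumption about what the output should look like.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy rows standing in for the summarised `happy2` table
toy <- tibble::tribble(
  ~sex,     ~marital,   ~degree,       ~health, ~Count,
  "male",   "married",  "high school", "good",  3L,
  "male",   "married",  "high school", "fair",  1L,
  "male",   "divorced", "high school", "good",  2L,
  "female", "married",  "high school", "good",  4L,
  "female", "married",  "bachelor",    "fair",  5L
)

wide <- toy %>%
  split(list(.$degree, .$sex), drop = TRUE) %>%  # one table per degree/sex pair, empty pairs dropped
  map(~ .x %>%
        select(marital, health, Count) %>%
        pivot_wider(names_from = marital,         # one column per marital category
                    values_from = Count,
                    values_fill = list(Count = 0L)))  # assumption: absent combinations count as 0

# One row per health level, one column per marital status
wide[["high school.male"]]
```

With the older tidyr interface used elsewhere in this thread, `spread(marital, Count, fill = 0)` should give a comparable result.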

