How to split a data frame by certain sub-levels of some columns and apply a model in the tidyverse

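I'm using the diamonds dataset as an example. I can split the dataset by cut and color, then fit a model to each group and extract the adjusted R-squared, as shown below.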

diamonds %>%
  group_by(cut, color) %>%
  do(model = lm(price ~ carat, data = .)) %>%
  mutate(r2 = summary(model)$adj.r.squared) %>%
  select(-model)

The question is: what if I only want to group the data by certain sub-levels of cut and color? For example:

cut_sub <- as.factor(c('Good', 'Fair'))
color_sub <- as.factor(c('E', 'J'))

How should I modify the code above to achieve this? I have tried and searched on Google but could not find a solution.

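A purrr approach is shown below (note that in current releases slice_rows() and by_slice() live in the purrrlyr package rather than in purrr itself):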

diamonds %>% 
  filter(cut %in% c("Fair", "Good"), 
         color %in% c("E", "J")) %>% 
  slice_rows(c("cut", "color")) %>% 
  by_slice(function(.x) {
    lm(price~carat, data = .x) %>% 
      summary %>% 
      .$adj.r.squared
  }, .to = "r2") %>% 
  unnest(r2)

Here is an idea using the development version of purrr (v0.2.2.9000):

diamonds %>% 
  filter(cut %in% c("Fair", "Good"), 
         color %in% c("E", "J")) %>% 
  group_by(cut, color) %>%
  nest() %>%
  mutate(model = map(data, .f = ~lm(price ~ carat, data = .)) %>% 
           map(summary) %>% map_dbl("adj.r.squared"))

Which gives:

## A tibble: 4 x 4
#    cut color               data     model
#  <ord> <ord>             <list>     <dbl>
#1  Good     E <tibble [933 x 8]> 0.8298957
#2  Good     J <tibble [307 x 8]> 0.9176254
#3  Fair     E <tibble [224 x 8]> 0.8092058
#4  Fair     J <tibble [119 x 8]> 0.7567011
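
For readers on a recent dplyr, the same filter-then-fit idea can be sketched without do() or the purrr helpers. This is only an illustrative variant, not part of the answers above: it assumes dplyr 1.0.0 or later (for nest_by()), and the name r2_by_group is made up for the example. The sub-level vectors mirror cut_sub and color_sub from the question.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Sub-levels to keep, as in the question (plain character vectors are enough)
cut_sub   <- c("Good", "Fair")
color_sub <- c("E", "J")

r2_by_group <- diamonds %>%
  # keep only the rows whose cut and color are among the chosen sub-levels
  filter(cut %in% cut_sub, color %in% color_sub) %>%
  # one row per (cut, color) group, with that group's rows in a list-column `data`
  nest_by(cut, color) %>%
  # fit one linear model per row; list() keeps the lm object in a list-column
  mutate(model = list(lm(price ~ carat, data = data))) %>%
  # extract the adjusted R-squared from each fitted model
  summarise(r2 = summary(model)$adj.r.squared, .groups = "drop")

print(r2_by_group)
```

Because %in% compares values, the sub-levels do not need to be wrapped in as.factor() before filtering. The result should have one row per remaining cut/color combination, with the adjusted R-squared in the r2 column.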