How to split a dataframe by some sub-levels of certain columns and apply a model in tidyverse
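I am using the diamonds dataset as an example. I can split the dataset by cut and color, then fit a model to each group and extract the adjusted R-squared, as follows.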
diamonds %>%
  group_by(cut, color) %>%
  do(model = lm(price ~ carat, data = .)) %>%
  mutate(r2 = summary(model)$adj.r.squared) %>%
  select(-model)
The question is what to do if I only want to group the data by certain sub-levels of cut and color. For example:
cut_sub <- as.factor(c('Good', 'Fair'))
color_sub <- as.factor(c('E', 'J'))
How should I modify the code above to achieve this? I have tried and searched on Google but could not find a solution.
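Here is one way to do it with purrr (in current releases, slice_rows() and by_slice() are provided by the purrrlyr package instead):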
diamonds %>%
  filter(cut %in% c("Fair", "Good"),
         color %in% c("E", "J")) %>%
  slice_rows(c("cut", "color")) %>%
  by_slice(function(.x) {
    lm(price ~ carat, data = .x) %>%
      summary %>%
      .$adj.r.squared
  }, .to = "r2") %>%
  unnest(r2)
Here is an idea using purrr (devel v0.2.2.9000):
diamonds %>%
  filter(cut %in% c("Fair", "Good"),
         color %in% c("E", "J")) %>%
  group_by(cut, color) %>%
  nest() %>%
  mutate(model = map(data, .f = ~lm(price ~ carat, data = .)) %>%
           map(summary) %>%
           map_dbl("adj.r.squared"))
Which gives:
## A tibble: 4 x 4
#     cut color               data     model
#   <ord> <ord>             <list>     <dbl>
#1   Good     E <tibble [933 x 8]> 0.8298957
#2   Good     J <tibble [307 x 8]> 0.9176254
#3   Fair     E <tibble [224 x 8]> 0.8092058
#4   Fair     J <tibble [119 x 8]> 0.7567011
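Both answers hard-code the levels inside filter(). As a minimal sketch of the same filter-then-group idea driven by the `cut_sub` and `color_sub` vectors from the question, the version below uses plain dplyr. It assumes dplyr 1.0 or later (for the `.groups` argument) and that ggplot2 is installed to supply `diamonds`. The name `r2_by_group` is made up for illustration, and plain character vectors are used because `%in%` compares factor values by their labels, so `as.factor()` is not needed.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds dataset

# Sub-levels to keep, as in the question (character vectors are enough).
cut_sub   <- c("Good", "Fair")
color_sub <- c("E", "J")

r2_by_group <- diamonds %>%
  # Keep only the rows whose cut and color are among the chosen sub-levels.
  filter(cut %in% cut_sub, color %in% color_sub) %>%
  group_by(cut, color) %>%
  # Fit one model per group and keep only its adjusted R-squared;
  # price and carat refer to the current group's columns here.
  summarise(r2 = summary(lm(price ~ carat))$adj.r.squared,
            .groups = "drop")

print(r2_by_group)
```

This should return one row per retained cut/color combination, with the adjusted R-squared in the `r2` column. Changing `cut_sub` or `color_sub` is then the only edit needed to analyse a different subset.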