Using purrr to iterate over two lists and then pipe into dplyr::filter across a list of data frames
library(tidyverse)
library(purrr)
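This is a follow-up to the question "Using Purrr to Iterate Over Two Lists and Then Pipe into Dplyr::Filter".
Using the sample data below, I first create a data frame (wanted) containing the values I want to feed into dplyr::filter. I then use the code below to build a data frame of results.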
map2_dfr(wanted$School, wanted$Code, ~filter(DF, School == .x, Code == .y)) %>%
group_by(School, Code) %>%
summarise_all(sum)
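However, my actual data spans three datasets from three different time periods. For this example, I simply made two extra copies of DF and put all three into a list: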
DF2 <- DF
DF3 <- DF
DFList <- list(DF, DF2, DF3)
Now, to process each data frame in the list, I have to use purrr::map with code like the following:
DFList %>%
map(~filter(.x, School == "School1", Code == "B344")) %>%
map(~group_by(.x, School, Code)) %>%
map(~summarise(.x, Count = sum(Question1)))
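This is where I am stuck. I want to combine the two approaches above: iterate over wanted and feed those values into dplyr::filter, but now across a list of data frames, producing a list of three data frames.
I am struggling with something like the code below, which does not work. Any suggestions? Using this many map calls also does not seem like the best approach.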
map2_dfr(Wanted$School, Wanted$Code,
~DFList %>%
map(~filter(.x, School == .x, Code == .y) %>%
map(~group_by(.x, Code, School) %>%
map(~summarise(.x, Count = sum(Question1))))))
Sample data:
Code <- c("B344","B555","S300","T220","B888","B888","B555","B344","B344","T220","B555","B555","S300","B555","S300","S300","S300","S300","B344","B344","B888","B888","B888")
School <- c("School1","School1","School2","School3","School4","School4","School1","School1","School3","School3","School4","School1","School1","School3","School2","School2","School4","School2","School3","School4","School3","School1","School2")
Question1 <- c(3,4,5,4,5,5,5,4,5,3,4,5,4,5,4,3,3,3,4,5,4,3,3)
Question2 <- c(5,4,3,4,3,5,4,3,2,3,4,5,4,5,4,3,4,4,5,4,3,3,4)
# tibble() replaces the deprecated data_frame()
DF <- tibble(Code, School, Question1, Question2)
wanted <- tibble(School = c("School2", "School1"),
                 Code = c("S300", "B344"))
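Since the data frames in the list all have the same format, just use dplyr::bind_rows to combine them into a single data frame, passing the .id parameter to record which list element each row came from. That column can then be used for grouping, after filtering by joining to wanted: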
library(tidyverse)

# Sample data from the question (tibble() replaces the deprecated data_frame())
DF <- tibble(Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
             School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
             Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
             Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4))
wanted <- tibble(School = c("School2", "School1"),
                 Code = c("S300", "B344"))
DFList <- list(DF, DF, DF)

# Stack the list into one data frame, keep only the wanted School/Code pairs,
# then sum the question columns per original list element (id)
DFList %>%
  bind_rows(.id = 'id') %>%
  inner_join(wanted) %>%
  group_by(id, School, Code) %>%
  summarise_all(sum)
#> Joining, by = c("Code", "School")
#> # A tibble: 6 x 5
#> # Groups:   id, School [?]
#>   id    School  Code  Question1 Question2
#>   <chr> <chr>   <chr>     <dbl>     <dbl>
#> 1 1     School1 B344       7.00      8.00
#> 2 1     School2 S300      15.0      14.0
#> 3 2     School1 B344       7.00      8.00
#> 4 2     School2 S300      15.0      14.0
#> 5 3     School1 B344       7.00      8.00
#> 6 3     School2 S300      15.0      14.0
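If a list of three data frames is needed, as the question asks, rather than one stacked data frame, one option is to map over the list and apply a filtering join inside each call. The following is only a minimal sketch and not part of the original answer. It assumes dplyr 1.0.0 or later (for the .groups argument), uses semi_join in place of inner_join, and names the list elements period1 to period3, which are hypothetical names chosen for illustration.

```r
library(dplyr)
library(purrr)

# Same sample data as above
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344",
           "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300",
           "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4",
             "School1", "School1", "School3", "School3", "School4", "School1",
             "School1", "School3", "School2", "School2", "School4", "School2",
             "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)
wanted <- tibble(School = c("School2", "School1"),
                 Code   = c("S300", "B344"))

# Hypothetical element names; the list in the question is unnamed
DFList <- list(period1 = DF, period2 = DF, period3 = DF)

# For each data frame: keep rows whose School/Code pair appears in `wanted`,
# then sum Question1 per pair. The result is a list of three tibbles.
result <- map(DFList, function(df) {
  df %>%
    semi_join(wanted, by = c("School", "Code")) %>%
    group_by(School, Code) %>%
    summarise(Count = sum(Question1), .groups = "drop")
})
```

Each element of result is a two-row tibble with columns School, Code and Count. A single function per data frame replaces the three chained map calls, and it avoids the nested ~ lambdas from the question, where the inner .x appears to shadow the outer one.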