Filter based on data in a nested data frame column using purrr

Question

I am trying to filter the rows of a data frame based on data inside a nested data frame column. Consider the following example:

library(tidyverse)

df  <- structure(list(id = c(47L, 47L, 45L, 45L, 85L, 85L), src = c("bycity", 
         "indb", "bycity", "indb", "bycity", "indb"), lat = c(42.73856678, 
         NA, 39.40803248, 39.40620766, 42.52458775, NA), lon = c(-85.82890251, 
         -85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)), .Names = c("id", 
          "src", "lat", "lon"), row.names = c(NA, -6L), class = c("tbl_df", 
         "tbl", "data.frame")
    ) %>% 
  nest(-id) %>% 
  mutate(
    anothervar = c(0.077537764, NA, 0.029326812)
  )


# only keep the rows where the lat in the indb row is NA
filtereddf  <- df %>% 
   filter(map(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )

# Error in filter_impl(.data, quo) : 
#   Argument 2 filter condition does not evaluate to a logical vector


# desired output would be the two rows where data[[2,2]] is NA
# A tibble: 2 x 3
     id             data anothervar
  <int>           <list>      <dbl>
1    47 <tibble [2 x 3]> 0.07753776
3    85 <tibble [2 x 3]> 0.02932681

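The nested data frames I am filtering on have consistent column names, and I always want to look at the second row only.

I suppose I could unnest the data frame (giving me two rows per ID where I had one before), then filter things down to a list of IDs that meet my criteria and use anti_join() to throw out the offending rows, but I'm more interested in understanding why using map() inside filter() isn't working the way I expected.

Why am I getting this error, and how can I filter on a nested data frame column?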

Answer 1

You want map_lgl(). map() returns a list, whereas map_lgl() returns a logical vector, which is what filter() expects as its condition.

filtereddf  <- df %>% 
   filter(map_lgl(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )
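
To make the difference concrete, here is a minimal sketch on a cut-down copy of the question's data (latitudes rounded, lon and anothervar dropped). It assumes tidyr 1.0 or later for the nest(data = ...) syntax, and the names as_list, as_lgl, and kept are only illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data mirroring the question: two rows per id, the "indb" row always second.
df <- tibble(
  id  = c(47L, 47L, 45L, 45L, 85L, 85L),
  src = rep(c("bycity", "indb"), times = 3),
  lat = c(42.7, NA, 39.4, 39.4, 42.5, NA)
) %>%
  nest(data = c(src, lat))  # assumes tidyr >= 1.0

# map() always returns a list, one element per nested tibble ...
as_list <- map(df$data, ~ is.na(pluck(.x, "lat", 2)))

# ... while map_lgl() simplifies the same results to a logical vector.
as_lgl <- map_lgl(df$data, ~ is.na(pluck(.x, "lat", 2)))

# A logical vector is a valid filter() condition, so this keeps ids 47 and 85.
kept <- df %>%
  filter(map_lgl(data, ~ is.na(pluck(.x, "lat", 2))))
```

Inspecting as_list and as_lgl side by side shows the same three TRUE/FALSE values, once wrapped in a list and once as a plain logical vector.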
