Conditionally apply pipeline step depending on external value

Given the following dplyr workflow:

require(dplyr)                                      
mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    filter(grepl(x = model, pattern = "Merc")) %>% 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

I would like to apply the filter step conditionally, depending on the value of applyFilter.

Solution

For applyFilter <- 1, rows are filtered using the "Merc" string; without the filter, all rows are returned.

applyFilter <- 1


mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter(model %in%
           if (applyFilter) {
             rownames(mtcars)[grepl(x = rownames(mtcars), pattern = "Merc")]
           } else
           {
             rownames(mtcars)
           }) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

Problem

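The solution above is inefficient, because the filter step (and the if/else inside it) is always evaluated; a better approach would evaluate the filter step only when applyFilter <- 1.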

Attempt

Ideally, the workflow would look something like this:

mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    # Only apply filter step if condition is met
    if (applyFilter) { 
        filter(grepl(x = model, pattern = "Merc"))
        }
    %>% 
    # Continue 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

Of course, the syntax above is incorrect. It is only an illustration of the ideal workflow.


Answer

How about this approach:

mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    filter(if(applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE) %>% 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

This means grepl is only evaluated when applyFilter is 1; otherwise filter simply recycles TRUE, so every row is kept.
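To see both branches side by side, here is a minimal sketch. The wrapper function mean_mpg and its apply_filter argument are names made up for illustration; they are not part of the question.

```r
library(dplyr)

# Hypothetical wrapper: apply_filter plays the role of applyFilter
mean_mpg <- function(apply_filter) {
  mtcars %>%
    tibble::rownames_to_column(var = "model") %>%
    # if/else picks exactly one branch; grepl() runs only when apply_filter is 1
    filter(if (apply_filter == 1) grepl(x = model, pattern = "Merc") else TRUE) %>%
    group_by(am) %>%
    summarise(meanMPG = mean(mpg))
}

with_filter    <- mean_mpg(1)  # Merc models only
without_filter <- mean_mpg(0)  # all models
```

In mtcars every Merc model has am == 0, so with_filter should contain a single group, while without_filter contains one row per value of am.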


Another option is to use {}:

mtcars %>% 
  tibble::rownames_to_column(var = "model") %>% 
  {if(applyFilter == 1) filter(., grepl(x = model, pattern = "Merc")) else .} %>% 
  group_by(am) %>% 
  summarise(meanMPG = mean(mpg))
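If the same pattern is needed in several places, the braces can be wrapped in a small helper. The sketch below assumes a helper named apply_if, which is hypothetical and not part of dplyr or magrittr; it simply returns the data unchanged when the condition is not TRUE.

```r
library(dplyr)

# Hypothetical helper: run step(data) only when cond is TRUE,
# otherwise pass the data through untouched
apply_if <- function(data, cond, step) {
  if (isTRUE(cond)) step(data) else data
}

applyFilter <- 1

result <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  apply_if(applyFilter == 1,
           function(d) filter(d, grepl(x = model, pattern = "Merc"))) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
```

Because the step is passed as a function, it is not called at all when the condition is FALSE.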

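There is of course another possible approach: you can simply break the pipe, apply the filter conditionally, and then continue the pipe (I know the OP didn't ask for this; I just want to give another example for other readers).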

# %<>% (assignment pipe) comes from magrittr, which dplyr does not attach
library(magrittr)

mtcars %<>% 
  tibble::rownames_to_column(var = "model")

if(applyFilter == 1) mtcars %<>% filter(grepl(x = model, pattern = "Merc"))

mtcars %>% 
  group_by(am) %>% 
  summarise(meanMPG = mean(mpg))
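Note that %<>% assigns back to mtcars, so the session ends up with a modified copy that masks the built-in data set. A variant that avoids this and needs only dplyr might look like the sketch below; dat is a hypothetical name for the intermediate result.

```r
library(dplyr)

applyFilter <- 1

# dat is a hypothetical intermediate; mtcars itself stays untouched
dat <- mtcars %>%
  tibble::rownames_to_column(var = "model")

# The filter step runs only when the flag is set
if (applyFilter == 1) {
  dat <- dat %>% filter(grepl(x = model, pattern = "Merc"))
}

dat %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
```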