How to pass an anonymous function to dplyr summarise

Question

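I have a simple data frame with 3 columns: name, goal and actual. Because this is a simplification of a much larger data frame, I want to use dplyr to count how many times each person met their goal.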

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))

The result should have one row per person with the number of goals met: 2 for Fred and 1 for Sally.

I should be able to pass an anonymous function like the one shown below, but the syntax isn't quite right:

library(dplyr)
g <- group_by(df, name)
summ <- summarise(g, met_goal = sum((function(x,y) {
                                       if(x>y){return(0)}
                                       else{return(1)}
                                     })(goal, actual)
                                    )
                  )

When I run the code above, I see 3 of these warnings:

Warning messages: 1: In if (x == y) { : the condition has length > 1 and only the first element will be used

Answer 1

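A solution using data.table:

You asked for a dplyr solution, but since the actual data is much larger, you could use data.table instead. foo is the function you want to apply. Because foo uses if(), it has to be called one row at a time, which is why the code below groups by row number as well as by name.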

foo <- function(x, y) {
    res <- 0
    if (x <= y) {
        res <- 1
    }
    return(res)
}

library(data.table)
setDT(df)
setkey(df, name)[, foo(goal, actual), .(name, 1:nrow(df))][, sum(V1), name]

If you prefer pipes, you can use this:

library(magrittr)
setDT(df) %>%
    setkey(name) %>%
    .[, foo(goal, actual), .(name, 1:nrow(.))] %>%
    .[, .(met_goal = sum(V1)), name]

    name met_goal
1:  Fred        2
2: Sally        1
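As a possible simplification, here is a minimal sketch that skips the per-row grouping by using a vectorized comparison instead of foo. It assumes data.table is installed; the names dt and res are only illustrative and do not come from the answer.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual = c(4,5,5,3,3,6,4))

# goal <= actual is evaluated element-wise within each group,
# and sum() counts the TRUE values
res <- dt[, .(met_goal = sum(goal <= actual)), by = name]
print(res)
```

This only works when the function can be written with vectorized operations. A function built around if() still needs a row-by-row call as in the answer above.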

Answer 2

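goal and actual are vectors of equal length, so relational operators are a natural fit here. However, using them inside a plain if() statement can give unexpected results, because if() expects a condition of length 1. Since the vectors have equal length and we need a binary result for each pair, summing the logical vector is the simplest approach, as shown below.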

group_by(df, name) %>%
    summarise(met_goal = sum(goal <= actual))
# A tibble: 2 x 2
    name met_goal
  <fctr>    <int>
1   Fred        2
2  Sally        1

The operator is switched to <= because you want goal > actual to count as 0 and everything else as 1.
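To make the difference between if() and a vectorized comparison concrete, here is a small base R sketch using Fred's three rows. The variable names met, flags and outcome are only illustrative.

```r
goal   <- c(4, 6, 5)
actual <- c(4, 5, 5)

# Vectorized comparison: one TRUE/FALSE per element
met <- goal <= actual
print(met)        # TRUE FALSE TRUE

# sum() counts TRUE as 1 and FALSE as 0
print(sum(met))   # 2

# ifelse() is the element-wise counterpart of if()/else
flags <- ifelse(goal > actual, 0, 1)
print(flags)      # 1 0 1

# if() needs a single TRUE/FALSE. With a longer condition it warns
# in older versions of R and raises an error from R 4.2.0 onwards.
outcome <- tryCatch(
  { if (goal > actual) 0 else 1 },
  warning = function(w) "warning",
  error   = function(e) "error"
)
print(outcome)
```

Depending on the R version, outcome is either "warning" or "error". Either way, if() never looks at more than the first element.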

Note that you can use an anonymous function. It was the if() statement that tripped you up. For example, using

sum((function(x, y) x <= y)(goal, actual))

would work the way you asked.
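If you want to keep the 0/1 branching from the question, one option is to swap if() for ifelse() inside the anonymous function. This is a sketch rather than the only way to do it, and it assumes dplyr is installed:

```r
library(dplyr)

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual = c(4,5,5,3,3,6,4))

# ifelse() evaluates the condition for every element, so the anonymous
# function returns one 0/1 value per row in the group
summ <- df %>%
  group_by(name) %>%
  summarise(met_goal = sum((function(x, y) ifelse(x > y, 0, 1))(goal, actual)))

print(summ)
```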

Answer 3

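I found myself needing to do something similar again (a year later), but with a function more complex than the simple one in the original question. The originally accepted answer took advantage of a specific feature of the problem, but a more general approach was touched on elsewhere. Using that approach, I ended up with this: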

library(dplyr)

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))

my_func = function(act, goa) {
  if(act < goa) {
    return(0)
  } else {
    return(1)
  }
}

summ <- df %>% group_by(name) %>%
  summarise(met_goal = sum(mapply(my_func, .data$actual, .data$goal)))

> summ
# A tibble: 2 x 2
  name  met_goal
  <fct>    <dbl>
1 Fred         2
2 Sally        1

The original question asked about using an anonymous function. In that spirit, the last part would look like this:

summ <- df %>% group_by(name) %>%
  summarise(met_goal = sum(mapply(function(act, go) {
                                    if(act < go) {
                                      return(0)
                                    } else {
                                      return(1)
                                    }
                                  }, .data$actual, .data$goal)))
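A related option, sketched here in base R only, is to wrap the scalar function with Vectorize(), which calls mapply() internally. The wrapper name my_func_v and the tapply() step are illustrative assumptions and not part of the answer above.

```r
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual = c(4,5,5,3,3,6,4))

# Scalar function: only valid for one pair of values at a time
my_func <- function(act, goa) {
  if (act < goa) 0 else 1
}

# Vectorize() returns a wrapper that applies my_func to each pair
my_func_v <- Vectorize(my_func)

per_row <- my_func_v(df$actual, df$goal)
print(per_row)    # 1 0 1 0 1 0 0

# Sum the per-row flags within each name
totals <- tapply(per_row, df$name, sum)
print(totals)     # Fred 2, Sally 1
```

The same wrapper could be called inside summarise() in place of the mapply() call, for example sum(my_func_v(actual, goal)).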
