Is there an alternative to the "revalue" function from plyr when using dplyr?

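I'm a fan of the revalue function in plyr for substituting strings. It's simple and easy to remember.

However, I have migrated new code to dplyr, which doesn't appear to have a revalue function. What is the accepted idiom in dplyr for doing things previously done with revalue?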

We can do this with chartr from base R:

chartr("ac", "AC", x)

Data

x <- c("a", "b", "c")
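One caveat worth illustrating: chartr translates character by character, so it suits single-letter substitutions like the one above but not whole-string recoding. A minimal sketch (the vector `y` and the result names are made up for illustration):

```r
x <- c("a", "b", "c")

# Each "a" becomes "A" and each "c" becomes "C"
single <- chartr("ac", "AC", x)
print(single)  # "A" "b" "C"

# The translation applies to every character inside longer strings too,
# so it cannot map a whole value such as "a" to "Apple"
y <- c("cab", "abc")  # hypothetical extra data
multi <- chartr("ac", "AC", y)
print(multi)   # "CAb" "AbC"
```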

Starting with dplyr version dplyr_0.5.0 there is a recode function available, which looks very similar to revalue from plyr.

An example built from the Examples section of the recode documentation:

set.seed(16)
x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
 [1] "a" "b" "a" "b" "b" "a" "c" "c" "c" "a"

recode(x, a = "Apple", b = "Bear", c = "Car")
 [1] "Apple" "Bear"  "Apple" "Bear"  "Bear"  "Apple" "Car"   "Car"   "Car"   "Apple"

If you define only some of the values to recode, then in this version of dplyr the remaining values are filled with NA by default.

recode(x, a = "Apple", c = "Car")
 [1] "Apple" NA      "Apple" NA      NA      "Apple" "Car"   "Car"   "Car"   "Apple"

This behavior can be changed with the .default argument.

recode(x, a = "Apple", c = "Car", .default = x)
 [1] "Apple" "b"     "Apple" "b"     "b"     "Apple" "Car"   "Car"   "Car"   "Apple"

There is also a .missing argument if you want to replace missing values with something else.
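As a small illustration of .missing, here is a sketch that assumes dplyr is installed; the vector `y` and the replacement string "unknown" are made up for the example. Passing `.default = y` keeps unmatched values as they are regardless of which default the installed dplyr version uses.

```r
library(dplyr)

y <- c("a", NA, "b")  # hypothetical data containing a missing value

# "a" is recoded, NA is replaced by the .missing value,
# and "b" is kept because .default points back at the input
filled <- recode(y, a = "Apple", .default = y, .missing = "unknown")
print(filled)  # "Apple" "unknown" "b"
```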

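An alternative I find convenient is the mapvalues function (which comes from plyr, not data.table) used inside data.table's := assignment, for example:

df[, variable := plyr::mapvalues(variable, from = old_names_string_vector, to = new_names_string_vector)]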
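A self-contained version of that idea might look like the sketch below. It assumes both data.table and plyr are installed; the table `dt` and the two name vectors are invented for illustration.

```r
library(data.table)

dt <- data.table(variable = c("a", "b", "c", "a"))  # hypothetical data
old_names <- c("a", "b")
new_names <- c("Apple", "Banana")

# plyr::mapvalues() pairs `from` and `to` by position and leaves
# values that are not listed in `from` (here "c") unchanged
dt[, variable := plyr::mapvalues(variable, from = old_names, to = new_names)]
print(dt$variable)  # "Apple" "Banana" "c" "Apple"
```

Calling the function as plyr::mapvalues avoids attaching plyr, which matters if dplyr is loaded in the same session because the two packages share several function names.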

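I wanted to comment on @aosmith's answer, but lack the reputation. It seems that the default behavior of dplyr's recode function is now to leave unspecified levels unaffected.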

x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
[1] "c" "c" "b" "b" "a" "b" "c" "c" "c" "b"

recode(x, a = "apple", b = "banana")

[1] "c"      "c"      "banana" "banana" "apple"  "banana" "c"      "c"      "c"      "banana"

To change all unspecified levels to NA, include the argument .default = NA_character_

recode(x, a = "apple", b = "banana", .default = NA_character_)

[1] NA       NA       "banana" "banana" "apple"  "banana" NA       NA       NA       "banana"
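Since revalue took a named vector of replacements, it may help to note that recode can accept one too, by splicing it with `!!!`. This is a sketch that assumes a reasonably recent dplyr (one with tidy-evaluation splicing, which the 0.5.0 release did not have); the names `x2` and `lookup` are illustrative only.

```r
library(dplyr)

x2 <- c("c", "c", "b", "a")              # hypothetical data
lookup <- c(a = "apple", b = "banana")   # names are old values, elements are new values

# !!! splices the named vector into recode() as individual name = value arguments;
# .default = x2 keeps the unlisted value "c" unchanged
recoded <- recode(x2, !!!lookup, .default = x2)
print(recoded)  # "c" "c" "banana" "apple"
```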

Base R solution

For this you can use ifelse() from base R. The function's arguments are ifelse(test, yes, no). For example:

(x <- sample(c("a", "b", "c"), 5, replace = TRUE))
[1] "c" "a" "b" "a" "a"

ifelse(x == "a", "Apple", x)
[1] "c"     "Apple" "b"     "Apple" "Apple"

If you want to recode several values, you can nest the function like this:

ifelse(x == "a", "Apple", ifelse(x == "b", "Banana", x))
[1] "c"      "Apple"  "Banana" "Apple"  "Apple"
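When the number of values grows, one base R way to avoid deep nesting is a named lookup vector combined with a single ifelse(). This is only a sketch; the name `lookup` is made up for the example.

```r
x <- c("c", "a", "b", "a", "a")

# Names are the old values, elements are the replacements
lookup <- c(a = "Apple", b = "Banana")

# Indexing by name returns NA for values without an entry ("c" here),
# so the test keeps the original value in those positions
out <- unname(ifelse(x %in% names(lookup), lookup[x], x))
print(out)  # "c" "Apple" "Banana" "Apple" "Apple"
```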

Custom function

Having many values to recode makes the ifelse() code messy. So here is a function of my own:

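my_revalue <- function(x, ...) {
  reval <- list(...)                       # named replacements, e.g. a = "Apple"

  from <- names(reval)                     # old values
  to <- unlist(reval, use.names = FALSE)   # new values

  # Find the position of each element of x among the old values
  # (NA where there is no entry) and replace only the matches.
  # This avoids eval(parse()), which breaks on values containing quotes.
  idx <- match(x, from)
  out <- x
  out[!is.na(idx)] <- to[idx[!is.na(idx)]]

  out
}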

Now we can change multiple values very quickly:

my_revalue(x, "a" = "Apple", "b" = "Banana", "c" = "Cranberry")
[1] "Cranberry" "Apple"     "Banana"    "Apple"     "Apple"