Is there an alternative to "revalue" function from plyr when using dplyr?
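I'm a fan of the revalue function in plyr for substituting strings. It's simple and easy to remember.
However, I have migrated new code to dplyr, which doesn't appear to have a revalue function. What is the accepted idiom in dplyr for doing what revalue used to do?
We can do this with chartr from base R: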
chartr("ac", "AC", x)
Data
x <- c("a", "b", "c")
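One thing worth keeping in mind with chartr is that it translates character by character rather than replacing whole strings, so it fits single-letter codes like the ones above. A small sketch of that behavior (the `words` vector is made up for illustration):

```r
x <- c("a", "b", "c")

# Each character in "ac" is mapped to the character at the same
# position in "AC"; "b" has no mapping and is left alone.
single <- chartr("ac", "AC", x)
single

# On longer strings every matching character is translated,
# not the string as a whole. `words` is a hypothetical example vector.
words <- c("abc", "cat")
translated <- chartr("ac", "AC", words)
translated
```

So chartr cannot turn "a" into "Apple"; for that kind of whole-value replacement one of the other approaches in this thread is a better fit.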
As of dplyr 0.5.0 there is a recode function, which looks very similar to revalue from plyr.
An example built from the Examples section of the recode documentation:
set.seed(16)
x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
[1] "a" "b" "a" "b" "b" "a" "c" "c" "c" "a"
recode(x, a = "Apple", b = "Bear", c = "Car")
[1] "Car" "Apple" "Bear" "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
If you define only some of the values to recode, the rest are filled with NA by default.
recode(x, a = "Apple", c = "Car")
[1] "Car" "Apple" NA "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
This behavior can be changed with the .default argument.
recode(x, a = "Apple", c = "Car", .default = x)
[1] "Car" "Apple" "b" "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
There is also a .missing argument if you want to replace missing values with something else.
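A short sketch of .missing together with .default, plus splicing a named vector into recode with `!!!`, which is probably the closest match to how revalue takes its replacements. The input vector and the name `lookup` are made up for this example:

```r
library(dplyr)

# Hypothetical input containing a missing value
x <- c("a", "b", NA, "c")

# Named vector of old = "new" pairs, similar to what revalue() accepts.
# `lookup` is just an illustrative name.
lookup <- c(a = "Apple", c = "Car")

# Splice the pairs into recode(); .default fills unmatched values,
# .missing fills NA inputs.
with_fill <- recode(x, !!!lookup, .default = "Other", .missing = "Unknown")
with_fill
```

With these inputs the result should be "Apple" "Other" "Unknown" "Car".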
An alternative that I find handy with data.tables is the mapvalues function (it comes from plyr), e.g.
df[, variable := mapvalues(variable, from = old_names_string_vector, to = new_names_string_vector)]
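For reference, a minimal sketch of mapvalues on a plain vector, assuming plyr is installed. It is called as plyr::mapvalues so that plyr is not attached on top of dplyr; the vectors here are made up for illustration:

```r
# Hypothetical example data
x <- c("a", "b", "c", "a")
old_names <- c("a", "c")
new_names <- c("Apple", "Car")

# Values listed in `from` are replaced by the value at the same
# position in `to`; everything else is kept as-is.
mapped <- plyr::mapvalues(x, from = old_names, to = new_names)
mapped
```

Inside a data.table the same call goes on the right-hand side of `:=`, as in the line above.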
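I would have liked to comment on @aosmith's answer, but I lack the reputation. It seems that nowadays the default of dplyr's recode function is to leave unspecified levels unaffected.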
x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
[1] "c" "c" "b" "b" "a" "b" "c" "c" "c" "b"
recode(x , a = "apple", b = "banana" )
[1] "c" "c" "banana" "banana" "apple" "banana" "c" "c" "c" "banana"
To change all unspecified levels to NA, include the argument .default = NA_character_.
recode(x, a = "apple", b = "banana", .default = NA_character_)
[1] "apple" "banana" "apple" "banana" "banana" "apple" NA NA NA "apple"
Base R solution
For this you can use ifelse() from base. The function arguments are ifelse(test, yes, no). An example:
(x <- sample(c("a", "b", "c"), 5, replace = TRUE))
[1] "c" "a" "b" "a" "a"
ifelse(x == "a", "Apple", x)
[1] "c" "Apple" "b" "Apple" "Apple"
If you want to recode multiple values, you can nest the function like this:
ifelse(x == "a", "Apple", ifelse(x == "b", "Banana", x))
[1] "c" "Apple" "Banana" "Apple" "Apple"
Own function
Having many values to recode makes the ifelse() code messy. So here is a function of my own:
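my_revalue <- function(x, ...){
  reval <- list(...)
  from <- names(reval)   # old values (the argument names)
  to <- unlist(reval)    # new values
  # Build one assignment per pair, e.g. x[x =='a']<-'Apple', and evaluate them in order
  out <- eval(parse(text= paste0("{", paste0(paste0("x[x ==", "'", from,"'", "]", "<-", "'", to, "'"), collapse= ";"), ";x", "}")))
  return(out)
}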
Now we can change multiple values very quickly:
my_revalue(x, "a" = "Apple", "b" = "Banana", "c" = "Cranberry")
[1] "Cranberry" "Apple" "Banana" "Apple" "Apple"
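If building code strings with eval(parse()) feels fragile (for example when a value contains a quote), the same idea can be sketched with a named lookup vector instead. This is only one possible variant, not the function above; `revalue_lookup` is a hypothetical name:

```r
# Lookup-based sketch; `revalue_lookup` is a made-up name for illustration.
revalue_lookup <- function(x, ...) {
  map <- unlist(list(...))        # named character vector: old = "new"
  hit <- x %in% names(map)        # elements that have a replacement
  x[hit] <- unname(map[x[hit]])   # replace all matches in a single step
  x
}

x <- c("c", "a", "b", "a", "a")
fruits <- revalue_lookup(x, a = "Apple", b = "Banana", c = "Cranberry")
fruits

# Because all replacements happen at once, swapping two values works;
# applying the assignments one after another would give "a" "a" here.
swapped <- revalue_lookup(c("a", "b"), a = "b", b = "a")
swapped
```

Values without a matching name, including NA, are left unchanged.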