Convert existing dataframe variable to factor in Tidyverse

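I know there are many versions of this question, but I'm looking for a specific solution. When you have an existing character variable in a dataframe, is there an easy way to convert that variable to a factor using tidyverse syntax? For example, the second line of code below does not reorder the factor levels, but the last line does. How do I make the second line work? This would be useful in some situations, such as importing and modifying existing datasets. Many thanks!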

df <- data.frame(x = c(1,2), y = c('post','pre')) %>%
      as_factor(y, levels = c('pre','post'))

df$y <- factor(df$y, levels = c('pre', 'post'))

We can use fct_relevel from forcats:

library(dplyr)
library(forcats)
df1 <- data.frame(x = c(1,2), y = c('post','pre')) %>% 
       mutate(y = fct_relevel(y, 'pre', 'post')) 

-output

> df1$y
[1] post pre 
Levels: pre post
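
For comparison, here is a minimal sketch of the same idea using base factor() inside mutate. Unlike as_factor, factor() accepts the levels argument the question tried to pass. The name df2 is introduced here purely for illustration.

```r
library(dplyr)

# df2 is a hypothetical name used only for this sketch
df2 <- data.frame(x = c(1, 2), y = c('post', 'pre')) %>%
  # base factor() takes an explicit levels argument, so the order is set here
  mutate(y = factor(y, levels = c('pre', 'post')))

levels(df2$y)
```

This keeps the whole conversion inside the pipe, which appears to be what the question was after. fct_relevel may be more convenient when only some of the levels need to be moved to the front.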

Regarding the use of as_factor, according to the documentation:

Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform.

that is, post, followed by pre:

> as_factor(c('post','pre'))
[1] post pre 
Levels: post pre

The following calls, on the other hand, will not work, because as_factor has no argument named levels:

> as_factor(c('post','pre'), "pre", "post")
Error: 2 components of `...` were not used.

We detected these problematic arguments:
* `..1`
* `..2`

Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
> as_factor(c('post','pre'), levels = c("pre", "post"))
Error: 1 components of `...` were not used.

We detected these problematic arguments:
* `levels`

Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.

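Also, in the tidyverse, we need to extract the column with pull or .$ before passing it to a function that expects a vector; otherwise the column has to be modified inside mutate.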
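
The difference between extracting a column and modifying it in place can be sketched as follows. This is only an illustration: the names y_fct and df_new are hypothetical and do not come from the answer.

```r
library(dplyr)
library(forcats)

df <- data.frame(x = c(1, 2), y = c('post', 'pre'))

# pull() extracts the column as a plain vector, so the result is a factor
# and the data frame itself is left unchanged
y_fct <- df %>%
  pull(y) %>%
  fct_relevel('pre', 'post')

# to keep the reordered factor in the data frame, modify the column in mutate()
df_new <- df %>%
  mutate(y = fct_relevel(y, 'pre', 'post'))

levels(y_fct)
levels(df_new$y)
```

Piping the whole data frame into as_factor or fct_relevel, as in the question, passes the data frame rather than the column as the first argument, which is why the column has to be named inside mutate or extracted first.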

We can also use relevel:

df <- data.frame(x = c(1,2), y = c('post','pre')) 

library(dplyr)
df <- df %>% 
  # relevel() moves a single reference level to the front; further levels are not needed
  mutate(y = relevel(as.factor(y), ref = 'pre'))

df
df$y
levels(df$y)
  x    y
1 1 post
2 2  pre

> df$y
[1] post pre 
Levels: pre post
> levels(df$y)
[1] "pre"  "post"