How to remove one specific factor level in all factor variables in R?
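For example, I have a data frame with 6 columns (all of them factors).
I want to remove one specific level, e.g. "no", from every factor in which it appears.
That is, I want to drop the factor level "no" from my factor variables and, at the same time, set every answer with the value "no" to NA.
I tried this code: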
sapply(fact,function(x) levels(x)[levels(x) == "no"] <- NULL)
But this code does not work.
How can I do this?
I think this should do what you want.
dfNew <- data.frame(lapply(df, function(x) {is.na(x[x=="no"]) <- TRUE; droplevels(x)}))
Data
set.seed(1234)
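# stringsAsFactors=TRUE is needed on R >= 4.0, where strings stay character by default
df <- data.frame(q1=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
q2=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
q3=sample(c("yes", "no", "maybe"), 20, replace=TRUE),
stringsAsFactors=TRUE)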
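A quick way to confirm that the one-liner did what was asked is to compare the levels and the NA counts before and after. This is only a sketch: it uses two columns instead of three, writes the NA step as `is.na(x) <- x == "no"` (which should behave the same as the form in the answer), and passes `stringsAsFactors = TRUE` on the assumption that R >= 4.0 is in use.

```r
set.seed(1234)
answers <- c("yes", "no", "maybe")
df <- data.frame(q1 = sample(answers, 20, replace = TRUE),
                 q2 = sample(answers, 20, replace = TRUE),
                 stringsAsFactors = TRUE)

dfNew <- data.frame(lapply(df, function(x) {
  is.na(x) <- x == "no"   # mark every "no" answer as missing
  droplevels(x)           # then drop the now-unused level
}))

# "no" should no longer appear among the levels of any column
lapply(dfNew, levels)

# every former "no" should now be counted as NA, column by column
colSums(is.na(dfNew)) == colSums(df == "no")
```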
How about this:
> df
# c1 c2 c3
# 1 yes yes no
# 2 no ok yes
# 3 ok no ok
# 4 yes yes no
# 5 no ok yes
# 6 ok no ok
# 7 yes yes no
# 8 no ok yes
# 9 ok no ok
toRemove <- "no"
data.frame(lapply(df,
function(x) factor(as.character(x), levels=levels(x)[levels(x)!=toRemove])))
# c1 c2 c3
# 1 yes yes <NA>
# 2 <NA> ok yes
# 3 ok <NA> ok
# 4 yes yes <NA>
# 5 <NA> ok yes
# 6 ok <NA> ok
# 7 yes yes <NA>
# 8 <NA> ok yes
# 9 ok <NA> ok
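The reason the factor() call above yields NA is that factor() turns any value not listed in its levels argument into NA. A minimal sketch on a single vector (the vector x is made up for illustration, and setdiff() is used here as an alternative spelling of the levels filter):

```r
x <- factor(c("yes", "no", "ok", "no"))
toRemove <- "no"

# Re-encode against every level except the one to remove
y <- factor(as.character(x), levels = setdiff(levels(x), toRemove))

levels(y)   # "ok" "yes"
is.na(y)    # FALSE TRUE FALSE TRUE
```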
Toy data
df <- structure(list(c1 = structure(c(3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L), .Label = c("no", "ok", "yes"), class = "factor"), c2 = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("no", "ok", "yes"
), class = "factor"), c3 = structure(c(1L, 3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L), .Label = c("no", "ok", "yes"), class = "factor")), .Names = c("c1",
"c2", "c3"), row.names = c(NA, -9L), class = "data.frame")
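The answers above are good. I would add that if not all columns are factors, and/or you want to keep every factor level (including levels that have no data) other than the one explicitly removed, you need a more general approach: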
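# Define a helper function
removeOneLevel <- function(v, badlevel){
  # Rebuild the factor against every level except badlevel: values equal to
  # badlevel become NA, and all other levels (even unused ones) are kept.
  # Calling droplevels() first and then reassigning levels() would relabel
  # levels by position whenever an unused level precedes a used one.
  factor(v, levels = levels(v)[levels(v) != badlevel])
}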
# Use dplyr to perform that function on all factor columns
library(dplyr)
dfNew = mutate_if(df, is.factor, removeOneLevel, badlevel = 'no')
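If you would rather not depend on dplyr, the same idea can be sketched in base R. The helper name removeLevel and the small data frame below are made up for illustration; the level "never" is deliberately unused to show that it survives.

```r
# Hypothetical helper: keep every level except badlevel; matching values become NA
removeLevel <- function(v, badlevel) {
  factor(v, levels = setdiff(levels(v), badlevel))
}

# Made-up mixed data frame: one integer column, one factor with an unused level
df <- data.frame(
  id = 1:4,
  q1 = factor(c("yes", "no", "maybe", "no"),
              levels = c("maybe", "never", "no", "yes"))
)

# Apply the helper to factor columns only, leaving other columns untouched
isFac <- vapply(df, is.factor, logical(1))
df[isFac] <- lapply(df[isFac], removeLevel, badlevel = "no")

levels(df$q1)   # "maybe" "never" "yes"
```

The unused level "never" is kept because the factor is rebuilt from the original level set rather than from the values that happen to be present.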