Remove columns with factors that have fewer than 5 observations per level
I have a dataset with more than 100 columns, all of type factor. For example:
animal fruit vehicle color
cat orange car blue
dog apple bus green
dog apple car green
dog orange bus green
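In my dataset, I need to remove every column that contains a factor with fewer than 5 observations in any level. In this example, if I wanted to remove all columns in which some level has 1 observation or fewer, such as blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?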
We can use Filter with table:
Filter(function(x) !any(table(x) < 2), df1)
# fruit vehicle
#1 orange car
#2 apple bus
#3 apple car
#4 orange bus
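The question asks for a threshold of 5, while the toy example uses 2. As a minimal sketch, the same Filter and table idea can be wrapped in a function that takes the threshold as an argument. The name drop_sparse_factors and the parameter min_n are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: keep only the columns in which every level
# occurs at least min_n times.
drop_sparse_factors <- function(df, min_n = 5) {
  Filter(function(x) all(table(x) >= min_n), df)
}

# Toy data with the same layout as the example (all columns are factors).
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# With a threshold of 2, animal and color are dropped.
res <- drop_sparse_factors(df1, min_n = 2)
names(res)
```

Note that table() also counts levels that are declared but never observed, so a factor with an unused level has a count of 0 and its column is dropped. If that is not wanted, calling droplevels(df) before the helper is one option.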
Data
df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat",
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L,
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L,
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA,
-4L), class = "data.frame")
We can also use select_if from dplyr:
library(dplyr)
df1 %>% select_if(~all(table(.) > 1))
# fruit vehicle
#1 orange car
#2 apple bus
#3 apple car
#4 orange bus
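In recent dplyr releases (1.0.0 and later), select_if is superseded in favour of select combined with where. A sketch of the equivalent call, assuming such a version is installed, could look like this:

```r
library(dplyr)

# Same toy data as in the example (all columns are factors).
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# Keep the columns in which every level has more than 1 observation.
res_where <- df1 %>% select(where(~ all(table(.x) > 1)))
names(res_where)
```

For the threshold from the question, the condition would become table(.x) >= 5.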