Checking multiple data frame columns at once (flexible manner)
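Looking for a better way: how can I make R check, element-wise, the values of a flexible subset of columns (here, say, Var2 and Var3) and write the result of the check to a new logical column?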
Is there a shorter, more elegant way to do this than a row-wise apply()?
df <- read.csv(
text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)
criticalColumns <- c("Var2", "Var3")
df$criticalColumnsAreEmpty <-
  apply(df[, criticalColumns], 1, function(curRow) {
    return(all(curRow == ""))
  })
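One caveat with the apply() version when the column subset is meant to be flexible: df[, criticalColumns] simplifies to a plain vector when criticalColumns has length one, and apply() then stops because its input has no dimensions. A minimal sketch, assuming the same df as above and a one-element subset (the name oneColumn is just for illustration), that keeps the data frame shape with drop = FALSE:

```r
df <- read.csv(
  text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)

oneColumn <- "Var2"  # illustrative subset that happens to have length one

# df[, oneColumn] would simplify to a character vector, and apply() would stop
# with "dim(X) must have a positive length"; drop = FALSE keeps a one-column
# data frame, so apply() can still work row by row
emptyFlags <- apply(df[, oneColumn, drop = FALSE], 1, function(curRow) {
  all(curRow == "")
})
emptyFlags
```

Indexing list-style, df[criticalColumns], never drops to a vector either, which is one reason the answers below use that form.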
I could also do it explicitly, but that isn't flexible:
df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""
Desired output:
   Var1 Var2 Var3 criticalColumnsAreEmpty
1                                    TRUE
2                 a                 FALSE
3            a                      FALSE
4     a    a    a                   FALSE
5     a         a                   FALSE
6            a                      FALSE
7                                    TRUE
8                 a                 FALSE
9            a                      FALSE
10                a                 FALSE
We can use rowSums on a logical matrix:
df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
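To see why this one-liner works, it helps to look at the intermediate values: comparing the data frame subset with "" yields a logical matrix (TRUE where a cell is non-blank), rowSums() counts the TRUE cells in each row, and ! maps a count of 0 to TRUE and any positive count to FALSE. A small sketch, assuming the df and criticalColumns from the question (the intermediate names nonBlank and nonBlankCount are only for illustration):

```r
df <- read.csv(
  text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)
criticalColumns <- c("Var2", "Var3")

nonBlank <- df[criticalColumns] != ""   # logical matrix, one column per critical column
nonBlankCount <- rowSums(nonBlank)      # number of non-blank critical cells in each row
df$criticalColumnsAreEmpty <- !nonBlankCount  # 0 -> TRUE, any positive count -> FALSE

head(nonBlank, 3)
nonBlankCount
```

Because only the column names are hard-coded in criticalColumns, changing that vector is all it takes to check a different subset.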
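Or, as another option (for big datasets, to avoid converting to a matrix for memory reasons), loop over the columns, check whether each element is blank, and combine the results with Reduce and &: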
Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
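Either approach can be wrapped in a small helper so that the column subset is just an argument. The sketch below uses a hypothetical function name, columnsAllBlank(), built on the Reduce() variant; as.character() lets it handle columns read as either character or factor, and it also works for a single column, since Reduce() over a one-element list simply returns that element. The last line checks that it agrees with the rowSums() version on the question's data.

```r
df <- read.csv(
  text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)

# Hypothetical helper: TRUE for rows where every column named in `cols` is blank ("")
columnsAllBlank <- function(data, cols) {
  # one logical vector per column, combined element-wise with &
  Reduce(`&`, lapply(data[cols], function(x) !nzchar(as.character(x))))
}

df$criticalColumnsAreEmpty <- columnsAllBlank(df, c("Var2", "Var3"))
df

# agrees with the rowSums() one-liner
all(df$criticalColumnsAreEmpty == !rowSums(df[c("Var2", "Var3")] != ""))  # TRUE
```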