Checking multiple data frame columns at once (flexible manner)

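I am looking for a better way: how can I make R check, element-wise, the values of a flexible subset of columns (here Var2 and Var3) and write the result of the check into a new logical column?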

Is there a shorter, more elegant way than a row-wise apply()?

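# Sample data (rows kept flush left to avoid leading whitespace in the first field)
df <- read.csv(
  text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"',
  stringsAsFactors = FALSE
)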

criticalColumns <- c("Var2", "Var3")

# drop = FALSE keeps a data frame even when only one column is selected
df$criticalColumnsAreEmpty <-
  apply(df[, criticalColumns, drop = FALSE], 1, function(curRow) {
    all(curRow == "")
  })

I can also do this explicitly, but that is not flexible:

df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""

Expected output:

 Var1 Var2 Var3 criticalColumnsAreEmpty
                                   TRUE
              a                   FALSE
         a                        FALSE
    a    a    a                   FALSE
    a         a                   FALSE
         a                        FALSE
                                   TRUE
              a                   FALSE
         a                        FALSE
              a                   FALSE

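We can use rowSums on a logical matrix. The comparison df[criticalColumns] != "" is TRUE wherever a cell is non-empty, rowSums counts those cells per row, and ! turns a count of zero into TRUE:
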
df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
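
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. The data frame is a hand-typed copy of the Var2 and Var3 columns from the question, and the names nonEmpty, nonEmptyCount and allEmpty are introduced here purely for illustration.

```r
# Self-contained copy of the critical columns from the question's data
df <- data.frame(
  Var2 = c("", "", "a", "a", "", "a", "", "", "a", ""),
  Var3 = c("", "a", "", "a", "a", "", "", "a", "", "a"),
  stringsAsFactors = FALSE
)
criticalColumns <- c("Var2", "Var3")

# Step 1: logical matrix, TRUE where a cell is non-empty
nonEmpty <- df[criticalColumns] != ""

# Step 2: number of non-empty cells in each row
nonEmptyCount <- rowSums(nonEmpty)

# Step 3: a row counts as "all empty" when that number is zero
# (equivalent to !nonEmptyCount, which treats 0 as FALSE before negating)
allEmpty <- nonEmptyCount == 0

allEmpty  # rows 1 and 7 are TRUE, all others FALSE
```

Because criticalColumns is just a character vector, changing the set of columns to check only requires editing that one line.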

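Another option (for large data sets, to avoid the conversion to a matrix for memory reasons) is to loop over the columns, check whether each element is blank, and combine the per-column results with Reduce and `&`: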

Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
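
If the check is needed in several places, the Reduce approach could be wrapped in a small function. This is only a sketch: the helper name allBlank and its arguments are hypothetical, and the data frame is a hand-typed copy of the question's sample data.

```r
# Hypothetical helper: TRUE for each row in which every column in `cols` is blank
allBlank <- function(data, cols) {
  Reduce(`&`, lapply(data[cols], function(x) !nzchar(as.character(x))))
}

# Self-contained copy of the question's sample data
df <- data.frame(
  Var1 = c("", "", "", "a", "a", "", "", "", "", ""),
  Var2 = c("", "", "a", "a", "", "a", "", "", "a", ""),
  Var3 = c("", "a", "", "a", "a", "", "", "a", "", "a"),
  stringsAsFactors = FALSE
)

# Same check as in the question, with the column set passed as an argument
df$criticalColumnsAreEmpty <- allBlank(df, c("Var2", "Var3"))

# The explicit, non-flexible version from the question gives the same result
explicit <- df$Var2 == "" & df$Var3 == ""
identical(df$criticalColumnsAreEmpty, explicit)

# A single column also works, since Reduce returns a lone element unchanged
onlyVar3 <- allBlank(df, "Var3")
```

Each column is processed as a plain vector here, so no intermediate matrix of the selected columns is built.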