How to force specific levels when a data frame column does not contain that level? (Using R)

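I have columns in my dataset that may contain 0 or 1, but some columns contain only 0.

I want to use these numbers as factors, but I still want every column to have both levels 0 and 1. I tried the code below, but I keep getting an error and I don't understand why...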

#dataframe df has 100 rows

column_list = c("col1", "col2",  "col3")  

for (col in column_list) {
      #convert number 0 and number 1 to factors
      # (but sometimes the data only has zeros)
      df[,col] <- as.factor(df[,col])

      # I want to force levels to be 0 and 1
      # this is for when the data is completely missing number 1

      levels(df[, col] <- c(0,1))          #give error

      # Error in `[<-.data.frame`(`*tmp*`, , col, value = c(0, 1)) : 
      # replacement has 2 rows, data has 100


      print(levels(df[, col]))
      #this produces "0" "1" or just "0" depending on the column

}

You marked where the error is yourself: that line is written incorrectly. It should be:

df[, col] <- factor(df[, col], levels = c(0, 1))

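You don't even need the preceding as.factor() line. You can also avoid the for loop by using lapply (apply would coerce the data frame to a matrix and lose the factors):

df[column_list] <- lapply(df[column_list], factor, levels = c(0, 1))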
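
As a quick check of the factor(..., levels = ...) approach, here is a minimal sketch on a small made-up data frame (the data and the two column names are assumptions for illustration, not the asker's real data):

```r
# Toy data: col2 contains only zeros (hypothetical example data)
df <- data.frame(col1 = c(0, 1, 0, 1), col2 = c(0, 0, 0, 0))
column_list <- c("col1", "col2")

# factor() with explicit levels keeps "1" even where it never occurs
df[column_list] <- lapply(df[column_list], factor, levels = c(0, 1))

print(levels(df$col2))   # both levels are present
print(table(df$col2))    # the unused level is counted as 0
```

Assigning back into df[column_list] keeps df a data frame, which is why lapply is used here rather than apply.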

I think you just put the ) in the wrong place.

This works:

column_list = c("col1", "col2",  "col3")  
df <- data.frame(matrix(0, nrow = 100, ncol = 3))
names(df) <- column_list

for (col in column_list) {
  #convert number 0 and number 1 to factors
  # (but sometimes the data only has zeros)
  df[,col] <- as.factor(df[,col])

  # I want to force levels to be 0 and 1
  # this is for when the data is completely missing number 1

  levels(df[, col]) <- c(0,1)          #no error anymore

  print(levels(df[, col]))
  # now prints "0" "1" for every column

}
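
One caveat with assigning to levels() directly, as far as I can tell: it relabels the existing levels by position instead of matching them by value, so a column that contains only 1s would silently turn into all 0s. A minimal sketch with a made-up vector (x, y and z are hypothetical names):

```r
# Hypothetical column that contains only ones
x <- as.factor(c(1, 1, 1))

# Assigning levels renames existing levels by position: "1" becomes "0"
y <- x
levels(y) <- c(0, 1)
print(as.character(y))

# factor() with explicit levels matches by value, so the ones stay ones
z <- factor(c(1, 1, 1), levels = c(0, 1))
print(as.character(z))
```

If some columns might contain only 1s, factor(df[, col], levels = c(0, 1)) looks like the safer choice.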