Method to insert NA values based on a missing category

Question

I have the following data frame:

  Author   Score   Value
  A        High    10
  B        Low     20
  C        Medium  30
  A        Low     15
  B        Medium  22
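To reproduce the example, the data frame can be built as below. This is a sketch that assumes Author and Score are stored as factors, which is what the levels()-based code in Answer 1 expects. Note that factor() sorts the levels alphabetically (High, Low, Medium).

```r
# Hypothetical reconstruction of the example data.
# Author and Score are factors so that levels() returns their categories.
df <- data.frame(
  Author = factor(c("A", "B", "C", "A", "B")),
  Score  = factor(c("High", "Low", "Medium", "Low", "Medium")),
  Value  = c(10, 20, 30, 15, 22)
)
str(df)
```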

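I want to rearrange the data frame to show where an author has no score for one of the possible 'Score' categories.

For each such author I want to insert an entry so that an NA appears: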

  Author   Score   Value
  A        Low     15
  A        Medium  NA
  A        High    10
  B        Low     20
  B        Medium  22
  B        High    NA
  C        Low     NA
  C        Medium  30
  C        High    NA

Is there a simple way to do this in R, i.e. a single command, or am I better off writing a dedicated function?

Any suggestions on which commands to look at, or any other tips, would be greatly appreciated.

Answer 1

You are looking for expand.grid and merge. Here is how to do it:

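# All Author x Score combinations (Author and Score must be factors)
lvls <- expand.grid(lapply(df[, c('Author', 'Score')], levels))
# Full outer join: combinations missing from df get Value = NA
merge(df, lvls, all=TRUE)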
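A self-contained run of the two lines above, on a hypothetical reconstruction of the example data. expand.grid builds all 3 x 3 = 9 Author/Score combinations, and merge(..., all = TRUE) performs a full outer join on the shared columns, so the 4 combinations absent from df come back with Value = NA. The factor() calls matter: since R 4.0.0, data.frame() no longer converts strings to factors by default, and levels() of a character column is NULL.

```r
df <- data.frame(
  Author = factor(c("A", "B", "C", "A", "B")),
  Score  = factor(c("High", "Low", "Medium", "Low", "Medium")),
  Value  = c(10, 20, 30, 15, 22)
)

# Every combination of the Author and Score levels (3 x 3 = 9 rows)
lvls <- expand.grid(lapply(df[, c("Author", "Score")], levels))

# Full outer join on the common columns Author and Score;
# combinations that do not occur in df get Value = NA
res <- merge(df, lvls, all = TRUE)
nrow(res)              # 9
sum(is.na(res$Value))  # 4
```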

Or, if the order matters, you can do:

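lvls <- expand.grid(lapply(df[, c('Author', 'Score')], levels))
df.new <- merge(df, lvls, all=TRUE)
# Impose the intended order Low < Medium < High (default levels are alphabetical)
df.new[, 'Score'] <- factor(df.new[, 'Score'], levels=c('Low', 'Medium', 'High'))
# Sort by author, then by score level
df.new[order(df.new$Author, df.new$Score), ]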
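Why the factor() step changes the result: order() sorts a factor by its underlying level codes, not by its labels, so redefining the levels changes the sort order. A minimal check on a hypothetical three-element vector:

```r
s <- factor(c("High", "Low", "Medium"))
# Default levels are alphabetical (High, Low, Medium), so the codes are 1, 2, 3
order(s)   # 1 2 3

# Same labels, levels redefined as Low, Medium, High; the codes become 3, 1, 2
s2 <- factor(s, levels = c("Low", "Medium", "High"))
order(s2)  # 2 3 1
```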

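If your data.frame has character columns instead of factors, you can also use the following more general function. You will still need to reorder the result afterwards.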

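expand.df <- function(data, factors) {
  # Candidate values per column: levels for factors, distinct values otherwise
  # (drop = FALSE keeps a data frame even when only one column is given)
  lvls <- expand.grid(lapply(data[, factors, drop = FALSE], function(x) {
    if (is.factor(x)) levels(x) else unique(x)
  }))
  # Join against the data argument (not the global df), keeping all rows
  merge(data, lvls, all = TRUE)
}
expand.df(df, c('Author', 'Score'))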
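The function switches between levels() and unique() because levels() only applies to factors. A quick check on a hypothetical character vector mirroring the Author column:

```r
Author <- c("A", "B", "C", "A", "B")

# levels() of a plain character vector is NULL
levels(Author)          # NULL
# unique() keeps the distinct values in order of first appearance
unique(Author)          # "A" "B" "C"
# After conversion to a factor, levels() works (sorted alphabetically)
levels(factor(Author))  # "A" "B" "C"
```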

Answer 2

One option uses data.table:

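library(data.table)
# Set the intended level order; CJ() sorts its inputs, and factors sort by level
df$Score <- factor(df$Score, levels=c('Low', 'Medium', 'High'))
# Key on Author and Score, then join against every Author x Score combination;
# combinations with no match in df get Value = NA
setkey(setDT(df), Author, Score)[CJ(unique(Author), unique(Score))]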
#   Author  Score Value
#1:      A    Low    15
#2:      A Medium    NA
#3:      A   High    10
#4:      B    Low    20
#5:      B Medium    22
#6:      B   High    NA
#7:      C    Low    NA
#8:      C Medium    30
#9:      C   High    NA
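For readers without data.table: in X[i] on a keyed table, every row of i is kept and rows with no match in X get NA (the default nomatch = NA), i.e. a right outer join on the key. A rough base-R counterpart, sketched on a hypothetical reconstruction of the data, builds the grid with expand.grid and keeps all of its rows with merge(..., all.x = TRUE):

```r
df <- data.frame(
  Author = c("A", "B", "C", "A", "B"),
  Score  = factor(c("High", "Low", "Medium", "Low", "Medium"),
                  levels = c("Low", "Medium", "High")),
  Value  = c(10, 20, 30, 15, 22)
)

# Counterpart of CJ(unique(Author), unique(Score)): all sorted combinations
grid <- expand.grid(Author = sort(unique(df$Author)),
                    Score  = sort(unique(df$Score)),
                    stringsAsFactors = FALSE)

# Keep every grid row; combinations missing from df get Value = NA
res <- merge(grid, df, all.x = TRUE)
res <- res[order(res$Author, res$Score), ]
rownames(res) <- NULL
res
```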
