Looping through aggregated data in R

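I'm trying to impute missing values in a specific column of a data frame.

My intention is to replace each missing value with the mean of its group, where the groups are defined by another column.

I saved the aggregated result using aggregate():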
# Replace LotFrontage missing values by Neighborhood mean
lot_frontage_by_neighborhood = aggregate(LotFrontage ~ Neighborhood, combined, mean)

Now I want to implement something like this:

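# Here lot_frontage_by_neighborhood is a pandas GroupBy, e.g.
# combined["LotFrontage"].groupby(combined["Neighborhood"])
for key, group in lot_frontage_by_neighborhood:
    idx = (combined["Neighborhood"] == key) & (combined["LotFrontage"].isnull())
    combined.loc[idx, "LotFrontage"] = group.median()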

This is, of course, Python code.

I'm not sure how to do this in R. Can anyone help?

For example:

Neighborhood  LotFrontage
     A            20
     A            30
     B            20
     B            50
     A           <NA>

The NA record should be replaced with 25 (the mean LotFrontage of all records in neighborhood A).
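A minimal sketch of a direct R translation of the Python loop, applied to the example above. It assumes the data frame is called combined (as in the question) and loops over the rows of the table that aggregate() returns. The formula method of aggregate() drops rows containing NA by default (na.action = na.omit), so each group mean is computed only from the non-missing values.

```r
# Example data from the question (NA marks the missing value)
combined <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "A"),
  LotFrontage  = c(20, 30, 20, 50, NA),
  stringsAsFactors = FALSE
)

# One row per neighborhood; the formula method drops NA rows before averaging
lot_frontage_by_neighborhood <- aggregate(LotFrontage ~ Neighborhood, combined, mean)

# Loop over the aggregated rows, mirroring the Python loop
for (i in seq_len(nrow(lot_frontage_by_neighborhood))) {
  key <- lot_frontage_by_neighborhood$Neighborhood[i]
  # Rows in this neighborhood whose LotFrontage is missing
  idx <- combined$Neighborhood == key & is.na(combined$LotFrontage)
  combined$LotFrontage[idx] <- lot_frontage_by_neighborhood$LotFrontage[i]
}

print(combined)
```

The missing value in neighborhood A becomes 25, the mean of 20 and 30.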

Thanks

Is this the idea you're looking for? You may need the which() function to find the rows that have NA values.

set.seed(1)
Neighborhood <- sample(letters[1:4], 10, TRUE)
LotFrontage <- rnorm(10, 0, 1)
LotFrontage[sample(10, 2)] <- NA

# This data frame has 2 columns. The LotFrontage column has 2 missing values.
df <- data.frame(Neighborhood = Neighborhood, LotFrontage = LotFrontage)

# Set each missing LotFrontage value to the mean of the non-missing LotFrontage
# values from the rows with the same Neighborhood
na_rows <- which(is.na(df$LotFrontage))
x <- as.character(df$Neighborhood[na_rows])
f <- function(x) mean(df$LotFrontage[df$Neighborhood == x], na.rm = TRUE)
# vapply returns a numeric vector (lapply would return a list)
df$LotFrontage[na_rows] <- vapply(x, f, numeric(1))
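A loop is not strictly required. As an alternative sketch (a different technique from the answer's which()/vapply approach), base R's ave() computes a per-group statistic aligned to every row, so the missing values can be filled in one vectorized step. The data reuse the question's example; na.rm = TRUE inside FUN keeps the NA row from turning its own group mean into NA.

```r
df <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "A"),
  LotFrontage  = c(20, 30, 20, 50, NA)
)

# For every row, the mean LotFrontage of that row's Neighborhood, ignoring NAs
group_means <- ave(df$LotFrontage, df$Neighborhood,
                   FUN = function(v) mean(v, na.rm = TRUE))

# Replace only the missing entries; observed values are left untouched
missing <- is.na(df$LotFrontage)
df$LotFrontage[missing] <- group_means[missing]

print(df)
```

Swapping mean for median inside FUN gives the median-based imputation used in the Python snippet.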