Predict some missing values based on a linear model
datax <- matrix(1:32, nrow = 8)
datax[2:5,1] <- NA
m <- data.frame(datax)
names(m)[c(1:4)] <- c("Length", "Width", "sex", "height")
model <- glm(Length ~ Width + sex + height, data = m)
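How do you predict the NA values from the model? (The code is just an example.)
I have a dataset with 15 variables, and the response variable has some missing values. How can I predict the missing values of the response variable from a linear model built on this dataset?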
I think you can try predicting them like this:
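missing <- is.na(m$Length)  # rows whose response is NA
# suppressWarnings() silences the rank-deficiency warning (the toy columns are
# perfectly collinear) without changing the global warn option
m$Length[missing] <- suppressWarnings(predict(model, newdata = m[missing, -1]))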
which gives:
> m
  Length Width sex height
1      1     9  17     25
2      2    10  18     26
3      3    11  19     27
4      4    12  20     28
5      5    13  21     29
6      6    14  22     30
7      7    15  23     31
8      8    16  24     32
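The same idea extends to a dataset with many predictors, like the 15-variable one in the question. Below is a minimal sketch of a reusable helper. `impute_with_lm` is a hypothetical name. The sketch assumes that every other column is a usable predictor and that the predictors themselves have no missing values.

```r
# impute_with_lm() is a hypothetical helper, not part of base R or any package.
# Assumes every column other than `response` is a usable predictor.
impute_with_lm <- function(df, response) {
  # "response ~ ." uses all remaining columns as predictors
  f <- as.formula(paste(response, "~ ."))
  # lm() drops rows with a missing response by default (na.action = na.omit)
  fit <- lm(f, data = df)
  missing <- is.na(df[[response]])
  if (any(missing)) {
    # suppressWarnings() hides rank-deficiency warnings such as the one
    # caused by the perfectly collinear toy data in the question
    df[[response]][missing] <- suppressWarnings(
      predict(fit, newdata = df[missing, , drop = FALSE])
    )
  }
  df
}

# Usage with the toy data from the question
datax <- matrix(1:32, nrow = 8)
datax[2:5, 1] <- NA
m <- data.frame(datax)
names(m) <- c("Length", "Width", "sex", "height")
m <- impute_with_lm(m, "Length")
print(m)
```

Because the formula is built from a column name, the same call works for any response column. Predictors do not have to be listed by hand.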
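How about splitting your data into the rows with and without missing values, building a linear model on the latter, and imputing the missing values of the former via predict()?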
library(tidyverse)
datax <- matrix(1:32, nrow = 8)
datax[2:5,1] <- NA
m <- data.frame(datax)
names(m)[c(1:4)] <- c("Length", "Width", "sex", "height")
# Logical index of rows with a missing "Length"; unlike which() plus negative
# indexing, m[!is_missing, ] still keeps every row when nothing is missing
is_missing <- is.na(m$Length)
# Subsetting rows with missing values
m_missing <- m[is_missing, ]
# Subsetting the complete rows
m_rest <- m[!is_missing, ]
# Fitting a linear model on m_rest and predicting Length for m_missing
model <- lm(Length ~ ., data = m_rest)
predictions <- predict(model, newdata = m_missing %>% select(-Length))
# Writing the predicted values into the originally missing cells
m[is_missing, "Length"] <- predictions
This gives:
> print(m)
  Length Width sex height
1      1     9  17     25
2      2    10  18     26
3      3    11  19     27
4      4    12  20     28
5      5    13  21     29
6      6    14  22     30
7      7    15  23     31
8      8    16  24     32
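One caveat applies to both approaches. For `newdata`, `predict()` on an `lm` fit defaults to `na.action = na.pass`, so a row whose predictors are also missing gets an `NA` prediction and stays missing after imputation.

The minimal sketch below shows this. It places an extra `NA` in `Width`, purely for illustration. It also fits `Length ~ Width` only, which keeps the toy model full rank.

```r
# Toy data from the question
datax <- matrix(1:32, nrow = 8)
datax[2:5, 1] <- NA
m <- data.frame(datax)
names(m) <- c("Length", "Width", "sex", "height")
# Assumption for illustration only: row 3 is also missing a predictor
m$Width[3] <- NA

is_missing <- is.na(m$Length)
# Fit on complete rows; Length ~ Width alone avoids the collinear columns
fit <- lm(Length ~ Width, data = m[!is_missing, ])
# Rows of newdata with an NA predictor yield NA predictions (na.pass)
m[is_missing, "Length"] <- predict(fit, newdata = m[is_missing, ])
print(m$Length)  # row 3 is still NA
```

Rows like this need separate handling, for example a model that leaves out the missing predictor or a multiple-imputation approach.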