How do I replace items in a list after a prediction?
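I'm trying to use a list to build predictions for missing values, and then write those predicted values back into the list. I'm happy with the predictions, but I'm stuck after that: how do I write the newly found values back into my_list?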
#my_list is a list with cars, some are missing MPG
# These cars have no MPG
empty_rows <- subset(my_list, cartable.mpg=='0')
#These have an MPG, we'll use them to build our model
usable_rows <- subset(my_list, cartable.mpg !='0')
#Do a regression based on mpg,cylinders and weight
fitted_lm = lm(as.numeric(cartable.mpg) ~ as.numeric(cartable.cyl)+as.numeric(cartable.wt), usable_rows)
#Predict the missing rows
filled_rows <- predict(fitted_lm, empty_rows)
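Since you did not provide a minimal reproducible dataset, here is an example using mtcars.
In short, I split mtcars into a training dataset (used to build the model) and a test dataset from which the response variable (mpg in this case) has been removed. I then fit a linear model lm(mpg ~ wt) and use that model to predict mpg for the test dataset.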
# Set a fixed RNG seed for reproducibility
set.seed(2017)
# Training indices: sample(n / 2) returns a random permutation of 1:(n / 2),
# so the training sample is the first half of mtcars in shuffled order
idx <- sample(nrow(mtcars) / 2)
# Training sample to build the model
df.train <- mtcars[idx, ]
# Test sample (the remaining rows) without the response variable mpg in column 1
df.test <- mtcars[-idx, -1]
# Linear model
fit <- lm(mpg ~ wt, data = df.train)
# Predict mpg for the test sample
pred <- predict(fit, newdata = df.test)
# Put the predictions back as the first column of the test sample
df.test <- cbind.data.frame(
    mpg = pred,
    df.test)
# Bind the training and test samples and flag which one is which
df <- rbind.data.frame(
    cbind.data.frame(df.train, train = TRUE),
    cbind.data.frame(df.test, train = FALSE))
df[, c("mpg", "wt", "train")]
#                         mpg    wt train
#Cadillac Fleetwood  10.40000 5.250  TRUE
#Merc 230            22.80000 3.150  TRUE
#Duster 360          14.30000 3.570  TRUE
#Hornet 4 Drive      21.40000 3.215  TRUE
#Merc 280            19.20000 3.440  TRUE
#Lincoln Continental 10.40000 5.424  TRUE
#Mazda RX4           21.00000 2.620  TRUE
#Merc 450SL          17.30000 3.730  TRUE
#Merc 280C           17.80000 3.440  TRUE
#Mazda RX4 Wag       21.00000 2.875  TRUE
#Hornet Sportabout   18.70000 3.440  TRUE
#Merc 450SE          16.40000 4.070  TRUE
#Valiant             18.10000 3.460  TRUE
#Merc 450SLC         15.20000 3.780  TRUE
#Merc 240D           24.40000 3.190  TRUE
#Datsun 710          22.80000 2.320  TRUE
#Chrysler Imperial   10.17314 5.345 FALSE
#Fiat 128            24.32264 2.200 FALSE
#Honda Civic         26.95458 1.615 FALSE
#Toyota Corolla      25.96479 1.835 FALSE
#Toyota Corona       23.13039 2.465 FALSE
#Dodge Challenger    18.38390 3.520 FALSE
#AMC Javelin         18.76632 3.435 FALSE
#Camaro Z28          16.94420 3.840 FALSE
#Pontiac Firebird    16.92171 3.845 FALSE
#Fiat X1-9           25.51488 1.935 FALSE
#Porsche 914-2       24.59258 2.140 FALSE
#Lotus Europa        27.41348 1.513 FALSE
#Ford Pantera L      19.95856 3.170 FALSE
#Ferrari Dino        21.75818 2.770 FALSE
#Maserati Bora       18.15895 3.570 FALSE
#Volvo 142E          21.71319 2.780 FALSE
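If you would rather overwrite the missing values in place instead of binding two data frames together, one option is to index the rows with a logical vector and assign the predictions straight into those positions. The following is only a sketch under some assumptions: that my_list is really a data frame, that mpg is numeric, and that missing values are coded as 0 as in your question. The names cars, missing_idx and is_missing are made up for illustration and stand in for your own objects.

```r
# Hypothetical stand-in for my_list: mtcars with 8 mpg values blanked out as 0
cars <- mtcars
set.seed(2017)
missing_idx <- sample(nrow(cars), 8)
cars$mpg[missing_idx] <- 0

# Logical flag marking the rows whose mpg is missing (coded as 0 here)
is_missing <- cars$mpg == 0

# Fit the model on the rows that do have an mpg
fit <- lm(mpg ~ cyl + wt, data = cars[!is_missing, ])

# Predict for the missing rows and write the values back into the same rows;
# predict() returns one value per row of newdata, in the same row order
cars$mpg[is_missing] <- predict(fit, newdata = cars[is_missing, ])
```

Because the same logical vector selects the rows on both sides of the assignment, the predictions land in the rows they were computed for, and the rows that already had an mpg are left untouched.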