How to unnormalize (backtransform) a variable after using recipes in R?
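I am training a neuralnet with the train function and pre-processing the data with recipes.
Is there a function that generates predictions from the model and then rescales them to their original range, which in my case is [1, 100]?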
library(caret)
library(recipes)
library(neuralnet)
# Create the dataset - times table
tt <- data.frame(multiplier = rep(1:10, times = 10), multiplicand = rep(1:10, each = 10))
tt <- cbind(tt, data.frame(product = tt$multiplier * tt$multiplicand))
# Splitting
indexes <- createDataPartition(tt$product,
                               times = 1,
                               p = 0.7,
                               list = FALSE)
tt.train <- tt[indexes,]
tt.test <- tt[-indexes,]
# Recipe to pre-process our data
rec_reg <- recipe(product ~ ., data = tt.train) %>%
  step_center(all_predictors()) %>% step_scale(all_outcomes()) %>%
  step_center(all_outcomes()) %>% step_scale(all_predictors())
# Train
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              savePredictions = TRUE)
tune.grid <- expand.grid(layer1 = 8,
                         layer2 = 0,
                         layer3 = 0)
# Setting seed for reproducibility
set.seed(12)
tt.cv <- train(rec_reg,
               data = tt.train,
               method = 'neuralnet',
               tuneGrid = tune.grid,
               trControl = train.control,
               algorithm = 'backprop',
               learningrate = 0.005,
               lifesign = 'minimal')
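If you use step_normalize instead of step_scale and step_center, you can use the functions below to "un-normalize" based on the recipe. (If you prefer the two-step normalization, you will need to adapt the unnormalize function.)
This function extracts the relevant step item.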
#' Extract step item
#'
#' Returns extracted step item from prepped recipe.
#'
#' @param recipe Prepped recipe object.
#' @param step Step from prepped recipe.
#' @param item Item from prepped recipe.
#' @param enframe Should the step item be enframed?
#'
#' @export
extract_step_item <- function(recipe, step, item, enframe = TRUE) {
  d <- recipe$steps[[which(purrr::map_chr(recipe$steps, ~ class(.)[1]) == step)]][[item]]
  if (enframe) {
    tibble::enframe(d) %>% tidyr::spread(key = 1, value = 2)
  } else {
    d
  }
}
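To see what the helper above reads, here is a minimal sketch that preps a recipe with a single step_normalize and inspects the trained step directly. It assumes the times-table data from the question; the names rec_norm, rec_prepped and norm_step are made up for this illustration.

```r
library(recipes)

# Same times-table data as in the question
tt <- data.frame(multiplier = rep(1:10, times = 10),
                 multiplicand = rep(1:10, each = 10))
tt$product <- tt$multiplier * tt$multiplicand

# A single step_normalize() covering predictors and outcome
# (rec_norm is a hypothetical name, not from the original post)
rec_norm <- recipe(product ~ ., data = tt) %>%
  step_normalize(all_predictors(), all_outcomes())

# prep() estimates the means and standard deviations from the training data
rec_prepped <- prep(rec_norm, training = tt)

# The trained step stores them as named numeric vectors
norm_step <- rec_prepped$steps[[1]]
class(norm_step)[1]  # the class name that extract_step_item() matches on
norm_step$means      # one mean per normalized column
norm_step$sds        # one standard deviation per normalized column
```

Before prep() is called these two items are empty, which is why the helper expects a prepped recipe.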
This function does the un-normalizing: it multiplies by the standard deviation and adds the mean.
#' Unnormalize variable
#'
#' Unnormalizes variable using standard deviation and mean from a recipe object. See \code{?recipes}.
#'
#' @param x Numeric vector to unnormalize.
#' @param rec Prepped recipe object.
#' @param var Variable name in the recipe object.
#'
#' @export
unnormalize <- function(x, rec, var) {
  var_sd <- extract_step_item(rec, "step_normalize", "sds") %>% dplyr::pull(var)
  var_mean <- extract_step_item(rec, "step_normalize", "means") %>% dplyr::pull(var)
  (x * var_sd) + var_mean
}
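As a sanity check on the arithmetic, the round trip can be sketched without the helpers: normalize a column with a prepped recipe, then multiply by the stored standard deviation and add the stored mean. The toy data frame df and the names y_sd, y_mean and y_back are assumptions made for this example only.

```r
library(recipes)

# Small made-up data set, for illustration only
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(10, 20, 30, 40, 100))

# Normalize only the outcome, then prep so the statistics are estimated
rec <- prep(
  recipe(y ~ x, data = df) %>% step_normalize(all_outcomes()),
  training = df
)

# bake(new_data = NULL) returns the processed training data
baked <- bake(rec, new_data = NULL)

# Undo the normalization by hand: z * sd + mean
y_sd   <- rec$steps[[1]]$sds[["y"]]
y_mean <- rec$steps[[1]]$means[["y"]]
y_back <- baked$y * y_sd + y_mean

all.equal(y_back, df$y)
```

If the recipe had used separate step_center and step_scale steps, the same idea would apply, but the mean and the standard deviation would have to be read from two different steps and undone in the reverse order of the recipe.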
So you should be able to generate predictions and then use:
unnormalize(predictions, prepped_recipe_obj, outcome_var_name)
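where predictions is the vector of predictions generated from the trained model, prepped_recipe_obj is the prepped version of rec_reg in your case (for example the result of prep(rec_reg), since the means and standard deviations only exist once the recipe has been prepped), and outcome_var_name is "product" in your case.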
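The whole workflow might look roughly like the sketch below. To keep it short and dependency-light it swaps in a plain lm() for the caret/neuralnet model from the question, so it only illustrates the rescaling step, not the original training setup. The names train_scaled, pred_scaled and pred_original are hypothetical.

```r
library(recipes)

# Times-table data from the question
tt <- data.frame(multiplier = rep(1:10, times = 10),
                 multiplicand = rep(1:10, each = 10))
tt$product <- tt$multiplier * tt$multiplicand

# Prepped recipe with one step_normalize() on predictors and outcome
rec_prepped <- prep(
  recipe(product ~ ., data = tt) %>%
    step_normalize(all_predictors(), all_outcomes()),
  training = tt
)

# Fit a stand-in model on the normalized data (assumption: lm instead of neuralnet)
train_scaled <- bake(rec_prepped, new_data = NULL)
fit <- lm(product ~ multiplier + multiplicand, data = train_scaled)

# Predictions come out on the normalized scale
pred_scaled <- unname(predict(fit, newdata = train_scaled))

# Rescale to the original units: multiply by the sd, add the mean
norm_step <- rec_prepped$steps[[1]]
pred_original <- pred_scaled * norm_step$sds[["product"]] +
  norm_step$means[["product"]]

range(pred_original)
```

Because a linear model is unaffected by this kind of rescaling, pred_original should match the fitted values of the same lm() trained on the raw data, which makes for an easy check that the back-transformation is right.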