How to unnormalize (back-transform) a variable after using recipes in R?

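I am training a neuralnet model with the train function and pre-processing the data with recipes.

Is there a function that can generate predictions from the model and then rescale them back to their original range, which in my case is [1, 100]?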

library(caret)
library(recipes)
library(neuralnet)

# Create the dataset - times table 
tt <- data.frame(multiplier = rep(1:10, times = 10), multiplicand = rep(1:10, each = 10))
tt <- cbind(tt, data.frame(product = tt$multiplier * tt$multiplicand))

# Splitting 
indexes <- createDataPartition(tt$product,
                              times = 1,
                              p = 0.7,
                              list = FALSE)
tt.train <- tt[indexes,]
tt.test <- tt[-indexes,]

# Recipe to pre-process our data
rec_reg <- recipe(product ~ ., data = tt.train) %>%
  step_center(all_predictors()) %>% step_scale(all_outcomes()) %>%
  step_center(all_outcomes()) %>% step_scale(all_predictors())

# Train
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              savePredictions = TRUE)

tune.grid <- expand.grid(layer1 = 8,
                         layer2 = 0,
                         layer3 = 0)

# Setting seed for reproducibility
set.seed(12)
tt.cv <- train(rec_reg,
               data = tt.train,
               method = 'neuralnet',
               tuneGrid = tune.grid,
               trControl = train.control,
               algorithm = 'backprop',
               learningrate = 0.005,
               lifesign = 'minimal')

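If you use step_normalize instead of step_scale and step_center, you can use the functions below on the prepped recipe to "un-normalize" the predictions. (If you prefer the two-step normalization, you will need to adapt the unnormalize function.)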

This function extracts the relevant item from a step of the prepped recipe.

#' Extract step item
#'
#' Returns extracted step item from prepped recipe.
#'
#' @param recipe Prepped recipe object.
#' @param step Step from prepped recipe.
#' @param item Item from prepped recipe.
#' @param enframe Should the step item be enframed?
#'
#' @export
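extract_step_item <- function(recipe, step, item, enframe = TRUE) {
  # Locate the step whose first class matches `step`, e.g. "step_normalize"
  idx <- which(purrr::map_chr(recipe$steps, ~ class(.x)[1]) == step)
  # Assumes the recipe contains exactly one step of that class
  stopifnot(length(idx) == 1)
  d <- recipe$steps[[idx]][[item]]
  if (enframe) {
    # Turn the named vector into a one-row tibble, one column per variable
    tibble::enframe(d) %>% tidyr::spread(key = 1, value = 2)
  } else {
    d
  }
}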

This function does the unnormalizing: it multiplies by the standard deviation and adds the mean.

#' Unnormalize variable
#'
#' Unnormalizes a variable using the standard deviation and mean from a prepped recipe object. See \code{?recipes}.
#'
#' @param x Numeric vector to unnormalize.
#' @param rec Prepped recipe object.
#' @param var Variable name in the recipe object.
#'
#' @export
unnormalize <- function(x, rec, var) {
  var_sd <- extract_step_item(rec, "step_normalize", "sds") %>% dplyr::pull(var)
  var_mean <- extract_step_item(rec, "step_normalize", "means") %>% dplyr::pull(var)

  (x * var_sd) + var_mean
}
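
The arithmetic inside unnormalize can be checked by hand. The following is a minimal base R sketch, with made-up values standing in for the outcome (the vector y and the names y_mean, y_sd, z and y_back are illustrative only and do not come from the recipe):

```r
# Illustrative outcome values on the original scale
# (the times-table products in the question span 1 to 100)
y <- c(1, 12, 35, 64, 100)

y_mean <- mean(y)
y_sd <- sd(y)

# Normalize: subtract the mean, then divide by the standard deviation
z <- (y - y_mean) / y_sd

# Unnormalize: multiply by the standard deviation, then add the mean
y_back <- (z * y_sd) + y_mean

# Compare the round trip with the original values
all.equal(y_back, y)
```

The mean and standard deviation must be the ones estimated on the training data, which is why unnormalize reads them from the prepped recipe rather than recomputing them from the predictions.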

So you should be able to generate predictions and then use:

unnormalize(predictions, prepped_recipe_obj, outcome_var_name)

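where predictions is the vector of predictions generated from the trained model, prepped_recipe_obj is the prepped version of rec_reg in your case (for example, the result of prep(rec_reg, training = tt.train)), and outcome_var_name is "product" in your case.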
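
If you would rather keep the two-step recipe from the question (step_center and step_scale instead of step_normalize), one possible adaptation is to walk the prepped steps in reverse and undo each one that touched the variable. This is only a sketch: unnormalize_two_step is a hypothetical name, and it assumes that prepped step_center steps store a named vector called means and prepped step_scale steps store a named vector called sds. The round trip below uses bake on the training data instead of model predictions, purely to check the arithmetic.

```r
library(recipes)

# Hypothetical helper: undo step_center / step_scale for one variable.
# Assumes prepped step_center steps hold `means` and prepped step_scale
# steps hold `sds`, both as named numeric vectors.
unnormalize_two_step <- function(x, rec, var) {
  # Steps were applied first to last, so invert them last to first
  for (s in rev(rec$steps)) {
    if (inherits(s, "step_center") && var %in% names(s$means)) {
      x <- x + s$means[[var]]
    } else if (inherits(s, "step_scale") && var %in% names(s$sds)) {
      x <- x * s$sds[[var]]
    }
  }
  x
}

# Times table, as in the question
tt <- data.frame(multiplier = rep(1:10, times = 10),
                 multiplicand = rep(1:10, each = 10))
tt$product <- as.numeric(tt$multiplier * tt$multiplicand)

# Same step order as the recipe in the question
rec <- recipe(product ~ ., data = tt) %>%
  step_center(all_predictors()) %>% step_scale(all_outcomes()) %>%
  step_center(all_outcomes()) %>% step_scale(all_predictors())

prepped <- prep(rec, training = tt)
baked <- bake(prepped, new_data = tt)

# Back-transform the processed outcome and compare with the original
restored <- unnormalize_two_step(baked$product, prepped, "product")
all.equal(restored, tt$product)
```

Inverting in reverse order matters here because the recipe scales the outcome before centering it, so the stored mean is the mean of the already scaled values.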