Categorise a variable in a list using the purrr::map_df() function

Question

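I have a list of datasets obtained from multiple imputation. I now want to recategorise a variable in this list of datasets. I tried using purrr's map functions, as in the code below, but I'm not having much luck.

Is it actually possible to map a function that recodes a variable across the list using purrr?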

# install the pacman package if it is not installed, otherwise load it
if (!require(pacman)) install.packages("pacman")

# load the relevant packages using the pacman package
pacman::p_load(
  dplyr,       # for pipes and manipulation
  purrr,       # for map_df()
  mice )       # for imputation

# create 10 imputed datasets using mice

nhanes_imp <- parlmice(nhanes,
                       m = 10,
                       cluster.seed = 1234)

# put imputed datasets into a list
nhanes_imp <- nhanes_imp$imp



# create function to categorise chl
chl_funct <- function(x) {
  
  if (x == "0") {
    "0 days"
  } else if (x < 100) {
    "< 100"
  } else if (x >= 100 & x < 150) {
    "100 - 149"
  } else if (x >= 150 & x < 200) {
    "150 - 199"
  } else if (x >= 200) {
    ">= 200"
  }
}


# use the new function to categorise the chl var

nhanes_imp %>% 
  map_df(.$chl,
         chl_funct)

When I run the code, this is the error I get:

 <error/rlang_error>
  Can't convert a `data.frame` object to function
Backtrace:
 1. nhanes_imp %>% map_df(.$chl, chl_funct)
 2. purrr::map_df(., .$chl, chl_funct)
 4. purrr:::as_mapper.default(.f, ...)
 5. rlang::as_function(.f)
 6. rlang:::abort_coercion(x, friendly_type("function"))

Answer 1

First, you should use a vectorised version in your function. This can be done with ifelse or case_when; if you have more categories, cut would be a better choice.

library(dplyr)

chl_funct <- function(x) {
  
  case_when(x == 0 ~ "0 days", 
            x < 100 ~ "< 100", 
            x >= 100 & x < 150 ~ "100 - 149", 
            x >= 150 & x < 200 ~ "150 - 199",
            TRUE ~ ">= 200")
}
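The original if/else chain cannot be reused as-is because `if` expects a single TRUE/FALSE, while each column holds many values. As a rough sketch of the `ifelse` alternative mentioned above (the name `chl_ifelse` and the sample values are made up for illustration, not taken from the question):

```r
# Hypothetical helper: base-R nested ifelse() version of the recoding.
# Each ifelse() is evaluated element-wise, so a whole vector can be passed in.
chl_ifelse <- function(x) {
  ifelse(x == 0, "0 days",
  ifelse(x < 100, "< 100",
  ifelse(x < 150, "100 - 149",
  ifelse(x < 200, "150 - 199", ">= 200"))))
}

# Made-up cholesterol values, one per category
chl_labels <- chl_ifelse(c(0, 99, 100, 150, 200))
print(chl_labels)
```

One difference worth knowing: with nested `ifelse`, a missing value stays `NA`, whereas the `TRUE ~ ">= 200"` fallback in `case_when` would label it ">= 200".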

Then you can apply this function to every column of the data frame in nhanes_imp$chl.

nhanes_imp$chl <- nhanes_imp$chl %>% mutate(across(everything(), chl_funct))
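If you would rather stay with purrr, as in the question, note what the backtrace shows: the pipe passes nhanes_imp as the first argument, so `.$chl` lands in the `.f` slot and purrr tries to turn a data frame into a function. A minimal sketch of one purrr-based alternative is below. It uses `purrr::modify()`, which applies a function to each column and returns the same kind of object, and a small made-up data frame (`chl_imp`) as a stand-in for nhanes_imp$chl so that it runs without mice:

```r
library(dplyr)
library(purrr)

# Vectorised recoding function, as defined above
chl_funct <- function(x) {
  case_when(x == 0 ~ "0 days",
            x < 100 ~ "< 100",
            x >= 100 & x < 150 ~ "100 - 149",
            x >= 150 & x < 200 ~ "150 - 199",
            TRUE ~ ">= 200")
}

# Hypothetical stand-in for nhanes_imp$chl: one column per imputation
chl_imp <- data.frame(`1` = c(0, 99, 120),
                      `2` = c(150, 199, 250),
                      check.names = FALSE)

# Data frame goes first (.x), the function second (.f)
chl_cat <- modify(chl_imp, chl_funct)
print(chl_cat)
```

With the real object, the equivalent call would presumably be `nhanes_imp$chl <- modify(nhanes_imp$chl, chl_funct)`.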

Answer 2

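We can use cut

chl_funct <- function(x) {
      # Note: cut() uses right-closed intervals by default, so a value of
      # exactly 100 is labelled "< 100" (likewise 150 and 200 fall one bin lower)
      cut(x, breaks = c(-Inf, 0, 100, 150, 200, Inf), labels = c('0 days',
       "< 100", "100 - 149", "150 - 199", ">= 200"))
}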

Then use

library(dplyr)
nhanes_imp$chl <- nhanes_imp$chl %>%
      mutate(across(everything(), chl_funct))
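One thing to watch with cut is which side of each interval is closed. By default the intervals are closed on the right, so boundary values such as 100 fall into the lower bin. The sketch below is a variant, not the function above: it sets `right = FALSE` so the bins match the `x >= 100 & x < 150` style of the question, and it drops the separate "0 days" level for simplicity. The name `chl_cut_left` and the sample values are made up for illustration.

```r
# Hypothetical variant: left-closed bins, i.e. [100, 150), [150, 200), ...
chl_cut_left <- function(x) {
  cut(x,
      breaks = c(-Inf, 100, 150, 200, Inf),
      labels = c("< 100", "100 - 149", "150 - 199", ">= 200"),
      right = FALSE)
}

# Same breaks with the default right-closed bins, for comparison
chl_cut_right <- function(x) {
  cut(x,
      breaks = c(-Inf, 100, 150, 200, Inf),
      labels = c("< 100", "100 - 149", "150 - 199", ">= 200"))
}

boundary_values <- c(99, 100, 150, 200)
left_labels  <- as.character(chl_cut_left(boundary_values))
right_labels <- as.character(chl_cut_right(boundary_values))
print(data.frame(value = boundary_values, left_labels, right_labels))
```

Only the values sitting exactly on a break differ between the two versions, so this matters mainly when the variable takes round values.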

