Using a data frame column of character strings as variable names for a function in a for loop, and storing the results in a variable

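I would like to know how to use a data frame column containing character strings as the variable names for a function inside a for loop, and store the results in a variable.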

I want to use the mtcars dataset variables mpg, drat, and disp as DVs and then compute their means.

I created a data frame with just these names as a single column:

mtcars_DVs <- data.frame(c("mpg", "drat", "disp"))
names(mtcars_DVs)[names(mtcars_DVs) == "c..mpg....drat....disp.."] <- "Variable_name"
mtcars_DVs$Variable_name <- as.character(mtcars_DVs$Variable_name)

I want to create a column called Variable_means for these DVs, using a for loop that refers to mtcars_DVs$Variable_name for the names of the objects to take the means of.

This output shows one approach that did not work:

> mtcars_DVs$Variable_means <- 
+   for (DV_col in mtcars_DVs$Variable_name) 
+   {
+     (mean(mtcars$DV_col))
+   }
Warning messages:
1: In mean.default(mtcars$DV_col) :
  argument is not numeric or logical: returning NA
2: In mean.default(mtcars$DV_col) :
  argument is not numeric or logical: returning NA
3: In mean.default(mtcars$DV_col) :
  argument is not numeric or logical: returning NA

This output shows another approach that did not work:

> mtcars_DVs$Variable_means <- 
+   for (DV_col in mtcars_DVs$Variable_name) 
+   {
+     (mean(as.name(mtcars$DV_col)))
+   }
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mean': invalid type/length (symbol/0) in vector allocation

Another unsuccessful attempt:

> mtcars_DVs$Variable_means <- 
+   for (DV_col in mtcars_DVs$Variable_name) 
+   {
+     (mean(get(mtcars$DV_col)))
+   }
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mean': invalid first argument

I can do it the long way, but that is time-consuming and does not refer back to the list of names:

# generates means
> mean(mtcars$mpg)
[1] 20.09062
> mean(mtcars$drat)
[1] 3.596563
> mean(mtcars$disp)
[1] 230.7219

# inputs means
> mtcars_DVs$Variable_means <- c("20.09062", "3.596563", "230.7219")

# displays data
> mtcars_DVs
  Variable_name Variable_means
1           mpg       20.09062
2          drat       3.596563
3          disp       230.7219

Please help. I am happy to switch methods.

We can loop over the 'Variable_name' column, extract each column from the data as a vector with [[, and get the mean:

mtcars_DVs$Variable_means <- sapply(mtcars_DVs$Variable_name, 
        function(nm) mean(mtcars[[nm]]))

Using $ on the object looks literally for a column named "DV_col". Instead, use [[, which evaluates the variable first:

mtcars[[DV_col]]
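
To make the difference concrete, here is a minimal sketch using the built-in mtcars data. DV_col is the loop variable from the question; by_dollar and by_bracket are hypothetical names used only for illustration.

```r
DV_col <- "mpg"

# `$` does not evaluate DV_col; it looks for a column literally named "DV_col"
by_dollar <- mtcars$DV_col
is.null(by_dollar)                  # TRUE: no such column, so the result is NULL

# `[[` evaluates DV_col first, so this is the same as mtcars[["mpg"]]
by_bracket <- mtcars[[DV_col]]
identical(by_bracket, mtcars$mpg)   # TRUE

mean(by_bracket)
```

Calling mean() on the NULL returned by `$` is what produces the "argument is not numeric or logical: returning NA" warnings shown in the question.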

Also, the assignment has to happen inside the for loop, like this:

mtcars_DVs$Variable_means <- numeric(nrow(mtcars_DVs))
for(i in seq_along(mtcars_DVs$Variable_name)) {
         mtcars_DVs$Variable_means[i] <-  
                 mean(mtcars[[mtcars_DVs$Variable_name[i]]])
 } 

mtcars_DVs
#  Variable_name Variable_means
#1           mpg      20.090625
#2          drat       3.596563
#3          disp     230.721875
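
The attempts in the question also assigned the value of the for statement itself. In R, a for loop returns NULL (invisibly) rather than collecting the values computed in its body, so the result has to be stored element by element inside the loop. A minimal sketch, where loop_value is a hypothetical name used only for illustration:

```r
# The value of a `for` statement is NULL, whatever the body computes
loop_value <- for (nm in c("mpg", "drat", "disp")) {
  mean(mtcars[[nm]])
}
is.null(loop_value)   # TRUE
```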

In addition, there is the vectorized colMeans, so we can do

mtcars_DVs$Variable_means <- colMeans(mtcars[mtcars_DVs$Variable_name])
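
Note that colMeans returns a named numeric vector, with the names taken from the selected columns. A minimal sketch checking that it agrees with the sapply approach; vars, cm, and sm are hypothetical names, with vars standing in for mtcars_DVs$Variable_name:

```r
vars <- c("mpg", "drat", "disp")

# Vectorized: one call computes all column means
cm <- colMeans(mtcars[vars])
names(cm)                      # "mpg" "drat" "disp"

# Same values as computing each mean separately
sm <- sapply(vars, function(nm) mean(mtcars[[nm]]))
isTRUE(all.equal(cm, sm))      # TRUE
```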

Or use dplyr/tidyr:

library(dplyr)
library(tidyr)
 mtcars %>% 
   summarise(across(all_of(mtcars_DVs$Variable_name), mean)) %>% 
   pivot_longer(everything(), names_to = 'Variable_name',
         values_to = 'Variable_means')
 # A tibble: 3 x 2
 #  Variable_name Variable_means
 # <chr>                  <dbl>
 #1 mpg                    20.1 
 #2 drat                    3.60
 #3 disp                  231.  

Or with collapse:

library(collapse)
qTBL(fmean(get_vars(mtcars, mtcars_DVs$Variable_name)), 
      keep.attr = TRUE, row.names.col = 'Variable_name')
# A tibble: 3 x 2
#  Variable_name      X
#  <chr>          <dbl>
#1 mpg            20.1 
#2 drat            3.60
#3 disp          231.