Automate repeating models with different data (for loop) in R

Question

I need to run many replications of the same model, but feed different data into it on each iteration.

For example:

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

for (i in 1:3) {
  # keep model structure but alternate the data (db1, db2, db3)
  lm(mpg ~ wt, data = get(paste0("db", i)))
}

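I need to create a for loop or a function that runs the model on db1, then swaps in db2 and runs the same model. I also need to store the results as separate objects in my R environment, e.g. lm1 (for db1) and lm2 (for db2).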

Can someone please help me automate this?

Thanks
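For reference, the literal request (separate objects lm1, lm2, lm3) can be sketched with base R, assuming db1, db2 and db3 already exist in the global environment: get() looks each data frame up by name, and assign() stores each fit under a generated name. Keeping the fits in a list, as the answers below do, is usually easier to work with afterwards.

```r
db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

for (i in 1:3) {
  # look up db1, db2, db3 by name and fit the same model to each
  fit <- lm(mpg ~ wt, data = get(paste0("db", i)))
  # store the fit as lm1, lm2, lm3 in the global environment
  assign(paste0("lm", i), fit, envir = globalenv())
}

coef(lm2)
```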

Answer 1

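The approach I use for this kind of thing is to apply a map function over a list of data frames. My preferred setup is a nested data frame: one column identifies each data frame, another column holds the data frames themselves, and we add a column of linear models.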

Below is a version using map(), which takes our list of data frames and applies lm to each entry.

library(tidyverse)

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

# Place the data frames in a list (do not use c(), which would flatten them into their columns)
a <- list(db1, db2, db3)

# Construct our dataframe
df <- tibble(entry = 1:3, dataframes = a)

df %>% 
  # Map the lm function to all of the dataframes
  mutate(lm = map(dataframes, ~lm(mpg~wt, data = .x)))
#> # A tibble: 3 x 3
#>   entry dataframes          lm    
#>   <int> <list>              <list>
#> 1     1 <df[,11] [32 x 11]> <lm>  
#> 2     2 <df[,11] [32 x 11]> <lm>  
#> 3     3 <df[,11] [32 x 11]> <lm>

Created on 2021-04-06 by the reprex package (v2.0.0)
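The nested data frame can also carry a readable name for each dataset and per-model summaries in further columns. A minimal sketch, assuming the dplyr, purrr and tibble packages (all part of the tidyverse); the column names name, model, intercept and slope are illustrative choices, not part of the answer above.

```r
library(dplyr)
library(purrr)
library(tibble)

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

# one row per dataset: a name column plus a list-column holding each data frame
df <- tibble(name = c("db1", "db2", "db3"),
             dataframes = list(db1, db2, db3))

df <- df %>%
  mutate(
    # fit the same model to every data frame
    model = map(dataframes, ~ lm(mpg ~ wt, data = .x)),
    # pull simple numeric summaries out of each fit
    intercept = map_dbl(model, ~ coef(.x)[["(Intercept)"]]),
    slope = map_dbl(model, ~ coef(.x)[["wt"]])
  )

# a single model can be pulled back out of the list-column
df$model[[2]]
```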

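A more direct approach that uses only lists is the following:

(Note the double brackets in b[[i]]. Single-bracket assignment, b[i] <- lm(...), keeps only the first element of the lm object, its coefficients, drops everything else including the call, and triggers the warning "number of items to replace is not a multiple of replacement length".)

db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

a <- list(db1, db2, db3)

# preallocate a list with one slot per data frame
b <- vector("list", 3)

for (i in 1:3) {
  # [[ ]] stores the complete lm object in slot i
  b[[i]] <- lm(mpg ~ wt, data = a[[i]])
}

# print just the coefficients of each fit
lapply(b, coef)
#> [[1]]
#> (Intercept)          wt 
#>   37.285126   -5.344472 
#> 
#> [[2]]
#> (Intercept)          wt 
#>   37.285126   -5.344472 
#> 
#> [[3]]
#> (Intercept)          wt 
#>   37.285126   -5.344472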

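Once each slot of the list holds a complete lm object, the fits can be summarised together. A base-R sketch, assuming we name the list elements db1 to db3 so the results stay labelled: sapply() collects the coefficients into a matrix, transposed here to one row per model.

```r
db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

# named list of data frames (names are an illustrative choice)
a <- list(db1 = db1, db2 = db2, db3 = db3)

# preallocate an empty list with one slot per data frame, reusing the names
models <- vector("list", length(a))
names(models) <- names(a)

for (i in seq_along(a)) {
  # [[ ]] stores the whole lm object in slot i
  models[[i]] <- lm(mpg ~ wt, data = a[[i]])
}

# coefficient matrix: rows db1..db3, columns (Intercept) and wt
coefs <- t(sapply(models, coef))
coefs
```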

Answer 2

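Create a list of data frames instead of separate data-frame objects: looping over db1, db2 and db3 as individual objects is awkward, whereas looping over the elements of a list is easy. The dfs created below is simply a list of data frames on which you can fit your models. I generated random datasets from mtcars; in your case you probably already have your datasets saved as db1, db2 and db3, so you can do either of the following: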

a) Build the list by hand with dfs <- list(db1, db2, db3), then use it with lapply: mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))

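b) Collect the data frames by name with dfs <- mget(ls(pattern = '^db\\d+$'), envir = globalenv()). Put your own naming pattern in pattern; here the names start with "db" and end with digits. mget() returns a named list, so the models keep those names. Then use lapply as above: mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))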
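Option b) can be checked end to end with a short sketch. It assumes the data frames live in the environment where the code runs (the global environment for a top-level script), which is why the envir arguments are left at their defaults here. Because ls() returns names sorted alphabetically, the models come back labelled db1, db2, db3.

```r
db1 <- mtcars
db2 <- mtcars
db3 <- mtcars

# every object whose name is "db" followed only by digits, gathered into a named list
dfs <- mget(ls(pattern = "^db\\d+$"))

# fit the same model to each data frame; the names carry over from dfs
mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))
names(mymodels)
#> [1] "db1" "db2" "db3"
```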

Below is an example on the mtcars data, using randomly selected rows to illustrate the approach.

# Create a list of data frames by random sampling:
# replicate() runs sample() n (= 3) times, each time drawing roughly 80% of the rows of mtcars;
# set.seed(1) makes the draws reproducible

set.seed(1)
n <- 3
prop <- 0.8

dfs <- lapply(data.frame(replicate(n, sample(1:nrow(mtcars), prop*nrow(mtcars)))), function(x) mtcars[x, ])
## replicate() returns a matrix whose columns are row indices drawn from mtcars;
## data.frame() turns each column into an element (X1, X2, X3) that lapply() uses to subset mtcars

mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x)) # mymodels is your output

Output:

$X1

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  38.912167    -5.874795  


$X2

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  37.740419    -5.519547  


$X3

Call:
lm(formula = mpg ~ wt, data = x)

Coefficients:
(Intercept)           wt  
  39.463332    -6.051852
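Because all the fits sit in one list, any per-model statistic can be gathered the same way. A sketch that reuses the resampling code above and pulls the R-squared of each fit from summary(); the name r2 is illustrative.

```r
set.seed(1)
n <- 3
prop <- 0.8

# same resampling as above: one random subset of mtcars rows per column
dfs <- lapply(data.frame(replicate(n, sample(1:nrow(mtcars), prop*nrow(mtcars)))),
              function(x) mtcars[x, ])

mymodels <- lapply(dfs, function(x) lm(mpg ~ wt, data = x))

# one R-squared per resampled fit, named X1, X2, X3
r2 <- sapply(mymodels, function(m) summary(m)$r.squared)
round(r2, 3)
```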
