如何使用 R 中的 lapply 多次 运行 具有来自不同数据帧的变量的模型

How to run a model with variables from different dataframes multiple times with lapply in R

我有 2 个数据帧

#dummy df for examples:
set.seed(1)
df1 <- data.frame(t = (1:16), 
                 A = sample(20, 16),
                 B = sample(30, 16),
                 C = sample(30, 16))

df2 <- data.frame(t = (1:16),
                  A = sample(20, 16),
                  B = sample(30, 16),
                  C = sample(30, 16))

我想对两个数据框中的每一列都执行此操作(t 列除外):

model <- lm(df2$A ~ df1$A, data = NULL)

我试过这样的方法:

model <- function(yvar, xvar){
  lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
}
lapply(names(data), model)

但显然不行。我做错了什么?

最后,我真正想要的是从模型中获取系数和其他内容。但是阻止我的是如何多次 运行 一个包含来自不同数据帧的变量的线性模型。

我想要的输出我猜它应该看起来像这样:

# [[1]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16
# 
# [[2]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16
# 
# [[3]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16

由于 df1df2 具有相同的名称,您可以这样做:

model <- function(var){
  lm(df1[[var]] ~ df2[[var]])
}
result <- lapply(names(df1)[-1], model)
result

#[[1]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#    15.1504      -0.4763  


#[[2]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#     3.0227       0.6374  


#[[3]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#    15.4240       0.2411  

要从模型中获取汇总统计信息,您可以使用 broom::tidy :

purrr::map_df(result, broom::tidy, .id = 'model_num')

#  model_num term        estimate std.error statistic  p.value
#  <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#1 1         (Intercept)   15.2       3.03      5.00  0.000194
#2 1         df2[[var]]    -0.476     0.248    -1.92  0.0754  
#3 2         (Intercept)    3.02      4.09      0.739 0.472   
#4 2         df2[[var]]     0.637     0.227     2.81  0.0139  
#5 3         (Intercept)   15.4       4.40      3.50  0.00351 
#6 3         df2[[var]]     0.241     0.272     0.888 0.390