Creating a linear regression model for each group in a column

Question

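I referred to this answer:

I am trying to predict the "Education" value for 2019 from the values of past years, using lm(Education ~ poly(TIME,2)).

However, I have to apply this lm, wrapped in function(TIME), to each "LOCATION". I was able to create a separate lm for each LOCATION in m.

Following the answer in the linked post, my code runs up to my_predict. When I run sapply, I get the error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list"

Can someone point out my mistake? I would greatly appreciate any help.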


linear_model <- function(TIME) lm(Education ~ poly(TIME,2), data=table2)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(TIME) predict(m,new_df)

sapply(m,my_predict)   #error here

Edit:

I am now able to predict the Education value for each "LOCATION" for 2020 and 2021, as shown below.

linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
new_df <- data.frame(TIME=c(2020, 2021), row.names = c ("2020.Education", "2021.Education"))
my_predict <- function(x) predict(x,new_df)
result <- sapply(m,my_predict)

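However, I actually want to do this for several more variables (e.g. Education, GDP, working hours, PPI), as in my column headers.

Could someone suggest how to write a loop around my code that produces a data frame of predicted values? I have struggled with this for hours without success.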

Answer 1

Are you looking for a solution like this?

library(tidyverse)
library(broom)
df %>% 
  mutate(LOCATION = as_factor(LOCATION)) %>% 
  group_by(LOCATION) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(Education ~ TIME, data = df) %>% 
      glance() %>% 
      add_column(LOCATION = unique(df$LOCATION), .before=1)
  })

  LOCATION r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 AUT         0.367         0.261   4.88     3.47    0.112     1  -22.9  51.8  52.0    143.            6     8
2 BEL         0.0225       -0.173   3.90     0.115   0.748     1  -18.3  42.6  42.4     76.0           5     7
3 CZE         0.0843       -0.0683  3.22     0.552   0.485     1  -19.6  45.1  45.3     62.2           6     8
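Note that glance() returns fit statistics per group, not predictions. If predicted values are what you need, a minimal sketch along the same lines could use dplyr's group_modify(). The toy data below is made up for illustration and only stands in for the asker's table, and the column name Education_pred is an assumption.

```r
library(dplyr)

# Toy data standing in for df (values are invented for illustration)
set.seed(1)
df <- data.frame(
  LOCATION = rep(c("AUT", "BEL", "CZE"), each = 8),
  TIME     = rep(2011:2018, times = 3)
)
df$Education <- rep(c(40, 50, 60), each = 8) +
  0.5 * (df$TIME - 2011) + rnorm(nrow(df))

# Year(s) to predict
new_df <- data.frame(TIME = 2019)

preds <- df %>%
  group_by(LOCATION) %>%
  group_modify(function(d, key) {
    # d holds the rows of one LOCATION; fit and predict on that subset only
    fit <- lm(Education ~ TIME, data = d)
    data.frame(TIME = new_df$TIME,
               Education_pred = predict(fit, newdata = new_df))
  }) %>%
  ungroup()

print(preds)
```

The result has one row per LOCATION and predicted year, so adding more years to new_df simply adds rows.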

Answer 2

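Your function definitions have a few mistakes. A function is usually written as function(x), and the body should then use x, which stands for whatever data you pass in.

For example, if you used the linear_model function you defined on its own, you would call it like this: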

linear_model(data)

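However, because you use it inside lapply, it looks a bit trickier. lapply simply loops over the list and applies linear_model to each data frame obtained from split(table2,table2$LOCATION). Your version ignores its argument and always fits on the whole of table2.

The same thing happens in my_predict: it ignores its argument and passes the whole list m to predict, which is why you get the error.

In any case, this should work for you: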

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2,table2$LOCATION),linear_model)

new_df <- data.frame(TIME=c(2019))

my_predict <- function(x) predict(x,new_df)

sapply(m,my_predict)
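To check this end to end, here is a self-contained sketch on toy data. The data values are made up for illustration and only stand in for table2. It keeps the poly(TIME, 2) term from the question, on the assumption that the quadratic fit is still wanted.

```r
# Toy data standing in for table2 (values are invented for illustration)
set.seed(42)
table2 <- data.frame(
  LOCATION = rep(c("AUT", "BEL", "CZE"), each = 8),
  TIME     = rep(2011:2018, times = 3)
)
table2$Education <- rep(c(40, 50, 60), each = 8) +
  0.5 * (table2$TIME - 2011) + rnorm(nrow(table2))

# The function must use its argument d: each call receives one LOCATION's rows
linear_model <- function(d) lm(Education ~ poly(TIME, 2), data = d)
m <- lapply(split(table2, table2$LOCATION), linear_model)

new_df <- data.frame(TIME = 2019)

# Each element of m is a single lm object, so predict() can dispatch on it;
# passing the whole list m instead is what raised the UseMethod error
my_predict <- function(fit) predict(fit, newdata = new_df)
pred_2019 <- sapply(m, my_predict)

print(pred_2019)
```

The result is a numeric vector with one predicted value per LOCATION.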

Answer to the edit

There may be more efficient ways to loop over the predictions, but this is my approach:

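library(reshape2)  # melt()
library(tidyr)     # pivot_wider()

pred_data <- list()

for (i in 3:6){
   # x[,i] picks the i-th column as the response (assumes tableLinR is a plain data.frame)
   linear_model <- function(x) lm(x[,i] ~ TIME, x)
   m <- lapply(split(tableLinR,tableLinR$LOCATION),linear_model)
   new_df <- data.frame(TIME=c(2020, 2021), row.names = c("2020", "2021"))
   my_predict <- function(x) predict(x,new_df)
   pred_data[[colnames(tableLinR)[i]]] <- sapply(m,my_predict)
}

# melt() on a list adds an L1 column holding the list names (here the variable names)
pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))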

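First, create an empty list in which to store the output of the loop. In for (i in 3:6), you give the range of column indices you want predictions for. The resulting pred_data is a list, which you can convert to a data frame in several ways. Using melt and pivot_wider gives you a format similar to the original data.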
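As a variation, here is a sketch in base R only that selects the response columns by name rather than by position, building each formula with reformulate(). The toy data, the vars vector and the pred_long / pred_wide names are assumptions made for illustration; they are not part of the original data.

```r
# Toy data standing in for tableLinR (values are invented for illustration)
set.seed(7)
toy <- data.frame(
  LOCATION = rep(c("AUT", "BEL"), each = 9),
  TIME     = rep(2010:2018, times = 2)
)
toy$Education <- 50 + 0.4 * (toy$TIME - 2010) + rnorm(nrow(toy))
toy$GDP       <- 100 + 2.0 * (toy$TIME - 2010) + rnorm(nrow(toy))

vars   <- c("Education", "GDP")          # response columns to model
new_df <- data.frame(TIME = c(2020, 2021))

pred_list <- lapply(vars, function(v) {
  # One model per LOCATION for response v, e.g. Education ~ TIME
  fits <- lapply(split(toy, toy$LOCATION), function(d) {
    lm(reformulate("TIME", response = v), data = d)
  })
  # Matrix: one row per predicted year, one column per LOCATION
  preds <- sapply(fits, predict, newdata = new_df)
  data.frame(
    LOCATION = rep(colnames(preds), each = nrow(preds)),
    TIME     = rep(new_df$TIME, times = ncol(preds)),
    variable = v,
    value    = as.vector(preds)
  )
})

# Long format: one row per LOCATION, year and variable
pred_long <- do.call(rbind, pred_list)

# Wide format: one column per variable (named value.Education, value.GDP)
pred_wide <- reshape(pred_long, idvar = c("LOCATION", "TIME"),
                     timevar = "variable", direction = "wide")
print(pred_wide)
```

Selecting columns by name avoids silent mistakes if the column order of the table changes.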

r

linear-regression