Multiple linear regression models for a nested dataframe/tibble

I am trying to run a multiple linear regression on a nested data frame. I have this data sample:


    library(tidyverse)

    df <- tribble(
      ~Subcat, ~Date,        ~COMM1, ~COMM2, ~UOM,    ~AUC_TYPE,         ~WINNING_PRICE,
      #------|-------------|-------|-------|--------|------------------|---------------|
      1,       "2017-03-07", 40750,  41400,  "MT",    "English",         35000,
      1,       "2017-03-15", 40750,  40000,  "MT",    "English",         35600,

      2,       "2017-10-16", 41000,  40500,  "METER", "Yankee",          56440,
      2,       "2017-11-06", 41010,  40510,  "METER", "Yankee",          52000,
      2,       "2019-01-26", 50010,  50510,  "METER", "English",         50000,

      3,       "2017-03-07", 40750,  41400,  "MT",    "English",         56900,
      3,       "2018-05-26", 50010,  50510,  "MT",    "English",         47000,
      3,       "2019-01-21", 40750,  40200,  "MT",    "English",         56000,
      3,       "2019-01-21", 40750,  40200,  "MT",    "English",         55900,

      4,       "2017-11-08", 37500,  39000,  "LTR",   "Dynamic Sealbid", 67000,
      4,       "2017-11-08", 37500,  39000,  "LTR",   "Dynamic Sealbid", 65900
    )

The factor/character variables have been converted to dummy variables, and the data is then nested by subcategory (Subcat).

    df2 <- df[, -2] %>% group_by(Subcat) %>% nest()
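The post does not show how the dummy variables were built. Below is a minimal sketch that assumes base R's model.matrix() is an acceptable way to do it. It uses a hypothetical five-row subset of the sample, and the names dummies and df_dummy are illustrative only. Each AUC_TYPE level becomes its own 0/1 column, and then the data is nested by Subcat.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the sample data (Date and UOM already dropped)
df <- tibble(
  Subcat        = c(1, 1, 2, 2, 2),
  COMM1         = c(40750, 40750, 41000, 41010, 50010),
  COMM2         = c(41400, 40000, 40500, 40510, 50510),
  AUC_TYPE      = c("English", "English", "Yankee", "Yankee", "English"),
  WINNING_PRICE = c(35000, 35600, 56440, 52000, 50000)
)

# model.matrix() expands the character column into 0/1 dummy columns;
# "- 1" drops the intercept so every AUC_TYPE level gets its own column
# (named AUC_TYPEEnglish and AUC_TYPEYankee here).
dummies <- model.matrix(~ AUC_TYPE - 1, data = df)

# Replace the character column with its dummy columns
df_dummy <- df %>%
  select(-AUC_TYPE) %>%
  bind_cols(as.data.frame(dummies))

# One row per Subcat; all other columns go into the 'data' list-column
df2 <- df_dummy %>%
  group_by(Subcat) %>%
  nest()
```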

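The output is a nested data frame with a Subcat column and a data column. I am trying to run a regression model that predicts the winning price for each subcategory, using the following code: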

    df2 <- df[, -2] %>%
      group_by(Subcat) %>%
      nest() %>%
      mutate(fit     = map(data, ~ lm(WINNING_PRICE ~ ., data = .)),
             results = map(fit, augment)) %>%
      unnest()

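This fails with "Error: Input must be list of vectors" and, in addition, the warning message "cols is now required. Please use cols = c(data, fit, results)". Also, the data frame df2 does not show up in the console.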

I have already looked at this question: “

Thanks in advance!

I think this should work:

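library(tidyverse)

# Fit one model to a single nested data frame (one Subcat)
model_fn <- function(df1) {
  lm(WINNING_PRICE ~ AUC_TYPE, data = df1)
}

# map() applies model_fn to each element of the 'data' list-column
fitted_bestel <- df2 %>%
   mutate(fit = map(data, model_fn))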
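One thing to watch with model_fn on this sample is that AUC_TYPE takes a single value within Subcat 1, 3 and 4. For such a group, lm() stops with "contrasts can be applied only to factors with 2 or more levels", and that error aborts the whole map() call. The sketch below shows one way to keep going. It assumes that getting NULL back for groups that cannot be fitted is acceptable, and it wraps the function with purrr::possibly(). The name safe_model_fn is hypothetical, and the data is a small illustrative subset.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical subset: AUC_TYPE is constant in Subcat 1 but varies in Subcat 2
df <- tibble(
  Subcat        = c(1, 1, 2, 2, 2),
  AUC_TYPE      = c("English", "English", "Yankee", "Yankee", "English"),
  WINNING_PRICE = c(35000, 35600, 56440, 52000, 50000)
)

model_fn <- function(df1) {
  lm(WINNING_PRICE ~ AUC_TYPE, data = df1)
}

# possibly() returns `otherwise` (NULL here) when model_fn throws an error,
# instead of stopping the whole map() call
safe_model_fn <- possibly(model_fn, otherwise = NULL)

fitted_bestel <- df %>%
  group_by(Subcat) %>%
  nest() %>%
  ungroup() %>%
  mutate(fit = map(data, safe_model_fn))
```

Groups whose fit is NULL can then be dropped or handled separately, for example by removing the constant predictor for those groups.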

The error comes from the two dots you use (one standing in for all the covariates, the other for the data).

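If you want to model WINNING_PRICE ~ Subcat, I don't think you need to nest at all (first example). If you do need to nest and fit a model inside the 'data' column, both variables in the model must be inside the nested data frames, e.g. WINNING_PRICE ~ COMM1 (second example). Below are the two examples, one for each case.

The unnest() error also comes from the change that requires you to specify which columns to unnest with the 'cols = ' argument.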

library(tidyverse)
df <- tribble(~Subcat, ~Date, ~COMM1, ~COMM2, ~UOM, ~AUC_TYPE, ~WINNING_PRICE,
                #--|----------|-----|-----|----|---------|-------|
                # Dates are quoted: unquoted, 2017-03-07 is evaluated as arithmetic (2007)
                1, "2017-03-07", 40750,41400,"MT","English",35000,
                1, "2017-03-15", 40750,40000,"MT","English",35600,

                2, "2017-10-16", 41000,40500,"METER","Yankee",56440,
                2, "2017-11-06", 41010,40510,"METER","Yankee",52000,
                2, "2019-01-26", 50010,50510,"METER","English",50000,

                3, "2017-03-07", 40750,41400,"MT","English",56900,
                3, "2018-05-26", 50010,50510,"MT","English",47000,
                3, "2019-01-21", 40750,40200,"MT","English",56000,
                3, "2019-01-21", 40750,40200,"MT","English",55900,

                4, "2017-11-08", 37500,39000,"LTR","Dynamic Sealbid",67000,
                4, "2017-11-08", 37500,39000,"LTR","Dynamic Sealbid",65900)

fit <- lm(WINNING_PRICE ~ Subcat, data = df)

plot(df$Subcat, y = df$WINNING_PRICE)
abline(fit)

# To fit one model per Subcat: drop Date and nest the remaining columns into a 'data' list-column
df2 <- df[, -2] %>% group_by(Subcat) %>% nest()

df3 <- df2 %>% 
  mutate(fit = map(data, ~lm(WINNING_PRICE~COMM1, data = .)),
         results = map(fit, broom::augment))
# unnest() now requires the columns to be named via cols (changed in tidyr 1.0.0)
df4 <- df3 %>% unnest(cols = data)
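Once each group has a model, you can pull out the per-group results by unnesting only the list-columns that hold data frames. The sketch below assumes broom's tidy() and augment() and uses a hypothetical subset of the sample (Subcat 2 and 3 only). In it, coefs gets one row per coefficient per Subcat, and per_row gets one row per observation. The fit column stays out of unnest() because it holds lm model objects, not data frames.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical subset of the sample data (Date and other columns dropped)
df <- tibble(
  Subcat        = c(2, 2, 2, 3, 3, 3, 3),
  COMM1         = c(41000, 41010, 50010, 40750, 50010, 40750, 40750),
  WINNING_PRICE = c(56440, 52000, 50000, 56900, 47000, 56000, 55900)
)

models <- df %>%
  group_by(Subcat) %>%
  nest() %>%
  ungroup() %>%
  mutate(fit     = map(data, ~ lm(WINNING_PRICE ~ COMM1, data = .x)),
         tidied  = map(fit, tidy),     # one row per coefficient
         results = map(fit, augment))  # one row per observation

# Unnest only the data-frame list-columns; 'fit' holds lm objects
coefs <- models %>%
  select(Subcat, tidied) %>%
  unnest(cols = tidied)

per_row <- models %>%
  select(Subcat, results) %>%
  unnest(cols = results)
```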