Manual interaction plot for a linear regression in R
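I am trying to predict the mean abundance of animals seen during different moon phases (a factor), using log-transformed abundance data (which fit better) and a few other variables. The best model (lowest AIC) turned out to include the interaction between phase and survey duration, plus cloud cover (duration and cloud cover are both continuous):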
LMoon<-lm(ln~Phase*Duration+Clouds, data=abund)
summary(LMoon)
Call:
lm(formula = ln ~ Phase * Duration + Clouds, data = abund)

Residuals:
     Min       1Q   Median       3Q      Max
-1.75416 -0.46311  0.09522  0.46591  1.85978

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.382031   0.876865   0.436 0.664125
Phase2           2.130065   1.226305   1.737 0.085851 .
Phase3           1.971060   1.818542   1.084 0.281351
Phase4           0.608043   1.140122   0.533 0.595146
Phase5           4.786674   1.151850   4.156 7.44e-05 ***
Phase6           0.958706   1.046831   0.916 0.362238
Phase7           0.254711   3.425214   0.074 0.940888
Phase8           0.865995   1.043916   0.830 0.409005
Duration         0.069153   0.035407   1.953 0.053952 .
Clouds          -0.004259   0.002401  -1.774 0.079494 .
Phase2:Duration -0.087843   0.047818  -1.837 0.069545 .
Phase3:Duration -0.089908   0.069652  -1.291 0.200109
Phase4:Duration -0.005424   0.046675  -0.116 0.907749
Phase5:Duration -0.172016   0.049369  -3.484 0.000768 ***
Phase6:Duration -0.035597   0.041435  -0.859 0.392583
Phase7:Duration  0.024084   0.176773   0.136 0.891939
Phase8:Duration -0.033424   0.042064  -0.795 0.428963
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7721 on 89 degrees of freedom
Multiple R-squared:  0.3368,	Adjusted R-squared:  0.2176
F-statistic: 2.825 on 16 and 89 DF,  p-value: 0.0009894
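Now, because of this interaction, I need to produce an interaction plot (the CIs were far too wide when I plotted the lsmeans).
I tried the different functions mentioned there, but none of them worked.
Apparently I need to calculate and plot this by hand, which I did like this: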
# Intercept for each phase: baseline intercept plus the phase offset
intercepts <- c(coef(LMoon)["(Intercept)"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase2"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase3"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase4"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase5"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase6"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase7"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase8"])
# Slope for each phase: baseline Duration slope plus the interaction term
lines.df <- data.frame(intercepts = intercepts,
                       slopes = c(coef(LMoon)["Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase2:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase3:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase4:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase5:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase6:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase7:Duration"],
                                  coef(LMoon)["Duration"] + coef(LMoon)["Phase8:Duration"]),
                       # named Phase (not Phase2) so that aes(color = Phase) finds it
                       Phase = levels(abund$Phase))
qplot(x = Duration, y = Sp2, color = Phase, data = abund) +
  geom_abline(aes(intercept = intercepts,
                  slope = slopes,
                  color = Phase), data = lines.df)
The plot I get is wrong, because the y values are on the original scale while the lines come from the lm fitted to the log-transformed data.
[Figure: interaction plot of abundance against survey duration, by lunar phase]
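To back-transform this, I was told that I will not actually end up with straight lines.
Instead of using abline(), I should create a set of, say, 100 new x values covering the range of the duration data and use the coefficients to calculate the predicted y values. Plotting these with lines() should then give what looks like a smooth curve.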
This is where I get lost.
So I created a new set of x values spanning the range of survey durations (min 15, max 45 minutes):
dur2 <- seq(from = 15, to = 45, length.out=100)
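Once I have these values, I am supposed to use the coefficients of my LM to get a predicted y value for each x value, back-transform those y values to the original scale, and then add the lines to the plot using the x values and the back-transformed y values.
So how exactly do I get the predicted values? I can't use any of the predict-type functions; I have tried them all. They just don't work with my model, so doing it by hand seems to be the only way, but I don't know how...
I hope someone can help me with this. I have been trying for weeks, and I am desperate by now and close to giving up.
Cheers!
PS
Here is the data: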
> dput(subset(abund, Phase %in% c("Phase1", "Phase2")))
structure(list(Year = integer(0), Date = structure(integer(0), .Label = c("01/08/2009",
"01/08/2016", "02/07/2019", "02/08/2009", "02/08/2012", "02/08/2016",
"02/09/2007", "03/08/2007", "03/08/2009", "03/08/2014", "03/08/2015",
"04/07/2019", "04/08/2009", "04/08/2013", "05/08/2009", "05/08/2014",
"05/08/2015", "06/07/2008", "06/07/2019", "07/08/2009", "08/07/2010",
"09/07/2010", "09/08/2015", "10/08/2009", "11/08/2009", "12/08/2009",
"13/08/2009", "13/08/2014", "14/08/2009", "14/08/2012", "16/07/2006",
"18/07/2009", "18/08/2015", "19/07/2011", "20/08/2009", "21/07/2011",
"21/09/2009", "22/07/2011", "22/07/2016", "22/07/2017", "23/07/2007",
"23/07/2016", "23/07/2017", "24/07/2017", "25/07/2007", "25/07/2010",
"25/07/2017", "25/08/2016", "26/07/2010", "26/07/2011", "27/07/2006",
"27/07/2011", "27/07/2012", "28/07/2016", "29/06/2019", "29/07/2005",
"29/07/2009", "29/07/2010", "29/07/2016", "29/07/2019", "30/07/2005",
"30/07/2007", "30/07/2016", "30/08/2005", "31/07/2005", "31/07/2009",
"31/07/2014", "31/07/2016"), class = "factor"), NrSurvey = integer(0),
Duration = integer(0), Sp2 = integer(0), Phase = structure(integer(0), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), Clouds = integer(0),
Visibility = integer(0), ln = numeric(0)), row.names = integer(0), class = "data.frame")
Use predict to get the predicted values. Don't calculate them by hand.
Use expand.grid() to generate a data frame containing every combination of your dur2 sequence and the values of the other predictors you want to plot. Like this:
library(ggplot2)

prediction_data = expand.grid(
  Duration = dur2,
  Phase = unique(abund$Phase),
  Clouds = mean(abund$Clouds) # hold Clouds constant at some value
)
# column names in prediction_data need to match those in the model formula
prediction_data$pred = predict(LMoon, newdata = prediction_data)
# back-transform from the log scale to the original scale
prediction_data$pred_orig = exp(prediction_data$pred)
# plot: curves from the predictions, points from the raw data
ggplot(prediction_data, aes(x = Duration, y = pred_orig, color = Phase)) +
  geom_line() +
  geom_point(data = abund, aes(y = Sp2)) # raw abundance is in Sp2, not pred_orig
Something like that should work.
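If you want to convince yourself that predict() is doing exactly the "intercept plus slope times x" arithmetic you were attempting, here is a minimal sketch. It runs on simulated data, because the dput above contains no rows; the `toy` data frame, its three phases and all of its numbers are invented stand-ins, not your `abund` data.

```r
# Simulated stand-in for `abund`; every value here is made up for illustration.
set.seed(1)
toy <- data.frame(
  Phase    = factor(rep(1:3, each = 20)),
  Duration = runif(60, 15, 45),
  Clouds   = runif(60, 0, 100)
)
toy$ln <- 1 + 0.5 * as.integer(toy$Phase) + 0.03 * toy$Duration -
  0.004 * toy$Clouds + rnorm(60, sd = 0.3)

fit <- lm(ln ~ Phase * Duration + Clouds, data = toy)

# One row per Duration/Phase combination, with Clouds held at its mean
new <- expand.grid(
  Duration = seq(15, 45, length.out = 100),
  Phase    = levels(toy$Phase),
  Clouds   = mean(toy$Clouds)
)

# By hand: build the design matrix for the new rows (dummy columns for Phase,
# interaction columns for Phase:Duration) and multiply by the coefficients
X <- model.matrix(delete.response(terms(fit)), data = new, xlev = fit$xlevels)
by_hand <- drop(X %*% coef(fit))

# predict() does the same bookkeeping for you
by_predict <- predict(fit, newdata = new)

all.equal(unname(by_hand), unname(by_predict))
```

The two vectors should agree up to floating-point error, which is why there is nothing to gain from assembling intercepts and slopes yourself.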
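Another nice option is to use broom::augment to generate the predictions. It also makes it easy to get standard errors and residuals for each predicted point.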
library(broom)
prediction_data = augment(LMoon, newdata = prediction_data)
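If you would rather stay in base R, predict() can also return confidence limits directly, and you can back-transform them together with the fitted values. The sketch below again uses simulated data (the `toy` data frame, its two phases and its coefficients are invented, and Clouds is left out to keep it short) and draws the curves with lines(), as you were advised. One caveat I am fairly sure of: exp() of a prediction made on the log scale estimates the median (geometric mean) on the original scale rather than the arithmetic mean.

```r
# Simulated stand-in for `abund`; names and numbers are invented for illustration.
set.seed(42)
toy <- data.frame(
  Phase    = factor(rep(c("1", "2"), each = 30)),
  Duration = runif(60, 15, 45)
)
toy$ln  <- 1 + 0.04 * toy$Duration -
  0.02 * toy$Duration * (toy$Phase == "2") + rnorm(60, sd = 0.4)
toy$Sp2 <- exp(toy$ln)  # abundance on the original scale

fit <- lm(ln ~ Phase * Duration, data = toy)

new <- expand.grid(Duration = seq(15, 45, length.out = 100),
                   Phase    = levels(toy$Phase))

# Fitted values and 95% confidence limits, all on the log scale
ci <- predict(fit, newdata = new, interval = "confidence", level = 0.95)

# Back-transform all three columns; exp() is monotone, so lwr <= fit <= upr still holds
new <- cbind(new, exp(ci))  # adds columns fit, lwr, upr

# Raw points, then one fitted curve (solid) and its limits (dashed) per phase
plot(Sp2 ~ Duration, data = toy, col = as.integer(toy$Phase), pch = 16)
for (p in levels(new$Phase)) {
  d <- new[new$Phase == p, ]
  k <- match(p, levels(new$Phase))  # colour index matching the points
  lines(d$Duration, d$fit, col = k, lwd = 2)
  lines(d$Duration, d$lwr, col = k, lty = 2)
  lines(d$Duration, d$upr, col = k, lty = 2)
}
```

The same `new` data frame could be handed to ggplot with geom_line() and geom_ribbon(aes(ymin = lwr, ymax = upr)) if you prefer to keep everything in ggplot2.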