Model building and stuck on error message in R

I am trying to analyze the dataset below:

   X wt   solution  process

   1 21        1       1
   2 36        2       1
   3 25        3       1
   4 18        4       1
   5 22        5       1
   6 26        1       2
   7 38        2       2
   8 27        3       2
   9 17        4       2
  10 26        5       2
  11 16        1       3
  12 25        2       3
  13 22        3       3
  14 18        4       3
  15 21        5       3
  16 28        1       4
  17 35        2       4
  18 27        3       4
  19 20        4       4
  20 24        5       4

The data are balanced, and I believe both effects are fixed. My R code is as follows:

str(wool)

##convert solution and process to factors

wool$solution<-as.factor(wool$solution)
wool$process<-as.factor(wool$process)


m1<-aov(wt~solution*process,data=wool)
plot(m1)

However, when I try to plot the model to check the assumptions, I get the following error:

not plotting observations with leverage one:
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Error in qqnorm.default(rs, main = main, ylab = ylab23, ylim = ylim, ...) : 
  y is empty or has only NAs

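I am not sure how to correct this. When I fit the model without the interaction,

m1<-aov(wt~solution+process,data=wool)

everything works, but I need to analyze the interaction to see whether it is significant. Also, when I leave the factors as numeric it works, but these are definitely categorical factors: each one identifies a type of process and a type of solution used to treat the observation.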

Any help is appreciated.

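The model with the interaction effect consumes 19 degrees of freedom. In an ANOVA, the degrees of freedom for the error term are N - k, where N = total number of observations and k = number of groups (here, the solution-by-process cells).

Since N is 20 and the number of groups including the interaction, k, is also 20, the model terms take 19 degrees of freedom. The error term is therefore left with 0 degrees of freedom and every observation has a leverage of 1, meaning that the 19 parameters in the model plus the grand mean are completely determined by the 20 observations in the data frame. This explains the error message returned by plot(): all 20 observations are excluded from the diagnostic plots because they have leverage one, so qqnorm() is left with nothing to draw.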
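The degrees-of-freedom accounting above can be checked directly in R. The following is a minimal sketch, assuming the data from the question; the object name `fit_full` and the use of `expand.grid()` to rebuild the design are illustrative choices, not part of the original post.

```r
# Rebuild the 5 x 4 design from the question: solution varies fastest,
# so the rows come out in the same order as the original table.
wool <- expand.grid(solution = factor(1:5), process = factor(1:4))
wool$wt <- c(21, 36, 25, 18, 22,
             26, 38, 27, 17, 26,
             16, 25, 22, 18, 21,
             28, 35, 27, 20, 24)

# Full model with interaction (hypothetical name fit_full)
fit_full <- aov(wt ~ solution * process, data = wool)

n_obs    <- nrow(wool)              # N: total number of observations
n_params <- length(coef(fit_full))  # grand mean + 19 model parameters
resid_df <- df.residual(fit_full)   # N - k: error degrees of freedom
leverage <- hatvalues(fit_full)     # one leverage value per observation

print(c(N = n_obs, k = n_params, error_df = resid_df))
print(range(leverage))
```

If the reasoning above holds, N and k both come out as 20, the error degrees of freedom are 0, and every leverage value equals 1.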

rawData <- "X wt   solution  process

   1 21        1       1
   2 36        2       1
   3 25        3       1
   4 18        4       1
   5 22        5       1
   6 26        1       2
   7 38        2       2
   8 27        3       2
   9 17        4       2
  10 26        5       2
  11 16        1       3
  12 25        2       3
  13 22        3       3
  14 18        4       3
  15 21        5       3
  16 28        1       4
  17 35        2       4
  18 27        3       4
  19 20        4       4
  20 24        5       4"

wool <- read.table(text=rawData,header=TRUE)

str(wool)

##convert solution and process to factors

wool$solution<-as.factor(wool$solution)
wool$process<-as.factor(wool$process)


m1<-aov(wt~solution*process,data=wool)
summary(m1)

...and the output:

> m1<-aov(wt~solution*process,data=wool)
> summary(m1)
                 Df Sum Sq Mean Sq
solution          4  500.8  125.20
process           3  136.8   45.60
solution:process 12   87.2    7.27

The problem becomes clear when we print the ANOVA table with anova(m1).

> anova(m1)
Analysis of Variance Table

Response: wt
                 Df Sum Sq Mean Sq F value Pr(>F)
solution          4  500.8 125.200               
process           3  136.8  45.600               
solution:process 12   87.2   7.267               
Residuals         0    0.0                       
Warning message:
In anova.lm(m1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

The lack of degrees of freedom is even more obvious when we fit the model with lm().

m2 <- lm(wt ~ solution*process,data=wool)
summary(m2)

...and the output:

> m2 <- lm(wt ~ solution*process,data=wool)
> summary(m2)

Call:
lm(formula = wt ~ solution * process, data = wool)

Residuals:
ALL 20 residuals are 0: no residual degrees of freedom!

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)              21         NA      NA       NA
solution2                15         NA      NA       NA
solution3                 4         NA      NA       NA
solution4                -3         NA      NA       NA
solution5                 1         NA      NA       NA
process2                  5         NA      NA       NA
process3                 -5         NA      NA       NA
process4                  7         NA      NA       NA
solution2:process2       -3         NA      NA       NA
solution3:process2       -3         NA      NA       NA
solution4:process2       -6         NA      NA       NA
solution5:process2       -1         NA      NA       NA
solution2:process3       -6         NA      NA       NA
solution3:process3        2         NA      NA       NA
solution4:process3        5         NA      NA       NA
solution5:process3        4         NA      NA       NA
solution2:process4       -8         NA      NA       NA
solution3:process4       -5         NA      NA       NA
solution4:process4       -5         NA      NA       NA
solution5:process4       -5         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 19 and 0 DF,  p-value: NA
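For comparison, here is a rough sketch of the additive model that the question reports as working. It assumes the same data; the names `fit_add` and `resid_ss` are hypothetical. With one observation per cell, the 12 degrees of freedom that the interaction would otherwise use are what remain for the error term.

```r
# Same 5 x 4 design as in the question, one observation per cell
wool <- expand.grid(solution = factor(1:5), process = factor(1:4))
wool$wt <- c(21, 36, 25, 18, 22,
             26, 38, 27, 17, 26,
             16, 25, 22, 18, 21,
             28, 35, 27, 20, 24)

# Additive (main effects only) model, hypothetical name fit_add
fit_add <- aov(wt ~ solution + process, data = wool)

add_resid_df <- df.residual(fit_add)        # 20 - 1 - 4 - 3
resid_ss     <- sum(residuals(fit_add)^2)   # residual sum of squares

print(summary(fit_add))
print(c(error_df = add_resid_df, error_ss = resid_ss))
```

In this balanced layout the residual sum of squares of the additive model matches the solution:process sum of squares in the table above (87.2 on 12 degrees of freedom). In other words, with a single observation per cell the interaction and the error cannot be separated.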

Interactions with continuous variables

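Regarding the OP's question about why the code works when the analysis is run with continuous variables: for continuous variables the interaction effect consumes a single degree of freedom, rather than the (solutions - 1) * (processes - 1) = (5 - 1) * (4 - 1) = 12 degrees of freedom consumed by the interaction between the two categorical variables.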

Again, we can demonstrate this with lm().

wool$solution<-as.numeric(wool$solution)
wool$process<-as.numeric(wool$process)
m3 <- lm(wt ~ solution*process,data=wool)
summary(m3)
anova(m3)

...and the output:

> summary(m3)

Call:
lm(formula = wt ~ solution * process, data = wool)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.460  -3.757  -0.180   2.320  12.000 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       28.9000     8.1454   3.548  0.00268 **
solution          -1.5000     2.4559  -0.611  0.54993   
process           -0.0100     2.9743  -0.003  0.99736   
solution:process   0.0300     0.8968   0.033  0.97373   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.341 on 16 degrees of freedom
Multiple R-squared:  0.1123,    Adjusted R-squared:  -0.05409 
F-statistic: 0.675 on 3 and 16 DF,  p-value: 0.5798

> anova(m3)
Analysis of Variance Table

Response: wt
                 Df Sum Sq Mean Sq F value Pr(>F)
solution          1  81.22  81.225  2.0200 0.1744
process           1   0.16   0.160  0.0040 0.9505
solution:process  1   0.05   0.045  0.0011 0.9737
Residuals        16 643.37  40.211               
>
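The difference in degrees of freedom between the two codings can also be seen by counting design-matrix columns, without fitting anything. This is an illustrative sketch; the names `design_num`, `design_fac`, `X_num` and `X_fac` are assumptions introduced here.

```r
# The 5 x 4 layout, first with numeric predictors ...
design_num <- expand.grid(solution = 1:5, process = 1:4)

# ... then with the same predictors converted to factors
design_fac <- transform(design_num,
                        solution = factor(solution),
                        process  = factor(process))

X_num <- model.matrix(~ solution * process, data = design_num)
X_fac <- model.matrix(~ solution * process, data = design_fac)

# Columns per model term: 0 = intercept, 1 = solution,
# 2 = process, 3 = solution:process
cols_num <- table(attr(X_num, "assign"))
cols_fac <- table(attr(X_fac, "assign"))

print(cols_num)
print(cols_fac)
print(c(numeric = ncol(X_num), factor = ncol(X_fac)))
```

With numeric predictors the interaction contributes one column, so the model uses 4 of the 20 observations' worth of information and leaves 16 for error. With factors it contributes 12 columns and the design matrix has as many columns as there are rows.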