R: interpretation of boot() output

Question

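I'm trying to run an ANOVA on bootstrapped data (because my data are not normally distributed), but I really don't know whether I'm doing it correctly or how to interpret my output.

Here is what I'm trying to do: I ran the same experiment with the same subjects online and in the lab (= independent variable "testing situation" with 2 factor levels). In the experiment, I manipulated cognitive load as an independent variable with 4 factor levels (called "no-back", "zero-back", "one-back" and "two-back") and measured reaction time (in ms) as the dependent variable.

This means I have a 2 x 4 within-subjects design with reaction time as the outcome variable, and I want to know whether there are main effects or interaction effects.

What I tried to do is: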

# write regression function
bootReg <- function(formula, # Formula of the regression
                    data, 
                    indices)
{
  d <- data[indices,]
  fit <- lm(formula, data=d)
  return(coef(fit))  
}

# bootstrap the data 
boot.object <- boot(statistic = bootReg, formula = lm(RT ~ Code + Situation + Block, data = dataframe), data = dataframe, R = 2000)

My output looks like this:

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = NBACK_DESCR, statistic = bootReg, R = 2000, formula = lm(NBACK_Median_RT ~ 
    Code + Situation + Block, data = NBACK_DESCR))


Bootstrap Statistics :
        original     bias    std. error
t1*   322.927313 -9.0002985    79.96588
t2*   -12.014833  5.6209447   117.02878
t3*   109.197500  0.8386920   120.86134
t4*   338.548500  1.0563602   123.06327
t5*   212.354750  0.5961423   307.84955
t6*   115.336083  1.0862478    78.74367
t7*   204.884583  0.6035880    94.50454
t8*  -119.986083  2.2980845    72.79074
t9*   -93.026833  3.3750698    79.26258
t10*    0.311750  7.5767305   183.46302
t11*  200.108625 -1.8049229   371.22341
t12*  -53.072917  0.2976762    95.20676
t13*  126.300083  3.3038699   107.50477
t14*   -3.794000  2.6890971    85.11730
t15*   68.130917  0.1380621   109.92370
t16* -144.711750  1.6015020    74.13766
t17*    0.920000  0.8054492    98.44356
t18* -120.711167  0.7836202    78.31914
t19*   10.794083 -0.6042305    98.66546
t20*  519.203600  9.8466741   571.22411
t21*   90.910500 -0.2344282    90.77725
t22*  108.026250  1.1320475    77.27769
t23*   16.168000  0.3672671   126.07834
t24*  284.315333 -2.4115301   287.93144
t25*  198.447917  2.9121272   112.64016
t26*   37.165250  1.5303775    94.42860
t27*  -98.688833  3.0493664    79.98359
t28*   45.922417  2.0774330    74.65226
t29*   -6.227517  3.8654708   166.54048
t30*   50.998118  2.9716901    49.62328
t31*  -23.885188  6.9669819    64.99859
t32*   59.188070 10.5457197    73.22344

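Does anyone know what this means and how I can see whether I have significant interactions or main effects?

My guess is that t1* is the original test statistic and the other t*s are the bootstrapped test statistics, but even if that is correct, it doesn't really help me understand what this output is trying to tell me.

My idea was to count how many t > t1 and divide that by the number of samples (in this case 2000? or 31?) to get a p-value. I also thought about doing this for different kinds of models with different combinations of predictors to see which ones are significant. Does that make sense?! I really don't know. Also, I guess I should apply a correction?

It would be great if someone could help me. I'm an undergraduate currently trying to learn R programming, and I'm completely lost! Thanks in advance!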

Answer 1

t1 to t32 are your coefficients. Since you did not provide a data.frame, I use the example below:

library(boot)
set.seed(111)

dataframe = data.frame(RT=rnorm(100),Code=rbinom(100,1,0.5),
Situation=factor(sample(1:3,100,replace=TRUE)),
Block=sample(letters[1:3],100,replace=TRUE))

dataframe$RT[dataframe$Code==1] = dataframe$RT[dataframe$Code==1] + 1

Before we do the bootstrap, if we run the linear model, we expect output like this:

(Intercept)        Code  Situation2  Situation3      Blockb      Blockc 
-0.02469464  0.78758240 -0.10768677  0.32013080  0.29325885 -0.05753515
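Output of that shape can be produced with coef() on a fitted lm. The following is a minimal sketch, assuming the same simulated data as above (exact numbers may differ across R versions because of changes to sample()); it also prints the column names of the design matrix, which is where the coefficient names come from:

```r
set.seed(111)
dataframe <- data.frame(
  RT = rnorm(100),
  Code = rbinom(100, 1, 0.5),
  Situation = factor(sample(1:3, 100, replace = TRUE)),
  Block = sample(letters[1:3], 100, replace = TRUE)
)
dataframe$RT[dataframe$Code == 1] <- dataframe$RT[dataframe$Code == 1] + 1

# Fit the model once on the original data and extract the coefficients
fit <- lm(RT ~ Code + Situation + Block, data = dataframe)
coefs <- coef(fit)
print(coefs)

# Design matrix built by lm(): one column per coefficient.
# Each 3-level factor contributes 2 dummy columns (first level = reference).
X <- model.matrix(RT ~ Code + Situation + Block, data = dataframe)
print(colnames(X))
```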

You have a vector of length 6: 1 intercept and 5 coefficients, because some of the predictors are factors with more than 2 levels. Now we bootstrap:

# bootstrap the data 
boot.object <- boot(statistic = bootReg, formula = lm(RT ~ Code + Situation + Block, data = dataframe), data = dataframe, R = 2000)

    Call:
boot(data = dataframe, statistic = bootReg, R = 2000, formula = lm(RT ~ 
    Code + Situation + Block, data = dataframe))


Bootstrap Statistics :
       original       bias    std. error
t1* -0.02469464  0.004240916   0.2588214
t2*  0.78758240  0.005184110   0.2130492
t3* -0.10768677  0.002780318   0.2479104
t4*  0.32013080  0.001647815   0.2734137
t5*  0.29325885 -0.005158781   0.2510207
t6* -0.05753515 -0.012981051   0.2718108

You can see that the printout corresponds exactly to the coefficients you get when you run lm on the original data. Another way to look at it:

(Intercept)        Code  Situation2  Situation3      Blockb      Blockc 
-0.02469464  0.78758240 -0.10768677  0.32013080  0.29325885 -0.05753515
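One way to check this correspondence yourself is to compare the statistic stored in the boot object ($t0) with coef() from a plain lm fit. This is a sketch under the same simulated data; note that it passes the formula itself to boot() rather than a fitted lm object, which is a deliberate change from the call above:

```r
library(boot)
set.seed(111)
dataframe <- data.frame(
  RT = rnorm(100),
  Code = rbinom(100, 1, 0.5),
  Situation = factor(sample(1:3, 100, replace = TRUE)),
  Block = sample(letters[1:3], 100, replace = TRUE)
)
dataframe$RT[dataframe$Code == 1] <- dataframe$RT[dataframe$Code == 1] + 1

# Statistic: refit the regression on the resampled rows, return coefficients
bootReg <- function(formula, data, indices) {
  d <- data[indices, ]
  coef(lm(formula, data = d))
}

boot.object <- boot(data = dataframe, statistic = bootReg, R = 2000,
                    formula = RT ~ Code + Situation + Block)

# $t0 is the statistic evaluated on the original, unresampled data
fit <- lm(RT ~ Code + Situation + Block, data = dataframe)
print(boot.object$t0)
print(all.equal(boot.object$t0, coef(fit)))
```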

The bootstrapped values are stored under $t. Here you can see there are 6 columns, one per coefficient, and each row is one bootstrap replicate:

head(boot.object$t)
            [,1]      [,2]         [,3]       [,4]        [,5]        [,6]
[1,]  0.37081996 0.3009173  0.307350121  0.2271736 -0.02898838 -0.38958835
[2,] -0.09836689 1.0306144 -0.272608134  0.1617208  0.30521958 -0.09391564
[3,] -0.24588583 0.9835756 -0.416804093  0.1581820  0.28454367  0.24282730
[4,]  0.29403111 0.5777657 -0.283601680 -0.1328344  0.20086620 -0.20614676
[5,] -0.00692040 0.6228231 -0.150136418  0.3648773  0.42969597 -0.07899494
[6,] -0.24859844 0.8226603  0.008036868  0.6543648  0.43781238  0.25347543
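The "bias" and "std. error" columns in the printed table are summaries of these columns. A minimal sketch, assuming the same simulated data and statistic function as above, that recomputes them by hand from $t and $t0:

```r
library(boot)
set.seed(111)
dataframe <- data.frame(
  RT = rnorm(100),
  Code = rbinom(100, 1, 0.5),
  Situation = factor(sample(1:3, 100, replace = TRUE)),
  Block = sample(letters[1:3], 100, replace = TRUE)
)
dataframe$RT[dataframe$Code == 1] <- dataframe$RT[dataframe$Code == 1] + 1

bootReg <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}
boot.object <- boot(data = dataframe, statistic = bootReg, R = 2000,
                    formula = RT ~ Code + Situation + Block)

# bias: mean of the bootstrap replicates minus the original estimate
bias <- colMeans(boot.object$t) - boot.object$t0
# std. error: standard deviation of the bootstrap replicates, per coefficient
se <- apply(boot.object$t, 2, sd)

print(round(cbind(original = boot.object$t0, bias = bias, std.error = se), 4))
```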

Your bootstrapped values should hover around your observed value. Here I plot the coefficient for Code:

hist(boot.object$t[, 2], breaks = 50)
abline(v=boot.object$t0[2],col="blue")

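From the bootstrap, you can estimate the standard error of a coefficient and use it to construct a confidence interval. It is not used to test the hypothesis that a coefficient is non-zero.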
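As an illustration, boot.ci() from the boot package can build such an interval from the replicates. The sketch below assumes the same simulated data as above and asks for a 95% percentile interval for the second coefficient (Code); the choice of interval type is only an example:

```r
library(boot)
set.seed(111)
dataframe <- data.frame(
  RT = rnorm(100),
  Code = rbinom(100, 1, 0.5),
  Situation = factor(sample(1:3, 100, replace = TRUE)),
  Block = sample(letters[1:3], 100, replace = TRUE)
)
dataframe$RT[dataframe$Code == 1] <- dataframe$RT[dataframe$Code == 1] + 1

bootReg <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}
boot.object <- boot(data = dataframe, statistic = bootReg, R = 2000,
                    formula = RT ~ Code + Situation + Block)

# index = 2 selects the second element of the statistic, i.e. the Code coefficient
ci <- boot.ci(boot.object, conf = 0.95, type = "perc", index = 2)
print(ci)

# The last two entries of ci$percent are the lower and upper endpoints
ci_bounds <- ci$percent[4:5]
print(ci_bounds)
```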

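You are confusing permutation tests with the bootstrap. What you need is to construct a similar test in which the labels are shuffled. You can check out posts such as this one, or maybe this one by Ben Bolker.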
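To give a rough idea of what shuffling labels looks like, here is a minimal sketch on the simulated data from above. It permutes the Code column, refits the model each time, and compares the observed coefficient with the permutation distribution. The number of permutations and the choice to shuffle the predictor are assumptions made for illustration; this simple scheme ignores any repeated-measures structure, so for a within-subjects design the shuffling would have to respect subjects.

```r
set.seed(111)
dataframe <- data.frame(
  RT = rnorm(100),
  Code = rbinom(100, 1, 0.5),
  Situation = factor(sample(1:3, 100, replace = TRUE)),
  Block = sample(letters[1:3], 100, replace = TRUE)
)
dataframe$RT[dataframe$Code == 1] <- dataframe$RT[dataframe$Code == 1] + 1

# Observed coefficient for Code on the original data
obs <- coef(lm(RT ~ Code + Situation + Block, data = dataframe))["Code"]

n_perm <- 2000  # illustrative number of permutations
perm <- replicate(n_perm, {
  shuffled <- dataframe
  shuffled$Code <- sample(shuffled$Code)  # break the link between Code and RT
  coef(lm(RT ~ Code + Situation + Block, data = shuffled))["Code"]
})

# Two-sided p-value: share of permuted coefficients at least as extreme
# as the observed one (the +1 counts the observed data as one permutation)
p_value <- (1 + sum(abs(perm) >= abs(obs))) / (n_perm + 1)
print(p_value)
```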
