Return R 中 Bartlett 方差同质性检验的方差

Return variance from Bartlett's test of homogeneity of variance in R

Bartlett 检验允许您检验方差在不同组中是否相同。

R 中的 stats 包具有函数 bartlett.test。这是一个使用 R 中可用的数据集的示例。

bt <- bartlett.test(count ~ spray, data = InsectSprays)

如何从 bartlett.test 得到实际方差?

我好像在对象中找不到这个 bt

names(bt)
[1] "data.name" "method"    "p.value"   "parameter" "statistic"

您可以使用 var() 自行计算方差。一种方法是使用 summaryBy

library(doBy)
summaryBy(count~spray, data=InsectSprays, FUN=var)

但是,您会希望 bartlett.test 提供每组的方差。类似地,在 R 中计算 t.test 也会给出每组的平均值。那么,我们可以从 R 中的 bartlett.test 中提取每组的方差吗?

不,遗憾的是你不能。

方差没有出现在返回对象的结构中。

我阅读了该函数的源代码,您可以通过稍微重写它来提取它,但这比执行您已有的解决方案要多得多。

你可以明白我的意思:

#  File src/library/stats/R/bartlett.test.R
#  Part of the R package, https://www.R-project.org
#
#  Copyright (C) 1995-2015 The R Core Team
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  https://www.R-project.org/Licenses/

bartlett.test <- function(x, ...) UseMethod("bartlett.test")

bartlett.test.default <-
function(x, g, ...)
{
    LM <- FALSE
    if (is.list(x)) {
        if (length(x) < 2L)
            stop("'x' must be a list with at least 2 elements")
        DNAME <- deparse(substitute(x))
        if (all(sapply(x, function(obj) inherits(obj, "lm"))))
            LM <- TRUE
        else
            x <- lapply(x, function(x) x <- x[is.finite(x)])
        k <- length(x)
    }
    else {
        if (length(x) != length(g))
            stop("'x' and 'g' must have the same length")
        DNAME <- paste(deparse(substitute(x)), "and",
                       deparse(substitute(g)))
        OK <- complete.cases(x, g)
        x <- x[OK]
        g <- factor(g[OK])
        k <- nlevels(g)
        if (k < 2)
            stop("all observations are in the same group")
        x <- split(x, g)
    }

    if (LM) {
        n <- sapply(x, function(obj) obj$df.resid)
        v <- sapply(x, function(obj) sum(obj$residuals^2))
    } else {
        n <- sapply(x, "length") - 1
        if (any(n <= 0))
            stop("there must be at least 2 observations in each group")
        v <- sapply(x, "var")
    }

    n.total <- sum(n)
    v.total <- sum(n * v) / n.total
    STATISTIC <- ((n.total * log(v.total) - sum(n * log(v))) /
                  (1 + (sum(1 / n) - 1 / n.total) / (3 * (k - 1))))
    PARAMETER <- k - 1
    PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    names(STATISTIC) <- "Bartlett's K-squared"
    names(PARAMETER) <- "df"

    RVAL <- list(statistic = STATISTIC,
                 parameter = PARAMETER,
                 p.value = PVAL,
                 data.name = DNAME,
                 method = "Bartlett test of homogeneity of variances")
    class(RVAL) <- "htest"
    return(RVAL)
}

bartlett.test.formula <-
function(formula, data, subset, na.action, ...)
{
    if(missing(formula) || (length(formula) != 3L))
        stop("'formula' missing or incorrect")
    m <- match.call(expand.dots = FALSE)
    if(is.matrix(eval(m$data, parent.frame())))
        m$data <- as.data.frame(data)
    ## need stats:: for non-standard evaluation
    m[[1L]] <- quote(stats::model.frame)
    mf <- eval(m, parent.frame())
    if(length(mf) != 2L)
        stop("'formula' should be of the form response ~ group")
    DNAME <- paste(names(mf), collapse = " by ")
    names(mf) <- NULL
    y <- do.call("bartlett.test", as.list(mf))
    y$data.name <- DNAME
    y
}

稍微 比 Hack-R 建议的更容易破解,但他们是对的,像 sapply(split(InsectSprays,spray),function(x) var(x$count))(在 base R 中完成这一切)可能是更容易。

这里展示的技术是脆弱的,因为它依赖于内置函数的确切形式;如果在 R 的未来版本中对函数进行了微小的更改,它将停止工作。更安全的是 dump() 整个函数并根据您的喜好修改它,然后 source() 结果。

bb <- stats:::bartlett.test.default
bb2 <- body(bb)
## add a line to save the variances
bb2[[12]] <- quote(ESTIMATE <- v)
## add the variances to the return list
bb2[[13]] <- quote(RVAL <- list(statistic = STATISTIC, parameter = PARAMETER, estimate = ESTIMATE, p.value = PVAL, data.name = DNAME, method = "Bartlett test of homogeneity of variances"))
## restore the rest of the function
bb2[14:15] <- body(bb)[13:14]
body(bb) <- bb2

现在把它放回 stats 命名空间中:

assignInNamespace("bartlett.test.default",bb,pos="package:stats")

测试:

(bt <- bartlett.test(count~spray,data=InsectSprays))
##  Bartlett test of homogeneity of variances
## 
## data:  count by spray
## Bartlett's K-squared = 25.96, df = 5, p-value = 9.085e-05
## sample estimates:
##         A         B         C         D         E         F 
## 22.272727 18.242424  3.901515  6.265152  3.000000 38.606061 

您可以通过 bt$estimate 检索值。

可能值得在 r-devel 上提出这个作为对 bartlett.test 的增强:我能想到的唯一反驳是如果有很多组被测试,输出会很笨重.