Mahalanobis distance in R between two groups

I have two groups, each with 3 variables, as follows:

Group1:
     cost time quality
[1,]   90    4      70
[2,]    4   27      37
[3,]   82    4      17
[4,]   18   41       4

Group 2:

       cost time quality
[1,]    4   27       4

The code I use to calculate the Mahalanobis distance between the two groups is:

    benchmark <- rbind(c(90, 4, 70), c(4, 27, 37), c(82, 4, 17), c(18, 41, 4))
    colnames(benchmark) <- c('cost', 'time', 'quality')
    current <- rbind(c(4, 27, 4))
    colnames(current) <- c('cost', 'time', 'quality')
    bdm <- as.matrix(benchmark)
    cdm <- as.matrix(current)
    mat1 <- matrix(bdm, ncol = ncol(bdm), dimnames = NULL)
    mat2 <- matrix(cdm, ncol = ncol(cdm), dimnames = NULL)

    # center data
    mat1.1 <- scale(mat1, center = TRUE, scale = FALSE)
    mat2.1 <- scale(mat2, center = TRUE, scale = FALSE)

    # covariance matrices
    mat1.2 <- cov(mat1.1, method = "pearson")
    mat2.2 <- cov(mat2.1, method = "pearson")

    # the pooled covariance is calculated using a weighted average
    n1 <- nrow(mat1)
    n2 <- nrow(mat2)
    n3 <- n1 + n2
    # pooled matrix
    mat3 <- ((n1 / n3) * mat1.2) + ((n2 / n3) * mat2.2)

    mat4 <- solve(mat3)

    # mean difference
    mat5 <- as.matrix((colMeans(mat1) - colMeans(mat2)))
    # multiply
    mat6 <- t(mat5) %*% mat4
    # Mahalanobis distance
    sqrt(mat6 %*% mat5)

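The result is NA. However, when I enter the same values into the online calculator linked below (link: calculate mahalanobis distance), it reports that the Mahalanobis distance between group 1 and group 2 is 2.4642.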

In addition, I get this error message:

Error in ((n1/n3) * mat1.2) + ((n2/n3) * mat2.2) : non-conformable arrays

and this warning message:

In colMeans(mat1) - colMeans(mat2) :
  longer object length is not a multiple of shorter object length

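I suspected that what you are trying to do must already exist in some R package. After a fairly thorough search, I found the function D.sq in the package asbio, which looks very close. That function requires two matrices as input, so it does not work with your example as it stands. I have also included a modified version that accepts a vector for the second group.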

# Original Function
D.sq <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - colMeans(g2))
    S1 <- cov(g1)
    S2 <- cov(g2)
    n1 <- nrow(g1)
    n2 <- nrow(g2)
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 
        1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}

# Data
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
# Kept as posted; note that Group 2 in the question is (4, 27, 4), not (2, 27, 4).
g2 <- c(2, 27, 4)

# Function modified to accept a vector for g2 rather than a matrix
D.sq2 <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - g2)
    S1 <- cov(g1)
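    # var() on a vector returns one scalar: the variance across the three
    # entries of g2, not a 3 x 3 covariance matrix.
    S2 <- var(g2)
    n1 <- nrow(g1)
    # length(g2) is the number of variables (3), not the number of observations.
    n2 <- length(g2)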
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 
        1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}

However, this does not give exactly the answer you expect: D.sq2(g1, g2)$D.sq returns 2.2469.
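As a cross-check that avoids var(g2) altogether, here is a minimal sketch using base R's mahalanobis(). It rests on one assumption of mine: when the second group is a single observation, the (n2 - 1) weight in the pooled formula is zero, so V reduces to cov(g1). The names x, V, d2_point and d2_manual are illustrative, and the data are the values from the question, (4, 27, 4).

```r
# Data from the question: four benchmark rows and one current observation.
g1 <- matrix(c(90,  4, 70,
                4, 27, 37,
               82,  4, 17,
               18, 41,  4),
             ncol = 3, byrow = TRUE,
             dimnames = list(NULL, c("cost", "time", "quality")))
x <- c(cost = 4, time = 27, quality = 4)

# Assumption: with n2 = 1 the pooled matrix V is just the sample covariance of g1.
V <- cov(g1)

# stats::mahalanobis() returns the SQUARED distance, not the distance itself.
d2_point <- mahalanobis(x, center = colMeans(g1), cov = V)

# The same quantity written out by hand, for comparison.
dbar <- colMeans(g1) - x
d2_manual <- drop(t(dbar) %*% solve(V) %*% dbar)

# Squared distance is roughly 3.64, so the distance is roughly 1.91.
c(squared = d2_point, distance = sqrt(d2_point))
```

Under this assumption the distance is about 1.91, which is still not 2.4642, so the calculator is probably not using the n - 1 sample covariance.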

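Perhaps you can compare your original MATLAB approach with these details to find the source of the difference. A quick look suggests that the difference lies in how the denominator in V is computed. The mistake may also be mine, but hopefully this gets you moving.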
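To make the denominator point concrete, here is a sketch of one convention that does appear to reproduce the 2.4642 figure from the question. It assumes, without having seen the calculator's source, that each group's covariance is the population version (divided by n rather than n - 1) and that the pooled matrix is the average of the two weighted by group size. The helper pop_cov is hypothetical, and g2 uses the question's values (4, 27, 4).

```r
g1 <- matrix(c(90,  4, 70,
                4, 27, 37,
               82,  4, 17,
               18, 41,  4),
             ncol = 3, byrow = TRUE)
g2 <- matrix(c(4, 27, 4), nrow = 1)

# Hypothetical helper: population covariance (divides by n, not n - 1),
# so a single-row group gives a zero matrix instead of NA.
pop_cov <- function(m) {
  centered <- scale(m, center = TRUE, scale = FALSE)
  crossprod(centered) / nrow(m)
}

n1 <- nrow(g1)
n2 <- nrow(g2)

# Pooled matrix: average of the population covariances, weighted by group size.
V <- (n1 * pop_cov(g1) + n2 * pop_cov(g2)) / (n1 + n2)

# Difference of the group means, then the (unsquared) Mahalanobis distance.
dbar <- colMeans(g1) - colMeans(g2)
D <- sqrt(drop(t(dbar) %*% solve(V) %*% dbar))

round(D, 4)  # approximately 2.4642
```

Note that this value is the distance itself, whereas D.sq and D.sq2 return the squared distance, so the two are not directly comparable without taking a square root.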