Scipy: distance correlation is higher than 1

I am computing the distance correlation between columns; see the code below. Most of the time it returns a result higher than 1, which should not be possible, because distance correlation lies between 0 and 1. You can read about scipy's distance correlation here

import numpy as np
from scipy.spatial import distance

x = np.random.uniform(-1, 1, 10000)
print(distance.correlation(x, x**2))

1.00210811815

What is wrong here, or how should I measure it?

upd1: Link to the issue on GitHub

Going by the documentation, I don't see why this is a problem.

From the documentation:

The correlation distance between u and v, is defined as

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$

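By the Cauchy-Schwarz inequality, the absolute value of the expression after the minus sign is at most 1. Nothing says it cannot be negative, though; in fact, that is exactly what happens when the (mean-normalized) vectors are anticorrelated.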

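AFAICT, you should only be surprised if you get a value greater than 2 or less than 0. Given @Cleb's comment and the fact that the range is [0, 2], my guess is that some other packages simply define the distance as half of this expression.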
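To make the point above concrete, here is a minimal sketch showing that `scipy.spatial.distance.correlation` agrees with one minus the Pearson correlation coefficient on data like the asker's. The seed and the variable names `r` and `d` are arbitrary choices for illustration, not part of the original question.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
x = rng.uniform(-1, 1, 10000)
y = x**2

r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
d = distance.correlation(x, y)   # scipy's correlation distance

print(r)                     # near 0: x and x**2 are linearly uncorrelated
print(d)                     # near 1, slightly above 1 whenever r is negative
print(np.isclose(d, 1 - r))  # the two quantities agree up to rounding
```

A result such as 1.002 therefore just means the sample Pearson correlation happened to be about -0.002.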

@josef-pkt's answer on GitHub is as follows:

It's not a distance correlation, which is a nonlinear measure of dependence; e.g. my take: http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html. However, "correlation" in scipy.spatial.distance.correlation is a bit misleading, because according to the formula in the docstring it's a distance measure and not a correlation. Perfectly correlated, with correlation coefficient equal to 1, has zero distance. Perfectly negatively correlated, with correlation coefficient equal to -1, has maximal distance at 2.
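For the nonlinear dependence measure the comment refers to, a rough sketch of the sample distance correlation could look like the following. It assumes the usual definition by Székely, Rizzo and Bakirov, based on double-centered pairwise distance matrices. `distance_correlation` is a hypothetical helper name, not a scipy function, and the sample size is kept small because the distance matrices are n x n.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def distance_correlation(x, y):
    """Sample distance correlation of two equally long 1-D samples.

    Hypothetical helper for illustration; not part of scipy.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Pairwise Euclidean distance matrices, each of shape (n, n).
    a = squareform(pdist(x))
    b = squareform(pdist(y))

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x = (A * A).mean()   # squared sample distance variance of x
    dvar_y = (B * B).mean()   # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    # max() guards against tiny negative values caused by rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


rng = np.random.default_rng(0)          # arbitrary seed
x = rng.uniform(-1, 1, 1000)            # smaller n than in the question
independent = rng.uniform(-1, 1, 1000)

dcor_linear = distance_correlation(x, 2 * x + 1)
dcor_quadratic = distance_correlation(x, x**2)
dcor_independent = distance_correlation(x, independent)

print(dcor_linear)       # 1 up to rounding: exact linear relationship
print(dcor_quadratic)    # clearly above 0, unlike the Pearson correlation
print(dcor_independent)  # small, but not exactly 0 in a finite sample
```

Unlike the correlation distance, this quantity stays within [0, 1] and picks up the dependence between x and x**2.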

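Correlation distance is the complement of correlation (1 minus the correlation, not its reciprocal). It looks only at the angle/similarity between patterns, somewhat like a normalization. It runs from 0 to 2: 0 is perfectly correlated, 1 is uncorrelated, and 2 is perfectly anticorrelated. So a small correlation distance means the patterns are close together in correlation space (a small angular difference). Corr = 1 - distance; corr distance = 1 - corr. So while high correlation = strong relationship, low correlation distance = strong relationship.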
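The three reference values described above can be checked with a short sketch. The data here are made up for illustration (normally distributed samples with an arbitrary seed), and the variable names are my own.

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.default_rng(0)    # arbitrary seed
x = rng.normal(size=10000)
noise = rng.normal(size=10000)    # drawn independently of x

d_same = distance.correlation(x, 2 * x + 3)   # perfectly correlated
d_noise = distance.correlation(x, noise)      # uncorrelated
d_anti = distance.correlation(x, -x)          # perfectly anticorrelated

print(d_same)    # approximately 0
print(d_noise)   # approximately 1
print(d_anti)    # approximately 2
```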