What is the difference between these two covariance functions?

I want to rewrite this covariance function:

def cov1(a, b):
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    sum = 0
    for i in range(0, a.size):
        sum = ((a[i] - a_mean) * (b[i] - b_mean)) + sum
    return sum/(len(a)-1)

I tried to rewrite the summation part so that it uses np.sum:

def cov(a, b):
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    for i in range(0, a.size):
        summation = np.sum((a[i] - a_mean) * (b[i] - b_mean))
    return summation/(len(a)-1)

But when I take these two arrays:

a = np.arange(1,11,1)
b = np.arange(10,21,1)

and try both functions, I get different answers. The function cov1 is the correct one:

print(cov1(a,b))
print(cov(a,b))

9.166666666666666
2.0

Why is that? How can I fix cov(a,b) so that it gives the same result as cov1(a,b)?

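You forgot to initialize summation in cov, and you forgot to add each new term to the running total, so every iteration overwrites the previous value and only the last term is kept. Try this: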

def cov(a, b):
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    # Added summation and assigned 0 to it, like for sum in cov1
    summation = 0
    for i in range(0, a.size):
        # Added + summation here, just like in cov1
        summation = np.sum((a[i] - a_mean) * (b[i] - b_mean)) + summation 
    return summation/(len(a)-1)
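
To see where the 2.0 comes from, it may help to run both behaviours side by side. This is only a minimal sketch using the arrays from the question; the names last_term, total, overwrite_result and accumulate_result are illustrative and not part of the original code:

```python
import numpy as np

a = np.arange(1, 11, 1)   # 10 elements: 1..10
b = np.arange(10, 21, 1)  # 11 elements: 10..20

a_mean = np.mean(a)  # 5.5
b_mean = np.mean(b)  # 15.0

last_term = 0.0
total = 0.0
for i in range(a.size):
    term = (a[i] - a_mean) * (b[i] - b_mean)
    last_term = term  # overwritten on every pass, as in the original cov
    total += term     # accumulated, as in cov1

overwrite_result = last_term / (len(a) - 1)
accumulate_result = total / (len(a) - 1)
print(overwrite_result)   # 2.0
print(accumulate_result)  # about 9.1667
```

The last term is (10 - 5.5) * (19 - 15) = 18, and 18 / 9 gives the 2.0 that cov printed.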

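First, I think your function should check that the two inputs have the same length (in your example, a has 10 elements and b has 11).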

Second, your function could look like this:

def cov(a, b):
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    return ((a - a_mean) * (b - b_mean)).sum() / (len(a) - 1)
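
The first two points can be combined. The following is a sketch rather than a definitive implementation: cov_checked is a hypothetical name, and the array b used here is an assumed equal-length replacement, because the question's b has 11 elements and would not broadcast against a.

```python
import numpy as np

def cov_checked(a, b):
    """Sample covariance of two equal-length arrays (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Refuse mismatched inputs instead of silently ignoring extra elements.
    if a.shape != b.shape:
        raise ValueError(f"length mismatch: {a.shape} vs {b.shape}")
    # Vectorized sum of products of deviations, divided by N - 1.
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

a = np.arange(1, 11, 1)   # 10 elements
b = np.arange(11, 21, 1)  # 10 elements (assumed; the question's b has 11)
result = cov_checked(a, b)
print(result)  # about 9.1667
```

With the question's original arrays, this version raises a ValueError instead of returning a number.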

Third, NumPy has its own cov function. Try:

np.cov(a, b)

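This returns the covariance matrix of the two variables. In your case you only need the off-diagonal entry, np.cov(a, b)[0, 1]; the [0, 0] entry is the variance of a.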
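
As a quick check of which entry of the matrix is which, here is a small sketch. It assumes two equal-length arrays (np.cov needs the same number of observations in each input), so b is shortened to 10 elements here; the names matrix and manual are illustrative.

```python
import numpy as np

a = np.arange(1, 11, 1)   # 10 elements
b = np.arange(11, 21, 1)  # 10 elements (assumed equal length)

matrix = np.cov(a, b)     # 2x2 matrix, normalized by N - 1 by default
manual = ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

print(matrix.shape)                                 # (2, 2)
print(np.isclose(matrix[0, 1], manual))             # True: off-diagonal is cov(a, b)
print(np.isclose(matrix[0, 0], np.var(a, ddof=1)))  # True: diagonal is var(a)
```

The diagonal holds the variances of a and b, and the two off-diagonal entries both hold the covariance between them.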