Fast PearsonR between 1 vector and many others?

Question

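I'm looking for a more efficient way to compute the Pearson correlation coefficient between a static vector of length 1000 and many other vectors of the same length.

My naive approach is to correlate them pairwise: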

import numpy as np
from scipy import stats

A = np.random.rand(1000)  # the static vector; 1-D, as stats.pearsonr expects
otherVectors = np.random.rand(700, 1000)  # one vector per row

for B in otherVectors:
    R, p = stats.pearsonr(A, B)

I'm just wondering whether there is a faster solution.

Thanks a lot!

Answer 1

Compute them all at once, by hand.
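
Written out, doing it all at once amounts to evaluating the expressions below for every other vector in a single pass. The symbols are chosen here for illustration: x is the static vector, y_i is the i-th of the other vectors, and n is their common length. The p-value expression assumes the usual two-sided test under the null hypothesis of no correlation, with I_z(a, b) the regularized incomplete beta function (`scipy.special.betainc(a, b, z)`).

```latex
r_i = \frac{\sum_{k=1}^{n} (y_{ik} - \bar{y}_i)(x_k - \bar{x})}
           {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 \; \sum_{k=1}^{n} (y_{ik} - \bar{y}_i)^2}},
\qquad
p_i = I_{\,1 - r_i^2}\!\left(\frac{n}{2} - 1,\ \frac{1}{2}\right)
```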

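import numpy as np
from scipy import special

def pearsonr_many(x, ys):
    # x: 1-D array of length n; ys: 2-D array of shape (m, n), one vector per row
    x_mean = x.mean()
    y_means = ys.mean(axis=1)

    # Center x and every row of ys
    xm, yms = x - x_mean, ys - y_means[:, np.newaxis]
    # Pearson r of x with each row: centered dot products over the product of norms
    r = yms @ xm / np.sqrt(xm @ xm * (yms * yms).sum(axis=1))
    r = r.clip(-1, 1)  # guard against rounding slightly outside [-1, 1]

    # Two-sided p-value; the last argument is algebraically equal to 1 - r**2
    prob = special.betainc(
        len(x) / 2 - 1,
        0.5,
        1 / (1 + r * r / (1 - r * r))
    )

    return r, prob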
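
As a quick sanity check, the vectorized result can be compared against the pairwise loop from the question. The sketch below is self-contained, so it repeats the function (using `keepdims=True` instead of `np.newaxis`, which is equivalent here). The seed, the array sizes, and the names `r_fast`, `p_fast`, `r_ref` and `p_ref` are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import special, stats


def pearsonr_many(x, ys):
    # Same computation as the answer's function, repeated so this runs on its own.
    xm = x - x.mean()
    yms = ys - ys.mean(axis=1, keepdims=True)
    r = yms @ xm / np.sqrt(xm @ xm * (yms * yms).sum(axis=1))
    r = r.clip(-1, 1)
    prob = special.betainc(len(x) / 2 - 1, 0.5, 1 / (1 + r * r / (1 - r * r)))
    return r, prob


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.random(1000)            # the static vector
ys = rng.random((700, 1000))    # the other vectors, one per row

r_fast, p_fast = pearsonr_many(x, ys)

# Reference values from the pairwise loop in the question
r_ref = np.empty(len(ys))
p_ref = np.empty(len(ys))
for i, y in enumerate(ys):
    r_ref[i], p_ref[i] = stats.pearsonr(x, y)

# Largest absolute deviations between the two approaches
print(np.abs(r_fast - r_ref).max(), np.abs(p_fast - p_ref).max())
```

To compare speed on your own machine, wrap each approach in `timeit`; the gain should come from replacing 700 Python-level calls with a handful of array operations.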

Tags: python, performance, correlation, python-3.x