Fast way to compute row by row matrix correlation

Question

I have two very large matrices (228453x460) and I want to compute the correlations between their rows.

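% meanVec is assumed to be preallocated before the loop, e.g. meanVec = zeros(228453,1)
for i=1:228453
    if(vec1_preprocess(i,1))   % skip rows whose first element is 0
        for j=1:228453
            % effective degrees of freedom for this pair of rows
            df = effdf(vec1_preprocess(i,:)',vec2_preprocess(j,:)');
            % Pearson correlation between row i of vec1 and row j of vec2
            corr_temp = corr(vec1_preprocess(i,:)', vec2_preprocess(j,:)');
            % p-value from the custom function calculate_p (not shown)
            p = calculate_p(corr_temp, df);
            % update meanVec(i) with the new p-value
            temp = (meanVec(i)+p)/2;
            meanVec(i) = temp;
        end
        disp(i);   % progress output
    end
end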

This takes about one day. Is there a direct way to compute this?
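One detail of the loop above worth checking: the update meanVec(i) = (meanVec(i)+p)/2 is not a running arithmetic mean. Each new p gets weight 1/2, the previous one 1/4, and so on, so the result is dominated by the last few j. The sketch below uses made-up p-values (hypothetical, as is the starting value 0 for meanVec(i)) to compare the recursion with its closed form and with mean(p). If an arithmetic mean over j was intended, it can be computed as mean(p) once all p-values for row i are available, which also makes the loop easier to vectorize.

```matlab
% Hypothetical p-values for one row i against 5 rows j.
p = [0.10 0.20 0.30 0.40 0.50];
m = 0;                        % meanVec(i), assumed to start at 0
for j = 1:numel(p)
    m = (m + p(j)) / 2;       % the update used in the question's inner loop
end

% Closed form of the recursion: p(j) is weighted by 2^-(numel(p)-j+1),
% so the last p counts 1/2, the one before it 1/4, and so on.
w = 2 .^ -(numel(p):-1:1);
mClosed = sum(w .* p);

mArith = mean(p);             % the arithmetic mean, for comparison
fprintf('recursion: %g, closed form: %g, mean: %g\n', m, mClosed, mArith);
```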

Edit: the code for effdf

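function df = effdf(ts1, ts2)
%EFFDF Effective degrees of freedom of two autocorrelated column vectors.

    % Remove the mean of each series.
    ts1 = ts1 - mean(ts1);
    ts2 = ts2 - mean(ts2);
    N = length(ts1);

    % Autocorrelation of ts1, normalized by its zero-lag value (the maximum).
    % xcorr returns 2N-1 lags; index (length+3)/2 = N+1 is lag 1, so this
    % keeps lags 1 .. floor(N/4)+1.
    ac1 = xcorr(ts1);
    ac1 = ac1 / max(ac1);
    ac1 = ac1(((length(ac1)+3)/2):((length(ac1)+3)/2+floor(N/4)));

    % Same for ts2.
    ac2 = xcorr(ts2);
    ac2 = ac2 / max(ac2);
    ac2 = ac2(((length(ac2)+3)/2):((length(ac2)+3)/2+floor(N/4)));

    % Weighted sum of lagged autocorrelation products, with weights (N-k)/N.
    df = 1 / ((1/N) + (2/N) * sum(((N - (1:length(ac1))) / N)' .* ac1 .* ac2));
end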
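Calling effdf inside the double loop recomputes both autocorrelations for every pair of rows, although each one depends on a single row. A minimal sketch, assuming the goal is to reproduce effdf exactly: compute the normalized autocorrelation of every row once (for a centered row x, lag k is sum x(t)x(t+k) divided by sum x(t)^2, which is what xcorr followed by division by max gives for a real vector), after which the lag sum for all pairs becomes one matrix product. X1 and X2 are hypothetical small stand-ins for vec1_preprocess and vec2_preprocess; with the real 228453-row matrices DF would be as large as the full correlation matrix, so it would have to be built one block of X1 rows at a time.

```matlab
% Hypothetical small stand-ins for vec1_preprocess / vec2_preprocess.
rng(1);
X1 = randn(30, 460);
X2 = randn(20, 460);

N = size(X1, 2);             % samples per row
L = floor(N/4) + 1;          % number of lags effdf keeps (lags 1..floor(N/4)+1)
w = (N - (1:L)) / N;         % weights (N-k)/N as a 1-by-L row

% Normalized autocorrelation of every row at lags 1..L, without xcorr.
X1c = X1 - mean(X1, 2);      % center each row (implicit expansion, R2016b+)
X2c = X2 - mean(X2, 2);
R1 = zeros(size(X1, 1), L);
R2 = zeros(size(X2, 1), L);
for k = 1:L
    R1(:, k) = sum(X1c(:, 1:N-k) .* X1c(:, 1+k:N), 2);
    R2(:, k) = sum(X2c(:, 1:N-k) .* X2c(:, 1+k:N), 2);
end
R1 = R1 ./ sum(X1c.^2, 2);   % divide by the zero-lag value, like ac/max(ac)
R2 = R2 ./ sum(X2c.^2, 2);

% df(i,j) for all pairs at once: the weighted lag sum is a matrix product.
DF = 1 ./ (1/N + (2/N) * ((R1 .* w) * R2'));
```

The per-row work is done size(X1,1) + size(X2,1) times instead of once per pair, which is where the savings come from.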

Answer 1

If you read the documentation, you will see that corr computes correlations between columns, not between rows.

To turn rows into columns and columns into rows, simply transpose the matrices:

tmp1 = vec1_preprocess';
tmp2 = vec2_preprocess';
C = corr(tmp1,tmp2);
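With the sizes in the question, C would have 228453 x 228453 (about 5.2e10) entries, which is over 400 GB in double precision, so in practice the product has to be taken in blocks of rows, reducing each block to whatever per-row statistic is needed before moving on. A minimal base-MATLAB sketch of that idea (it avoids corr, which needs the Statistics and Machine Learning Toolbox): standardize each row, then the Pearson correlation of two rows is their dot product divided by n-1. A, B and blockSize are hypothetical; the full C is stored here only so the result can be checked.

```matlab
% Hypothetical small stand-ins for vec1_preprocess / vec2_preprocess.
rng(0);
A = randn(50, 460);
B = randn(40, 460);
n = size(A, 2);                          % observations per row

% Standardize each row: zero mean, unit (n-1)-normalized standard deviation.
% Uses implicit expansion (R2016b or later).
zA = (A - mean(A, 2)) ./ std(A, 0, 2);
zB = (B - mean(B, 2)) ./ std(B, 0, 2);

blockSize = 16;                          % rows of A per block; tune to available memory
C = zeros(size(A, 1), size(B, 1));       % kept only for checking; reduce each block instead
for first = 1:blockSize:size(A, 1)
    rows = first:min(first + blockSize - 1, size(A, 1));
    % Pearson r between row i of A and row j of B is the dot product of the
    % standardized rows divided by n-1, so one product gives a whole block.
    C(rows, :) = (zA(rows, :) * zB') / (n - 1);
end
```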

Answer 2

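Since you didn't post the code, I'll assume that your custom functions calculate_p and effdf are perfectly optimized and are not the bottleneck of the script. Let's focus on what we have.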

The first problem I see is:

if (vec1_preprocess(i,1))

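Evaluating this check on each of the 228453 iterations can add noticeably to the run time. Instead, extract only the rows whose first column is nonzero and run the computation on those rows: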

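% Original indices of the rows whose first element is nonzero
idx = find(vec1_preprocess(:,1) ~= 0);
rows1 = vec1_preprocess(idx,:);

for k = 1:numel(idx)
    i = idx(k);   % original row index, e.g. for writing meanVec(i)
    % ... compute with rows1(k,:) ...
end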

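The second problem is corr. It looks like you are also computing p-values with calculate_p. Why not use the p-values that corr already returns as its second output argument?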

[c,p] = corr(A,B);
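One caveat: corr computes Pearson p-values from a Student's t distribution with n-2 degrees of freedom (n = number of observations), so they do not reflect the effective degrees of freedom from effdf. If calculate_p is a two-sided t-test on the effective df (an assumption, since its code is not shown; the choice nu = df - 2 below is likewise hypothetical), it can be applied to whole matrices at once with base MATLAB's betainc. For t = r*sqrt(nu/(1-r^2)), the two-sided tail probability is betainc(nu/(nu+t^2), nu/2, 1/2), and nu/(nu+t^2) simplifies to 1 - r^2.

```matlab
% Hypothetical inputs: correlations r and matching effective df, e.g. a
% block of row correlations and the corresponding block of effdf values.
r  = [0.0 0.2; 0.5 0.9];
df = [100 100; 50 12];

nu = df - 2;                        % t-test degrees of freedom (assumed df - 2)
% Two-sided P(|T| > |t|) for Student's t with nu degrees of freedom,
% using nu/(nu + t^2) = 1 - r^2. Works element-wise on whole matrices.
P = betainc(1 - r.^2, nu / 2, 0.5);
disp(P);
```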

Alternatively, if Pearson correlation is exactly what you need, you could replace corr with corrcoef and see whether it performs better.
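corrcoef is part of base MATLAB, but it is not a drop-in replacement for corr in every call form. With two vectors it returns a 2x2 matrix whose off-diagonal element is r. With two matrices, corrcoef(A,B) is equivalent to corrcoef(A(:),B(:)), so for column-pairwise correlations like corr(A,B) one option is to concatenate the inputs and take the off-diagonal block (at the cost of also computing the A-vs-A and B-vs-B blocks). The data below is hypothetical.

```matlab
rng(2);
x = randn(460, 1);                 % hypothetical pair of rows, transposed to columns
y = randn(460, 1);

% Two vectors: 2x2 result, the off-diagonal element is Pearson's r.
R = corrcoef(x, y);
r = R(1, 2);

% Two matrices: corrcoef(A, B) would treat A(:) and B(:) as single variables,
% so concatenate the columns and keep the A-vs-B block instead.
A = randn(460, 3);
B = randn(460, 4);
Rall = corrcoef([A B]);            % 7x7: all columns of A and B against each other
Rab = Rall(1:3, 4:7);              % same layout as corr(A, B)
```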

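Last but not least (in fact, this is the most important point): is there any reason you perform this computation row by row instead of on the entire matrices at once?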
