在 Matlab 中将 updating/inserting 性能转换为稀疏矩阵？

Question

我已经为测量不确定性的计算写了一个相当大的 class，但是速度慢得令人痛苦。分析代码表明，到目前为止，最慢的操作是将计算结果插入到一个大的稀疏矩阵中。大约 97% 的时间花在该操作上。矩阵保留了所有测量数据的不确定性，我无法在不破坏大量其他代码的情况下更改数据结构。所以我唯一的选择是优化数据插入步骤。这在我的基准测试中完成了大约 5700 次，每次数据量都会增加。

第一个解决方案，非常慢：

% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), numel(obj.w));

% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), numel(obj.w));

obj.w = [obj.w; z(:)]; % insert new measurement results into the column vector obj.w
obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; % insert new uncertainties of the measurement results
obj.dw_dw = [obj.dw_dw, transpose(dz_dw); dz_dw, dz_dz]; % insert "dependencies" of new measurements on old results

obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; 行是迄今为止最慢的。也许它很慢，因为 Matlab 需要为 obj.Uww 分配一个新的更大的缓冲区并将所有内容复制过来。因此我将代码更改为以下内容：

% Preallocation in the class constructor
obj.w = spalloc(nnz_w, 1, nnz_w);
obj.Uww = spalloc(nnz_w, nnz_w, nnz_Uww);
obj.dw_dw = spalloc(nnz_w, nnz_w, nnz_dw_dw);
obj.num_w = 0; % to manually keep track of the "real" size of obj.w, obj.Uww and obj.dw_dw

class 构造函数被调用，其三个属性 w、Uww 和 dw_dw 在计算结束时将具有的大小（nzz_w 约为 10 万，nzz_Uww 约为 900 万，nzz_dw_dw 约为 160 万）。因此，不需要新的内存分配。这是现在的插入步骤：

% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), obj.num_w);

% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), obj.num_w);

wInd = 1:obj.num_w;
obj.w(zInd, 1) = z(:); % insert new measurement results
obj.num_w = zInd(end); % new "real" size of w, Uww and dw_dw
obj.Uww(zInd, wInd) = Uzw; % about 51% of all computation time
obj.Uww(wInd, zInd) = transpose(Uzw); % about 15% of all computation time
obj.Uww(zInd, zInd) = Uzz; % about 14.4% of all computation time
obj.dw_dw(zInd, wInd) = dz_dw; % about 13% of all computation time
obj.dw_dw(wInd, zInd) = transpose(dz_dw); % about 3.5% of all computation time
obj.dw_dw(zInd, zInd) = dz_dz; % less than 3.5% of all computation time

仍然，这些行占所有计算时间的 97%，并且速度没有提高。因此我尝试了第三版：

obj.w = [obj.w; z(:)];

[zInd_zy, yInd_zy] = ndgrid(zInd, yInd(:));
[zzInd_i, zzInd_j] = ndgrid(zInd, zInd);

[Uww_i, Uww_j, Uww_v] = find(obj.Uww); % 14% of all computation time
Uww_new = sparse( ... % this statement takes 66% of all computation time
    [Uww_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [Uww_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [Uww_v;     Uzy(:);     Uzy(:);     Uzz(:)], ...
    numel(obj.w), numel(obj.w));

[dw_dw_i, dw_dw_j, dw_dw_v] = find(obj.dw_dw);
dw_dw_new = sparse( ... % 14% of all computation time
    [dw_dw_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [dw_dw_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [dw_dw_v;   dz_dy(:);   dz_dy(:);   dz_dz(:)], ...
    numel(obj.w), numel(obj.w));
obj.Uww = Uww_new;
obj.dw_dw = dw_dw_new;

比其他两个版本还要慢。为什么插入 已经预分配的 数组这么慢？我怎样才能加快速度？

（所有矩阵都是对称的，但我还没有尝试利用它。）

Answer 1

我不了解您的更新模式的细节，但请记住，Matlab 在内部以压缩稀疏列格式存储稀疏矩阵。因此，按顺序逐列添加条目比其他顺序要快得多。例如，在我的旧版 Matlab (R2006a) 上，这个：

n=10000;
nz=400000;
v=floor(n*rand(nz,3))+1;

fprintf('Random\n');
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

fprintf('Row-wise\n');
v=sortrows(v);
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

fprintf('Column-wise\n');
v=sortrows(v, [2 1]);
A=sparse(n, n);
tic
for k=1:nz
  A(v(k,1), v(k,2))=v(k,3);
end
toc

给出这个：

>> sparsetest
Random
Elapsed time is 19.276089 seconds.
Row-wise
Elapsed time is 20.714324 seconds.
Column-wise
Elapsed time is 1.498150 seconds.

可能最好的是，如果你能以某种方式以适合 spconvert 或 sparse 的形式收集非零值，然后在最后制作整个稀疏矩阵，但我认为你可能做不到。

Answer 2

@bg2b 指出如何按列添加数据要快得多。

事实证明，将行添加到稀疏矩阵或稀疏向量非常慢（更准确地说，添加到下三角部分）。因此，我现在只存储稀疏矩阵的上三角部分，因为这样很快。当我需要矩阵中的数据时，我从上三角稀疏矩阵重新创建它。请参阅基准测试脚本的末尾。

这是我的基准测试脚本。它很好地显示了添加数据的计算时间呈指数增长。

% Benchmark the extension of sparse matrices of the form
% Uww_new = [ Uww, Uwz; ...
%             Uzw, Uzz];
% where Uzw = transpose(Uwz). Uww and Uzz are always square and symmetric.
close all;
clearvars;

rng(70101557, 'twister'); % the seed is the number of the stack overflow question
density = 0.25;
nZ = 10;
iterations = 5e2;
n = nZ * iterations;
nonzeros = n*n*density;

all_Uzz = cell(iterations, 1);
all_Uwz = cell(iterations, 1);
for k = 1:iterations
    % Uzz must be symmetric!
    Uzz_nonsymmetric = sprand(nZ, nZ, density);
    all_Uzz{k} = (Uzz_nonsymmetric + transpose(Uzz_nonsymmetric))./2;
    all_Uwz{k} = sprand((k-1)*nZ, nZ, density);
end
f = figure();
ax = axes(f);
hold(ax, 'on');
grid(ax, 'on');
xlabel(ax, 'Iteration');
ylabel(ax, 'Elapsed time in seconds.');
h = gobjects(1, 0);

name = 'Optimised.';
fprintf('%s\n', name);
Uww_optimised = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_optimised, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_optimised, 1);
    Uzz_triu = triu(Uzz);
    Uww_optimised(wInd, zInd) = Uwz; % add columns
    Uww_optimised(zInd, zInd) = Uzz_triu; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Only Uwz and Uzz.';
fprintf('%s\n', name);
Uww_wz_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_wz_zz, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_wz_zz, 1);
    Uzw = transpose(Uwz);
    Uww_wz_zz(wInd, zInd) = Uwz; % add columns
%   Uww_wz_zz(zInd, wInd) = Uzw; % add rows
    Uww_wz_zz(zInd, zInd) = Uzz; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Only Uzw and Uzz.';
fprintf('%s\n', name);
Uww_zw_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_zw_zz, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_zw_zz, 1);
    Uzw = transpose(Uwz);
%   Uww_zw_zz(wInd, zInd) = Uwz;
    Uww_zw_zz(zInd, wInd) = Uzw;
    Uww_zw_zz(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];

name = 'Uzw, Uwz and Uzz.';
fprintf('%s\n', name);
Uww = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww, 1);
    Uzw = transpose(Uwz);
    Uww(wInd, zInd) = Uwz;
    Uww(zInd, wInd) = Uzw;
    Uww(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
leg = legend(ax, h, 'Location', 'northwest');

assert(issymmetric(Uww));
assert(istriu(Uww_optimised));
assert(isequal(Uww, Uww_optimised + transpose(triu(Uww_optimised, 1))));

%% Get Uyy from Uww_optimised. Uyy is a symmetric subset of Uww
yInd = randi(size(Uww_optimised, 1), 1, nZ); % indices to extract
[yIndRowInds, yIndColInds] = ndgrid(yInd, yInd);
indsToFlip = yIndRowInds > yIndColInds;
temp = yIndColInds(indsToFlip);
yIndColInds(indsToFlip) = yIndRowInds(indsToFlip);
yIndRowInds(indsToFlip) = temp;
linInd = sub2ind(size(Uww_optimised), yIndRowInds, yIndColInds);
assert(issymmetric(linInd));
Uyy = Uww_optimised(linInd);
assert(issymmetric(Uyy));

在 Matlab 中将 updating/inserting 性能转换为稀疏矩阵？

Performance of updating/inserting into a sparse matrix in Matlab?

performance

matlab

sparse-matrix