在（非）对角矩阵中查找非零元素的速度

Question

这个例子发生了什么？

查看完整矩阵，查找操作在非对角矩阵上更快。相反，在对角矩阵上获得稀疏表示更快（这似乎是合理的）。对稀疏矩阵的查找操作几乎相等。

出于好奇，有人可以告诉我这里发生了什么吗？为什么在满矩阵中查找非零元素比在对角矩阵中查找非零元素更快？

printf("Diagonal Mat:\n\n")
A = eye(10000);

printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

printf("\n\nNon-Diagonally flagged Mat:\n\n")
A = A | A; # This removes the "Diagonal Matrix" flag from A

printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

printf("\n\nActually Non-Diagonal Mat:\n\n")
A(:,:) = 0;
A(:,1) = 1;
printf("Full mat: ")
tic
find(A);
toc

printf("Building sparse representation: ")
tic
As = sparse(A);
toc

printf("Sparse mat: ")
tic
find(As);
toc

输出：

Diagonal Mat:

Full mat: Elapsed time is 0.204636 seconds.
Building sparse representation: Elapsed time is 5.19753e-05 seconds.
Sparse mat: Elapsed time is 7.60555e-05 seconds.


Non-Diagonally flagged Mat:

Full mat: Elapsed time is 0.0800331 seconds.
Building sparse representation: Elapsed time is 0.0924602 seconds.
Sparse mat: Elapsed time is 7.48634e-05 seconds.


Actually Non-Diagonal Mat:

Full mat: Elapsed time is 0.0799708 seconds.
Building sparse representation: Elapsed time is 0.092248 seconds.
Sparse mat: Elapsed time is 7.70092e-05 seconds.

Answer 1

它与您的计算机（或详细地说，您的 cpu）如何处理和缓冲堆栈和堆上的值有关。当您在堆上为数组分配内存时，它会一个接一个地分配 "list" 个值。因此，当您逐个值地遍历数组时，cpu 只会从该列表中的一个值跳到下一个值（非常快），但是如果您从值 i 跳到值 i + n，其中n 不是 1，这意味着 cpu 必须在该列表的其他地方找到下一个值。因此，您在编程环境中保存值的方式以及迭代这些值的方式会影响后续过程的速度。

这只是一个简短而非常简单的尝试来解释这个主题。实际上，它更复杂，因为更多的因素和不同的 cpu 和记忆技术是可能的。如果您对这类事情感兴趣，我建议您从这里开始：https://en.wikipedia.org/wiki/System_programming（在该主题上有一个自上而下的视图）。

Answer 2

首先，以下是衡量这一点的更好方法：

for i = 1:10, find (d); endfor
t = cputime ();
for i = 1:100, find (d); endfor
cputime () -t


for i = 1:10, find (f); endfor
t = cputime ();
for i = 1:100, find (f); endfor
cputime () -t

这是个好问题。 Octave 具有对角矩阵的内部特化，其中仅存储对角线值。你可以看到它使用了多少内存：

octave> d = eye (10000);
octave> f = full (eye (10000));
octave> typeinfo (d)
ans = diagonal matrix
octave> typeinfo (f)
ans = matrix
octave> whos d f
Variables in the current scope:

   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  ===== 
        d       10000x10000                  80000  double
        f       10000x10000              800000000  double

Total is 200000000 elements using 800080000 bytes

专门化是为了在对角矩阵常见的情况下减少内存和性能。这种专业化的问题在于它们到处添加特殊情况，特别是当你想直接访问 Octave 经常做的数据时。

情况下使用find, it has special cases for boolean arrays, integer arrays, permutation matrices, and sparse matrices. There is no special handling for diagonal matrices so the the case for real type double precision array代替。这意味着无论如何调用 find 时，对角矩阵都会在内部转换为完整数组。

奇怪的是，在调用 find 之前在对角矩阵上调用 full 似乎仍然更有效率，所以也许我的推理是错误的。我开了一个performance bug report

在（非）对角矩阵中查找非零元素的速度

Speed of finding non-zero elements in (non-)diagonal matrix

octave