Element-wise addition of multiple sparse matrices

Question

I have a list of sparse CSR matrices that all have the same shape. I want to add them element-wise so that the resulting matrix stays sparse.

Is there a better way to do this than a loop like the following?

a = lil_matrix((5,5)).tocsr()

for m in m_list:
    a += m

I also tried this:

a = np.sum(m_list)

But I read somewhere that numpy functions should not be mixed with scipy sparse matrices. Is that right?

Answer 1

Let's experiment.

Make some matrices:

In [30]: mlist = [sparse.random(5,5,.2,'csr')*10 for _ in range(3)]
In [32]: mlist = [(sparse.random(5,5,.2,'csr')*10).astype(int) for _ in range(3)
    ...: ]
In [33]: mlist
Out[33]: 
[<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
 <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
 <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>]
In [34]: [m.A for m in mlist]
Out[34]: 
[array([[0, 0, 3, 0, 0],
        [4, 0, 0, 0, 0],
        [0, 9, 0, 0, 0],
        [7, 0, 6, 0, 0],
        [0, 0, 0, 0, 0]]),
 array([[0, 0, 1, 0, 0],
        [0, 0, 6, 0, 0],
        [8, 0, 0, 0, 0],
        [0, 0, 7, 0, 0],
        [0, 0, 0, 0, 0]]),
 array([[0, 0, 0, 0, 8],
        [0, 0, 0, 0, 0],
        [7, 0, 8, 0, 0],
        [2, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])]

Do the explicit addition (the same as the loop):

In [36]: mlist[0]+mlist[1]+mlist[2]
Out[36]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [37]: _.A
Out[37]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

Apply Python's built-in sum:

In [38]: sum(mlist)
Out[38]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [39]: _.A
Out[39]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

And np.sum:

In [40]: np.sum(mlist)
Out[40]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
In [41]: _.A
Out[41]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])

Both work. Python's sum simply iterates over the list, applying + between the elements.
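
As a small sketch of that equivalence (the matrices below are made-up 2x2 examples, not the ones from this session), the built-in sum can be compared with an explicit fold over +. The built-in starts from the integer 0, which scipy accepts because adding a zero scalar to a sparse matrix is allowed:

```python
import functools
import operator

import numpy as np
from scipy import sparse

# Hypothetical small inputs, chosen only for illustration.
mats = [sparse.csr_matrix(np.array(d)) for d in (
    [[0, 3], [4, 0]],
    [[0, 1], [0, 6]],
    [[8, 0], [7, 0]],
)]

# Built-in sum: 0 + mats[0] + mats[1] + mats[2]
total_sum = sum(mats)

# Explicit fold: mats[0] + mats[1] + mats[2]
total_reduce = functools.reduce(operator.add, mats)

print(total_sum.toarray())
```

Both results stay sparse and hold the same values.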

np.sum first makes an array from the list:

In [42]: np.array(mlist)
Out[42]: 
array([<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
       <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>,
       <5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>], dtype=object)

But because this is an object dtype array, it too delegates the work to the matrices' + method.
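
To make the delegation visible, here is a minimal sketch that builds the object array by hand and reduces it with np.add. The inputs are hypothetical diagonal matrices, and filling the array slot by slot is just one cautious way to get a 1-D object array:

```python
import numpy as np
from scipy import sparse

# Hypothetical inputs: three diagonal csr matrices.
mats = [sparse.csr_matrix(np.eye(3, dtype=int) * k) for k in (1, 2, 3)]

# Build the 1-D object array explicitly, one matrix per slot.
obj_arr = np.empty(len(mats), dtype=object)
for i, m in enumerate(mats):
    obj_arr[i] = m

# For object dtype, the reduction applies each element's own + operator,
# so the result is again a sparse matrix.
total = np.add.reduce(obj_arr)

print(type(total).__name__)
print(total.toarray())
```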

The timings do not differ much:

In [43]: timeit sum(mlist)
421 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [44]: timeit np.sum(mlist)
391 µs ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [45]: timeit mlist[0]+mlist[1]+mlist[2]
334 µs ± 629 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But compare that with adding dense arrays:

In [46]: timeit mlist[0].A+mlist[1].A+mlist[2].A
25.3 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

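Sparse matrix addition is not very efficient. The format is better suited to matrix multiplication, and even there the fraction of nonzero entries needs to be on the order of 10% or less. I have not tested how addition behaves at different sparsity levels.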
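
For anyone who wants to run that test, the following is one possible sketch. The helper name time_additions, the matrix size and the density values are all assumptions made for illustration, and the numbers it prints depend on the machine and library versions:

```python
import timeit

from scipy import sparse


def time_additions(n=200, densities=(0.01, 0.1, 0.5), repeats=20):
    """Return {density: (sparse_seconds, dense_seconds)} for one addition."""
    results = {}
    for d in densities:
        a = sparse.random(n, n, density=d, format='csr')
        b = sparse.random(n, n, density=d, format='csr')
        a_dense, b_dense = a.toarray(), b.toarray()
        # Average time of a single sparse + sparse addition
        t_sparse = timeit.timeit(lambda: a + b, number=repeats) / repeats
        # Average time of the same addition on dense arrays
        t_dense = timeit.timeit(lambda: a_dense + b_dense, number=repeats) / repeats
        results[d] = (t_sparse, t_dense)
    return results


timings = time_additions()
for d, (ts, td) in timings.items():
    print(f"density={d}: sparse {ts * 1e6:.1f} us, dense {td * 1e6:.1f} us")
```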

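If you constructed those csr matrices from coo-style inputs, you might consider combining the inputs first. With coo-style input, duplicate entries are summed.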
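
A tiny made-up example of that duplicate handling: two values are given for position (0, 0), and the constructed csr matrix stores their sum as a single entry.

```python
import numpy as np
from scipy import sparse

# coo-style triplets; position (0, 0) appears twice on purpose.
data = np.array([1, 2, 3])
row = np.array([0, 0, 1])
col = np.array([0, 0, 1])

m = sparse.csr_matrix((data, (row, col)), shape=(2, 2))

print(m.nnz)        # number of stored entries after duplicates are merged
print(m.toarray())
```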

Just to illustrate the idea of combining the inputs:

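def foo(mlist):
    # collect the coo-style triplets (data, row, col) of every matrix
    data, row, col = [],[],[]
    for m in mlist:
        mc = m.tocoo()
        data.extend(mc.data)
        row.extend(mc.row)
        col.extend(mc.col)
    # duplicate (row, col) entries are summed when the csr matrix is built
    res = sparse.csr_matrix((data,(row,col)),shape=mc.shape)
    return res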

In [55]: foo(mlist)
Out[55]: 
<5x5 sparse matrix of type '<class 'numpy.int64'>'
    with 11 stored elements in Compressed Sparse Row format>
In [56]: _.A
Out[56]: 
array([[ 0,  0,  4,  0,  8],
       [ 4,  0,  6,  0,  0],
       [15,  9,  8,  0,  0],
       [ 9,  0, 13,  0,  0],
       [ 0,  0,  0,  0,  0]])
In [57]: timeit foo(mlist)
738 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It is slower than sum, so I will not pursue it further. But it is still an option to keep in mind.
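
One possible variation, offered only as a sketch: join the coo arrays with np.concatenate instead of growing Python lists. The name sum_via_coo and the sample matrices are hypothetical, and this version has not been timed here, so no speed claim is made for it.

```python
import numpy as np
from scipy import sparse


def sum_via_coo(mats):
    """Sum same-shaped sparse matrices by merging their coo triplets."""
    coos = [m.tocoo() for m in mats]
    data = np.concatenate([c.data for c in coos])
    row = np.concatenate([c.row for c in coos])
    col = np.concatenate([c.col for c in coos])
    # Duplicate (row, col) pairs are summed during construction.
    return sparse.csr_matrix((data, (row, col)), shape=mats[0].shape)


# Hypothetical inputs for a quick check against the built-in sum.
mats = [sparse.csr_matrix(np.array(d)) for d in (
    [[0, 3], [4, 0]],
    [[0, 1], [0, 6]],
    [[8, 0], [7, 0]],
)]
result = sum_via_coo(mats)
print(result.toarray())
```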
