Element-wise addition of multiple sparse matrices
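I have a list of sparse CSR matrices that all have the same shape. I want to add them element-wise so that the resulting matrix stays sparse.
Is there a better way to do this than a loop like the following?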
from scipy.sparse import lil_matrix
a = lil_matrix((5,5)).tocsr()
for m in m_list:
    a += m
I also tried this:
a = np.sum(m_list)
But I read somewhere that numpy functions should not be mixed with scipy sparse matrices. Is that right?
Let's experiment.
Make some matrices:
In [30]: mlist = [sparse.random(5,5,.2,'csr')*10 for _ in range(3)]
In [32]: mlist = [(sparse.random(5,5,.2,'csr')*10).astype(int) for _ in range(3)
...: ]
In [33]: mlist
Out[33]:
[<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>,
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>,
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>]
In [34]: [m.A for m in mlist]
Out[34]:
[array([[0, 0, 3, 0, 0],
[4, 0, 0, 0, 0],
[0, 9, 0, 0, 0],
[7, 0, 6, 0, 0],
[0, 0, 0, 0, 0]]),
array([[0, 0, 1, 0, 0],
[0, 0, 6, 0, 0],
[8, 0, 0, 0, 0],
[0, 0, 7, 0, 0],
[0, 0, 0, 0, 0]]),
array([[0, 0, 0, 0, 8],
[0, 0, 0, 0, 0],
[7, 0, 8, 0, 0],
[2, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])]
Do the explicit addition (the same thing the loop does):
In [36]: mlist[0]+mlist[1]+mlist[2]
Out[36]:
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Row format>
In [37]: _.A
Out[37]:
array([[ 0, 0, 4, 0, 8],
[ 4, 0, 6, 0, 0],
[15, 9, 8, 0, 0],
[ 9, 0, 13, 0, 0],
[ 0, 0, 0, 0, 0]])
Apply Python's sum:
In [38]: sum(mlist)
Out[38]:
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Row format>
In [39]: _.A
Out[39]:
array([[ 0, 0, 4, 0, 8],
[ 4, 0, 6, 0, 0],
[15, 9, 8, 0, 0],
[ 9, 0, 13, 0, 0],
[ 0, 0, 0, 0, 0]])
And np.sum:
In [40]: np.sum(mlist)
Out[40]:
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in Compressed Sparse Row format>
In [41]: _.A
Out[41]:
array([[ 0, 0, 4, 0, 8],
[ 4, 0, 6, 0, 0],
[15, 9, 8, 0, 0],
[ 9, 0, 13, 0, 0],
[ 0, 0, 0, 0, 0]])
Both work. Python's sum simply iterates over the list, applying + between the elements.
np.sum first builds an array from the list:
In [42]: np.array(mlist)
Out[42]:
array([<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>,
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>,
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>], dtype=object)
But since this is an object dtype array, np.sum also delegates the work to the matrices' + method.
The timings are not very different:
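Since both approaches end up calling the matrices' own + operator, the same delegation can be written out explicitly. The following is a minimal sketch, not part of the original experiment: the small input matrices are made up, and functools.reduce is used here only as an alternative that never involves the integer 0 that the built-in sum starts from. It uses .toarray() rather than the .A shorthand shown in the transcripts.

```python
from functools import reduce
import operator

import numpy as np
from scipy import sparse

# Three small same-shaped CSR matrices (values are invented for this example).
dense_inputs = [
    np.array([[0, 3], [4, 0]]),
    np.array([[1, 0], [0, 0]]),
    np.array([[0, 0], [7, 8]]),
]
mlist = [sparse.csr_matrix(d) for d in dense_inputs]

# Built-in sum starts from the integer 0, so its first step is 0 + mlist[0].
total_sum = sum(mlist)

# reduce applies + pairwise between the matrices only.
total_reduce = reduce(operator.add, mlist)

print(sparse.issparse(total_sum), sparse.issparse(total_reduce))
print(total_reduce.toarray())
```

Either way, each step is a sparse + sparse addition, so the result stays sparse.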
In [43]: timeit sum(mlist)
421 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [44]: timeit np.sum(mlist)
391 µs ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [45]: timeit mlist[0]+mlist[1]+mlist[2]
334 µs ± 629 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But compare that with adding the dense arrays:
In [46]: timeit mlist[0].A+mlist[1].A+mlist[2].A
25.3 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
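Sparse matrix addition is not very efficient. The format is better suited to matrix multiplication, but even there the density (the fraction of nonzero entries) needs to be on the order of 10% or less. I have not tested how addition performs as the sparsity changes.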
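One way such a test could be set up is sketched below. This is only an illustration: the helper name time_addition, the matrix size, the densities, and the repeat count are all arbitrary assumptions, and no timings are claimed here because they depend on the machine and library versions.

```python
import timeit

from scipy import sparse


def time_addition(n=200, density=0.01, repeats=20):
    """Return (sparse, dense) seconds per addition of two n x n matrices."""
    a = sparse.random(n, n, density=density, format="csr")
    b = sparse.random(n, n, density=density, format="csr")
    a_dense, b_dense = a.toarray(), b.toarray()
    t_sparse = timeit.timeit(lambda: a + b, number=repeats) / repeats
    t_dense = timeit.timeit(lambda: a_dense + b_dense, number=repeats) / repeats
    return t_sparse, t_dense


# Hypothetical sweep over a few densities.
for density in (0.001, 0.01, 0.1):
    t_sparse, t_dense = time_addition(density=density)
    print(f"density={density}: sparse {t_sparse * 1e6:.1f} us, "
          f"dense {t_dense * 1e6:.1f} us")
```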
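If you constructed those csr matrices from coo style inputs, you might consider combining the inputs first. With coo style inputs, duplicate entries are summed.
Just to illustrate the idea: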
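The duplicate-summing behaviour can be seen in a tiny case. This is a minimal sketch with made-up values, in which the position (0, 0) appears twice in the coordinate input:

```python
import numpy as np
from scipy import sparse

# coo-style (data, (row, col)) input; the position (0, 0) is listed twice.
data = np.array([1, 2, 5])
row = np.array([0, 0, 1])
col = np.array([0, 0, 2])

m = sparse.csr_matrix((data, (row, col)), shape=(2, 3))

# The two values given for (0, 0) are added together: 1 + 2.
print(m.toarray())
```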
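def foo(mlist):
    # Collect the coo-style triplets of every matrix in the list
    data, row, col = [], [], []
    for m in mlist:
        mc = m.tocoo()
        data.extend(mc.data)
        row.extend(mc.row)
        col.extend(mc.col)
    # Duplicate (row, col) entries are summed when the csr matrix is built
    res = sparse.csr_matrix((data, (row, col)), shape=mc.shape)
    return res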
In [55]: foo(mlist)
Out[55]:
<5x5 sparse matrix of type '<class 'numpy.int64'>'
with 11 stored elements in Compressed Sparse Row format>
In [56]: _.A
Out[56]:
array([[ 0, 0, 4, 0, 8],
[ 4, 0, 6, 0, 0],
[15, 9, 8, 0, 0],
[ 9, 0, 13, 0, 0],
[ 0, 0, 0, 0, 0]])
In [57]: timeit foo(mlist)
738 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
It is slower than sum, so I won't pursue it. But it is still an option worth keeping in mind.
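A variant of the same idea, offered only as a sketch and not timed here, concatenates the coo arrays with numpy instead of extending Python lists. The name sum_via_coo and the small test matrices are assumptions made for this example. The final eliminate_zeros call is optional; it drops stored entries whose summed value is zero.

```python
import numpy as np
from scipy import sparse


def sum_via_coo(mlist):
    """Sum same-shaped sparse matrices by concatenating their coo triplets."""
    coos = [m.tocoo() for m in mlist]
    data = np.concatenate([c.data for c in coos])
    row = np.concatenate([c.row for c in coos])
    col = np.concatenate([c.col for c in coos])
    # Duplicate (row, col) pairs are summed during the conversion to csr.
    res = sparse.coo_matrix((data, (row, col)), shape=mlist[0].shape).tocsr()
    # Remove stored entries that cancelled out to zero.
    res.eliminate_zeros()
    return res


# Made-up inputs; the (0, 0) entries cancel each other.
example = [
    sparse.csr_matrix(np.array([[1, 0], [0, 2]])),
    sparse.csr_matrix(np.array([[-1, 0], [3, 0]])),
]
total = sum_via_coo(example)
print(total.toarray())
```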