Removing diagonal elements from a sparse matrix in scipy
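I want to remove the diagonal elements from a sparse matrix. Since the matrix is sparse, these elements should no longer be stored once they are removed.
Scipy provides a method for setting the values of the diagonal elements: setdiag
If I try it with lil_matrix, it works: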
>>> import numpy as np
>>> from scipy.sparse import csr_matrix, lil_matrix
>>> a = np.ones((2,2))
>>> c = lil_matrix(a)
>>> c.setdiag(0)
>>> c
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in LInked List format>
But with csr_matrix, the diagonal elements do not seem to be removed from storage:
>>> b = csr_matrix(a)
>>> b
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
>>> b.setdiag(0)
>>> b
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
>>> b.toarray()
array([[ 0.,  1.],
       [ 1.,  0.]])
Going through a dense array, we of course get:
>>> csr_matrix(b.toarray())
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
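Is this intended? If so, is it due to the compressed format of csr matrices? Is there any workaround other than going from sparse to dense and back to sparse?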
Simply setting elements to 0 does not change the sparsity of a csr matrix. You have to call eliminate_zeros.
In [807]: a=sparse.csr_matrix(np.ones((2,2)))
In [808]: a
Out[808]:
<2x2 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [809]: a.setdiag(0)
In [810]: a
Out[810]:
<2x2 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [811]: a.eliminate_zeros()
In [812]: a
Out[812]:
<2x2 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
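Because changing the sparsity structure of a csr matrix is relatively expensive, scipy lets you set values to 0 without changing the sparsity structure.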
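To make the "explicit zero" state described above more concrete, here is a minimal sketch that inspects the matrix before and after eliminate_zeros. It only assumes that nnz reports the number of stored entries while count_nonzero() skips stored zeros; the variable names are just for illustration.

```python
import numpy as np
from scipy import sparse

a = sparse.csr_matrix(np.ones((2, 2)))
a.setdiag(0)

# nnz counts stored entries, including the explicit zeros left by setdiag(0).
stored_before = a.nnz
# count_nonzero() counts only entries whose value is actually nonzero.
true_nonzeros = a.count_nonzero()
# The zeros are still physically present in the data array at this point.
data_before = a.data.copy()

# eliminate_zeros() works in place and returns None.
a.eliminate_zeros()
stored_after = a.nnz

print(stored_before, true_nonzeros, stored_after)
```

Comparing nnz with count_nonzero() is a cheap way to check whether a matrix is carrying explicit zeros before paying for the cleanup.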
In [829]: %%timeit a=sparse.csr_matrix(np.ones((1000,1000)))
...: a.setdiag(0)
100 loops, best of 3: 3.86 ms per loop
In [830]: %%timeit a=sparse.csr_matrix(np.ones((1000,1000)))
...: a.setdiag(0)
...: a.eliminate_zeros()
SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
10 loops, best of 3: 133 ms per loop
In [831]: %%timeit a=sparse.lil_matrix(np.ones((1000,1000)))
...: a.setdiag(0)
100 loops, best of 3: 14.1 ms per loop
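As a possible alternative to setdiag followed by eliminate_zeros (this is not the approach from the answer above, only a sketch), the diagonal can be filtered out in COO form and a new csr matrix built from what is left. The helper name drop_diagonal is hypothetical, and I have not timed it against the numbers above.

```python
import numpy as np
from scipy import sparse


def drop_diagonal(m):
    """Return a new CSR matrix without any stored main-diagonal entries.

    Hypothetical helper for illustration; the input matrix is left unchanged.
    """
    coo = sparse.coo_matrix(m)
    # Keep only the stored entries that lie off the main diagonal.
    keep = coo.row != coo.col
    return sparse.csr_matrix(
        (coo.data[keep], (coo.row[keep], coo.col[keep])), shape=coo.shape
    )


b = sparse.csr_matrix(np.ones((3, 3)))
c = drop_diagonal(b)
print(c.nnz)
print(c.toarray())
```

Since the diagonal entries are never written back, no explicit zeros are created and the original matrix keeps its storage as it was.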