Division of sparse matrix

Question

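I have a scipy.sparse matrix with 45671x45671 elements. In this matrix, some rows contain only "0" values.

My question is: how can I divide each row's values by that row's sum? Obviously a for loop would work, but I'm looking for an efficient way...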

I've already tried:

matrix / matrix.sum(1), but I get a MemoryError.
matrix / scs.csc_matrix((matrix.sum(axis=1))), but I get ValueError: inconsistent shapes.
Other weird things...

Also, I want to skip rows that contain only "0" values.

So, if you have any solution...

Thanks in advance!

Answer 1

Here's an M I have handy:

In [241]: M
Out[241]: 
<6x3 sparse matrix of type '<class 'numpy.uint8'>'
    with 6 stored elements in Compressed Sparse Row format>
In [242]: M.A
Out[242]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [243]: M.sum(1)            # dense matrix
Out[243]: 
matrix([[1],
        [1],
        [1],
        [1],
        [1],
        [1]], dtype=uint32)
In [244]: M/M.sum(1)      # dense matrix - full size of M
Out[244]: 
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.],
        [ 1.,  0.,  0.]])

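That would explain the memory error: M/M.sum(1) returns a dense matrix the full size of M, so if M is too large for M.A to fit in memory, the division fails the same way.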
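For a sense of scale, a quick back-of-the-envelope check, assuming the division result is a dense float64 matrix like the one in Out[244]:

```python
# Back-of-the-envelope size of a dense float64 result for the matrix in the question
n = 45671               # rows == columns, from the question
bytes_per_float64 = 8   # the dense result above is float64
dense_bytes = n * n * bytes_per_float64
print(dense_bytes)                       # 16686721928
print(f"{dense_bytes / 1e9:.1f} GB")     # 16.7 GB
```

So the dense result alone needs roughly 16.7 GB.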

In [262]: S = sparse.csr_matrix(M.sum(1))
In [263]: S.shape
Out[263]: (6, 1)
In [264]: M.shape
Out[264]: (6, 3)
In [265]: M/S
....
ValueError: inconsistent shapes

I'm not quite sure what's going on here.

Element-wise multiplication works:
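A small sketch contrasting the two operations, assuming an identity matrix as a stand-in for M. In the SciPy versions I'm aware of, sparse-by-sparse division is element-wise and does not broadcast an (n, 1) column, while multiply does; the code reports either outcome rather than assuming it:

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.eye(3))        # small stand-in for the M above
S = sparse.csr_matrix(M.sum(axis=1))    # row sums as a sparse (3, 1) column

try:
    M / S   # sparse / sparse is element-wise; the shapes here differ
    print("division: no error in this SciPy version")
except ValueError as err:
    print("division:", err)

product = M.multiply(S)                 # the (3, 1) column is broadcast across each row
print("multiply:", product.shape)
```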

In [266]: M.multiply(S)
Out[266]: 
<6x3 sparse matrix of type '<class 'numpy.uint32'>'
    with 6 stored elements in Compressed Sparse Row format>

So if I construct S as S = sparse.csr_matrix(1/M.sum(1)), it should work.
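The same idea as a self-contained sketch, assuming every row has a non-zero sum (the matrix values are made up for illustration):

```python
import numpy as np
from scipy import sparse

# Small example matrix; every row has a non-zero sum here
M = sparse.csr_matrix(np.array([[1, 2, 0],
                                [0, 0, 3],
                                [4, 0, 4]], dtype=float))

# Row sums come back as a dense (n, 1) np.matrix
row_sums = M.sum(axis=1)

# Sparse (n, 1) column holding the reciprocal of each row sum
S = sparse.csr_matrix(1.0 / row_sums)

# Element-wise multiply broadcasts the column across each row; result stays sparse
normalized = M.multiply(S).tocsr()
print(normalized.toarray())
```

Only the length-n vector of row sums is ever dense, which is what avoids the MemoryError.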

If some rows sum to zero, there is a divide-by-zero problem.

If I modify M so that it has an all-zero row:

In [283]: M.A
Out[283]: 
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=uint8)
In [284]: S = sparse.csr_matrix(1/M.sum(1))
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
  #!/usr/bin/python3
In [285]: S.A
Out[285]: 
array([[  1.],
       [  1.],
       [ inf],
       [  1.],
       [  1.],
       [  1.]])
In [286]: M.multiply(S)
Out[286]: 
<6x3 sparse matrix of type '<class 'numpy.float64'>'
    with 5 stored elements in Compressed Sparse Row format>
In [287]: _.A
Out[287]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.]])

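This isn't the best M for demonstrating this, but it suggests a useful approach. The row sums are dense, so you can clean up their inverse (for example, replace the inf values) with the usual dense-array methods.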
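Putting the pieces together, a minimal sketch that also skips all-zero rows, as the question asks. It swaps in sparse.diags(inv) @ M for the M.multiply(S) step (both scale row i by inv[i] and keep the result sparse), and computes the reciprocal with np.divide(..., where=...) so zero rows get 0 instead of inf. normalize_rows is a hypothetical helper name, not part of SciPy:

```python
import numpy as np
from scipy import sparse


def normalize_rows(M):
    """Divide each row of a sparse matrix by its row sum (hypothetical helper).

    Rows whose sum is zero are left as all zeros instead of producing inf.
    """
    M = sparse.csr_matrix(M).astype(float)
    # Row sums are dense anyway: flatten the (n, 1) result to a 1-D array
    row_sums = np.asarray(M.sum(axis=1)).ravel()
    # Reciprocal only where the sum is non-zero; zero rows keep 0.0
    inv = np.zeros_like(row_sums)
    np.divide(1.0, row_sums, out=inv, where=row_sums != 0)
    # Left-multiplying by a diagonal matrix scales row i by inv[i]
    return sparse.diags(inv) @ M


M = sparse.csr_matrix(np.array([[1, 0, 0],
                                [0, 2, 2],
                                [0, 0, 0],
                                [3, 0, 1]]))
print(normalize_rows(M).toarray())
```

No dense n x n array is built; the zero row has no stored entries and stays empty in the result.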
