How to slice a scipy sparse matrix and keep the original indexing?

Question

Suppose I have the following array:

   import numpy as np
   import scipy.sparse as sp_sparse
   a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
   a = sp_sparse.csr_matrix(a)

I want to get the submatrix of the sparse array that consists of the first and last rows.

>>> sub_matrix = a[[0, 3], :]
>>> print(sub_matrix)
(0, 0)  1
(0, 1)  2
(0, 2)  3
(1, 0)  4
(1, 1)  5
(1, 2)  6

But I want to keep the original row indices of the selected rows, so for my example the result would look like:

  (0, 0)    1
  (0, 1)    2
  (0, 2)    3
  (3, 0)    4
  (3, 1)    5
  (3, 2)    6

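I know I could do this by setting all the other rows of the dense array to zero and then building the sparse array again, but I would like to know whether there is a better way.

Any help would be appreciated!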

Answer 1

import numpy as np
import scipy.sparse as sp_sparse
a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
a = sp_sparse.csr_matrix(a)

Building a selection matrix and multiplying by it is probably the simplest approach.

idx = np.isin(np.arange(a.shape[0]), [0, 3]).astype(int)
b = sp_sparse.diags(idx, format='csr') @ a

The downside is that this produces a float matrix rather than an integer one, but that is easy to fix.

>>> b.astype(int).A
array([[1, 2, 3],
       [0, 0, 0],
       [0, 0, 0],
       [4, 5, 6]])
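
If the float intermediate is unwanted, one option is to give the selection matrix the same dtype as the data, so the product stays integer from the start. This is a minimal sketch rather than part of the answer above: it assumes the `dtype` argument of `diags` behaves as documented in the installed SciPy, and `rows_to_keep` and `selector` are illustrative names.

```python
import numpy as np
import scipy.sparse as sp_sparse

a = sp_sparse.csr_matrix(np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]]))

rows_to_keep = [0, 3]  # illustrative name, not from the original answer

# 1 on the diagonal for rows to keep, 0 everywhere else
idx = np.isin(np.arange(a.shape[0]), rows_to_keep).astype(a.dtype)

# Assumption: passing dtype keeps the selection matrix (and the product) integer
selector = sp_sparse.diags(idx, format='csr', dtype=a.dtype)
b = selector @ a

dense_b = b.toarray()  # same shape as a, unselected rows are all zero
```

The result has the same shape as `a`, so the kept rows stay at positions 0 and 3.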

Answer 2

Depending on the indices, it may be easier to build the extractor/indexing matrix with the coo-style input:

In [129]: from scipy import sparse
In [130]: M = sparse.csr_matrix(np.arange(16).reshape(4,4))
In [131]: M
Out[131]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 15 stored elements in Compressed Sparse Row format>
In [132]: M.A
Out[132]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

A square extractor matrix with the desired "diagonal" values:

In [133]: extractor = sparse.csr_matrix(([1,1],([0,3],[0,3])))
In [134]: extractor
Out[134]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Row format>

Matrix multiplication in one direction selects columns:

In [135]: M@extractor
Out[135]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [136]: _.A
Out[136]: 
array([[ 0,  0,  0,  3],
       [ 4,  0,  0,  7],
       [ 8,  0,  0, 11],
       [12,  0,  0, 15]])

In the other direction, it selects rows:

In [137]: extractor@M
Out[137]: 
<4x4 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [138]: _.A
Out[138]: 
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [12, 13, 14, 15]])
In [139]: extractor.A
Out[139]: 
array([[1, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
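
The row-selecting product above can be wrapped in a small helper. This is a sketch under stated assumptions: `keep_rows` is a hypothetical name, not a SciPy function. It passes `shape` explicitly, because the coo-style constructor otherwise infers the size from the largest index, which only gives a 4x4 extractor in the session above because row 3 happens to be selected.

```python
import numpy as np
from scipy import sparse

def keep_rows(M, rows):
    """Zero every row of M except `rows`, keeping the original shape.

    Hypothetical helper for illustration; not part of SciPy.
    """
    # np.unique drops repeats; duplicate (i, i) entries would be summed to 2
    rows = np.unique(np.asarray(rows))
    ones = np.ones(len(rows), dtype=M.dtype)
    n = M.shape[0]
    # Explicit shape so the extractor is n x n even if the last row is not kept
    extractor = sparse.csr_matrix((ones, (rows, rows)), shape=(n, n))
    return extractor @ M

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
kept = keep_rows(M, [0, 2])  # rows 1 and 3 become empty
```

Here `kept` is still 4x4, with rows 0 and 2 at their original positions.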

M[[0,3],:] does the same thing, but with a 2x4 extractor that packs the selected rows together:

In [140]: extractor = sparse.csr_matrix(([1,1],([0,1],[0,3])))
In [142]: (extractor@M).A
Out[142]: 
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
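
To make that equivalence concrete, the following sketch builds the 2x4 extractor for a list of rows and compares the product with plain fancy indexing. The variable names are illustrative, and the comparison goes through dense arrays only because the example is tiny.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
rows = [0, 3]

# Row i of the extractor has a single 1 in column rows[i]
extractor = sparse.csr_matrix(
    (np.ones(len(rows), dtype=M.dtype), (np.arange(len(rows)), rows)),
    shape=(len(rows), M.shape[0]),
)

compact = extractor @ M   # shape (2, 4), original row positions are lost
indexed = M[rows, :]      # fancy indexing on the sparse matrix

same = np.array_equal(compact.toarray(), indexed.toarray())
```

Both results drop the original row positions, which is why the square extractor is the one that answers the question.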

Row and column sums are also performed with matrix multiplication:

In [149]: M@np.ones(4,int)
Out[149]: array([ 6, 22, 38, 54])
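
As a quick check of that idea, this sketch computes both sums by multiplying with a vector of ones and compares the row sums with the built-in `sum` method. Using the transpose for the column sums is a choice made here for illustration, not something the answer prescribes.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
ones = np.ones(4, dtype=int)

row_sums = M @ ones     # one value per row
col_sums = M.T @ ones   # one value per column, via the transpose

# M.sum(axis=1) returns a 4x1 matrix, so flatten it before comparing
matches = np.array_equal(row_sums, np.asarray(M.sum(axis=1)).ravel())
```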
