Assigning a matrix from scipy.sparse.identity to a csr_matrix

Question

I want to assign a large scipy.sparse.identity matrix to a slice of a scipy.sparse.csr_matrix, but I have not been able to. In this case, m = 25000000 and p = 3. Tc_temp is a csr_matrix of size 25000000 x 75000000.

Tc_temp = csr_matrix((m, p * m))
Tc_temp[0: m, np.arange(j, p * m + j, p)] = identity(m, format='csr')

The error traceback I get is:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 816, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 932, in _zero_many
    i, j, M, N = self._prepare_indices(i, j)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 882, in _prepare_indices
    i = np.array(i, dtype=self.indices.dtype, copy=False, ndmin=1).ravel()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 233. GiB for an array with shape (62500000000,) and data type int32

Somehow the sparse.identity is being converted to a dense matrix.

Answer 1

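Assignment into a sparse matrix is not efficient. It builds index arrays the size of the row/column block being inserted, with one entry per cell of that block whether or not the cell is zero. That is clearly not feasible at this scale.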
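The numbers in the traceback are consistent with this. As a rough sanity check (the variable names below are illustrative, not taken from SciPy), the reported array length is a perfect square, as you would expect from one index entry per cell of a square block, and at 4 bytes per int32 it comes to the reported size:

```python
import math
import numpy as np

# Length of the index array reported in the traceback
n_index_entries = 62_500_000_000

# int32 entries, as stated in the error message
bytes_per_entry = np.dtype(np.int32).itemsize

# Total allocation in GiB
size_gib = n_index_entries * bytes_per_entry / 2**30

# Side of the square block that would need this many index entries
block_side = math.isqrt(n_index_entries)

print(f"{size_gib:.1f} GiB for a {block_side} x {block_side} block")
```

The identity itself stays sparse; it is the index arrays for the target block that are dense.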

You can work around this by fiddling directly with the data in the coordinate (COO) matrix, but it is not going to be efficient.

from scipy.sparse import csr_matrix, identity
import numpy as np

m = 25000000
p = 3
j = 0

Tc_temp = csr_matrix((m, p * m)).tocoo()
Tc_identity = identity(m, format='coo')

# If Tc_temp is known to be zero wherever you assign, you can skip this step.
# It will be slow if Tc_temp already holds a lot of data.
Tc_zero_idx = np.isin(Tc_temp.row, Tc_identity.row) & np.isin(Tc_temp.col, Tc_identity.col * p + j)
Tc_temp.data[Tc_zero_idx] = 0

# Append the identity entries, shifted to columns j, j + p, j + 2p, ...
Tc_temp.row = np.append(Tc_temp.row, Tc_identity.row)
Tc_temp.col = np.append(Tc_temp.col, Tc_identity.col * p + j)
Tc_temp.data = np.append(Tc_temp.data, Tc_identity.data)

# tocsr() returns a new matrix, so keep the result
Tc_temp = Tc_temp.tocsr()

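Normally I would tell you to build it in blocks, but if you are trying to interleave the rows and columns, that is not a great option for you.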
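For this particular pattern (one 1 in every p-th column), one way to avoid assignment altogether is a Kronecker product, which treats the result as m diagonal blocks of size 1 x p. This is a swapped-in technique, not the method above, shown here as a minimal sketch on small sizes; the names `selector` and `Tc_small` are hypothetical, and how it performs at the sizes in the question is untested here.

```python
import numpy as np
from scipy import sparse

# Small illustrative sizes, not the ones from the question
m, p, j = 4, 3, 1

# 1 x p row holding a single 1 in column j
selector = sparse.csr_matrix(([1.0], ([0], [j])), shape=(1, p))

# kron replaces each entry a[i, k] of the identity with a[i, k] * selector,
# so row i ends up with a single 1 in column i * p + j
Tc_small = sparse.kron(sparse.identity(m), selector, format='csr')

print(Tc_small.toarray())
```

The nonzero columns match `np.arange(j, p * m + j, p)` from the question.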

Answer 2

Let's examine the operation with a smaller matrix.

The identity, in coo format:

In [67]: I = sparse.identity(10,format='coo')
In [68]: I.row
Out[68]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [69]: I.col
Out[69]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

A "blank" csr:

In [70]: M = sparse.csr_matrix((10,30))
In [71]: M.indptr
Out[71]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
In [72]: M.indices
Out[72]: array([], dtype=int32)

The assignment. I use slice notation here instead of your arange, but the effect is the same (even in timing):

In [73]: M[0:10, 0:30:3] = I
/usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:116: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  self._set_arrayXarray_sparse(i, j, x)

The resulting matrix:

In [74]: M.indptr
Out[74]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int32)
In [75]: M.indices
Out[75]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)
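To read these two arrays: in CSR, the columns stored in row i are `indices[indptr[i]:indptr[i + 1]]`, so an `indptr` that counts up by one means exactly one stored entry per row. As a small sketch (the names `M2` and `row_cols` are hypothetical), the same three arrays can be passed straight to the constructor:

```python
import numpy as np
from scipy import sparse

indptr = np.arange(11)          # row i owns the slice indptr[i]:indptr[i + 1]
indices = np.arange(0, 30, 3)   # column of each stored entry
data = np.ones(10)              # value of each stored entry

# Build the CSR matrix directly from its three internal arrays
M2 = sparse.csr_matrix((data, indices, indptr), shape=(10, 30))

# Decode the columns stored in each row
row_cols = [indices[indptr[i]:indptr[i + 1]].tolist() for i in range(10)]
print(row_cols[3])
```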

And look at the corresponding coo attributes:

In [76]: M.tocoo().row
Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [77]: M.tocoo().col
Out[77]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)

The row is the same as for I, while the col is just your arange index:

In [78]: np.arange(0,30,3)
Out[78]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

So you can create the same matrix directly, without any assignment:

M1 = sparse.csr_matrix((np.ones(10),(np.arange(10), np.arange(0,30,3))),(10,30))
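Generalised to the m, p and j from the question, this might look like the sketch below. The helper name `strided_identity` is hypothetical, and it assumes 0 <= j < p so that every column index stays inside the matrix. It only ever allocates arrays of length m, so no m x m index is built.

```python
import numpy as np
from scipy import sparse

def strided_identity(m, p, j=0):
    """Return an (m, p*m) CSR matrix with a 1 at (i, i*p + j) for each row i."""
    rows = np.arange(m)
    cols = np.arange(j, p * m + j, p)   # same column index as in the question
    data = np.ones(m)
    return sparse.csr_matrix((data, (rows, cols)), shape=(m, p * m))

# Small case matching the transcript above
M1 = strided_identity(10, 3, j=0)
print(M1.indices)
```

On small sizes the result can be compared with the slice assignment to confirm they agree before running it at full scale.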

python

scipy

sparse-matrix