如何在 Python 中对角化稀疏 csr 一维矩阵(向量)?

How to diagonalise sparse csr 1D-matrix (vector) in Python?

[简版]

在scipy.sparse中是否有等同于numpy.diagflat()的函数?或者有什么方法可以 'flatten' 稀疏矩阵变得密集?

[长版]

我有一个稀疏矩阵(数学上是一个向量),x_f,我需要对角化(即创建一个方矩阵,其对角线上的 [​​=47=] 向量的值)。

x_f
Out[59]: 
<35021x1 sparse matrix of type '<class 'numpy.float64'>'
    with 47 stored elements in Compressed Sparse Row format>

我试过 'diags' 来自 scipy.sparse 模块。 (我也试过 'spdiags',但它只是 'diags' 的更花哨的版本,我不需要。)我已经尝试过 [csr 或 csc 格式] 的每种组合, [原始或转置向量] 和 [.todense() 或 .toarray()],但我不断收到错误消息:

ValueError: Different number of diagonals and offsets.

使用 sparse.diags 默认偏移量为 0,而我想要做的是 将数字放在主对角线上(这是默认值) , 所以得到这个错误意味着它没有像我想要的那样工作。

以下是分别使用 .todense() 和 .toarray() 的原始向量和转置向量的示例:

x_f_original.todense()
Out[72]: 
matrix([[  0.00000000e+00],
        [  0.00000000e+00],
        [  0.00000000e+00],
        ..., 
        [  0.00000000e+00],
        [  1.03332178e-17],
        [  0.00000000e+00]])

x_f_transposed.toarray()
Out[83]: 
array([[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
          0.00000000e+00,   1.03332178e-17,   0.00000000e+00]])

以下代码有效,但需要大约 15 秒才能完成 运行:

x_f_diag = sparse.csc_matrix(np.diagflat(x_f.todense()))

有没有人知道如何提高效率或更好的方法?

[免责声明]

这是我的第一个问题。我希望我做对了,对于任何不清楚的地方,我深表歉意。

您可以使用以下构造函数:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].

演示(COO矩阵):

from scipy.sparse import random, csr_matrix, coo_matrix

In [142]: M = random(10000, 1, .005, 'coo')

In [143]: M
Out[143]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in COOrdinate format>

In [144]: M2 = coo_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))

In [145]: M2
Out[145]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in COOrdinate format>

In [146]: M2.todense()
Out[146]:
matrix([[ 0.1559936 ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.28984266,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.21381431, ...,  0.        ,  0.        ,  0.        ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.23100531,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.13789309,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.73827   ]])

演示(CSR矩阵):

In [112]: from scipy.sparse import random, csr_matrix

In [113]: M = random(10000, 1, .005, 'csr')

In [114]: M
Out[114]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in Compressed Sparse Row format>

In [137]: M2 = csr_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))

In [138]: M2
Out[138]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in Compressed Sparse Row format>

In [139]: M2.todense()
Out[139]:
matrix([[ 0.45661992,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.42428401,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.99484544, ...,  0.        ,  0.        ,  0.        ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.80880579,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.46292628,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.56363196]])

如果需要密集矩阵:

In [147]: np.diagflat(M.data)
Out[147]:
array([[ 0.1559936 ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.28984266,  0.        , ...,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.21381431, ...,  0.        ,  0.        ,  0.        ],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.23100531,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.13789309,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.73827   ]])
In [106]: x_f = sparse.random(1000,1, .1, 'csr')
In [107]: x_f
Out[107]: 
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

如果把它变成一维密集数组,我可以在sparse.diags中使用它。

In [108]: M1=sparse.diags(x_f.A.ravel()).tocsr()
In [109]: M1
Out[109]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

或者我可以将其设为 (1,1000) 矩阵,并使用列表作为偏移量:

In [110]: M2=sparse.diags(x_f.T.A,[0]).tocsr()
In [111]: M2
Out[111]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

diags 采用密集的对角线,而不是稀疏的。这是按原样存储的,所以我使用进一步的 .tocsr 删除 0 等

In [113]: sparse.diags(x_f.T.A,[0])
Out[113]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 1000 stored elements (1 diagonals) in DIAgonal format>

所以无论哪种方式,我都将对角线的形状与偏移量(标量或 1)相匹配。

直接映射到 csr(或 csc)可能更快。

对于这种列形状,indices 属性没有告诉我们任何信息。

In [125]: x_f.indices
Out[125]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0], dtype=int32)

但将其转换为 csc(这将 indptr 映射到 indices

In [126]: x_f.tocsc().indices
Out[126]: 
array([  2,  15,  26,  32,  47,  56,  75,  82,  96,  99, 126, 133, 136,
       141, 145, 149, ... 960, 976], dtype=int32)
In [127]: idx=x_f.tocsc().indices

In [128]: M3 = sparse.csr_matrix((x_f.data, (idx, idx)),(1000,1000))
In [129]: M3
Out[129]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>