Transform scipy sparse matrix to index-based numpy array
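I have a scipy sparse matrix with N nonzero values, and I would like to get it as a numpy array of shape (N, 3), where the first two columns contain the indices of the nonzero values and the last column contains the corresponding nonzero values.
Example:
I would like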
mymatrix.toarray()
matrix([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.83885831, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 1.13395003, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0.57979727, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.75500017, 0. , 0.81459546, 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.87997548, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
to become
np.array([[3, 2, 0.83885831], [4,5,1.13395003], [6,5,0.57979727], [7,4,0.75500017], [7,6,0.81459546], [8,9,0.87997548]])
array([[3. , 2. , 0.83885831],
[4. , 5. , 1.13395003],
[6. , 5. , 0.57979727],
[7. , 4. , 0.75500017],
[7. , 6. , 0.81459546],
[8. , 9. , 0.87997548]])
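How can I do this efficiently?
After the conversion I will iterate over the rows, so if there is an efficient way to iterate over the entries without doing the conversion, I would appreciate that as well: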
for index_i, index_j, value in mymatrix.iterator():
    do_something(index_i, index_j, value)
For iteration, the dok (dictionary of keys) format seems natural; you could do:
for (i, j), v in your_sparse_matrix.todok().items():
    ...  # etc.
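As a runnable illustration of the dok loop, here is a minimal sketch. The small 3x3 matrix and the `visited` list are made up for the example; only `todok()` and `items()` come from the answer. The iteration order of a dok matrix should not be relied on, so sort the result if order matters.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x3 matrix standing in for your_sparse_matrix.
dense = np.array([[0.0, 0.0, 1.5],
                  [0.0, 0.0, 0.0],
                  [2.5, 0.0, 3.5]])
your_sparse_matrix = sparse.csr_matrix(dense)

# Collect (row, col, value) for every stored entry.
visited = []
for (i, j), v in your_sparse_matrix.todok().items():
    visited.append((int(i), int(j), float(v)))

# Sort explicitly instead of assuming a particular iteration order.
visited.sort()
```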
An Nx3 list of coordinate-value records can easily be obtained from the coo format:
coo = your_sparse_matrix.tocoo()
np.column_stack((coo.row,coo.col,coo.data))
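A self-contained sketch of the same two lines, using a small made-up matrix in place of `your_sparse_matrix` (the name `triplets` is also just for illustration). Note that `np.column_stack` produces a single dtype, so the integer indices are stored as floats alongside the values, which matches the float indices in the desired output above.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x3 matrix standing in for your_sparse_matrix.
dense = np.array([[0.0, 0.0, 1.5],
                  [0.0, 0.0, 0.0],
                  [2.5, 0.0, 3.5]])
your_sparse_matrix = sparse.csr_matrix(dense)

coo = your_sparse_matrix.tocoo()

# One row per stored entry: (row index, column index, value).
triplets = np.column_stack((coo.row, coo.col, coo.data))

# Sort by row, then column, in case a fixed order is needed downstream.
order = np.lexsort((triplets[:, 1], triplets[:, 0]))
triplets = triplets[order]
```

If integer indices are needed later, they can be recovered with something like `triplets[:, :2].astype(int)`, or `coo.row` and `coo.col` can be used directly.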
Obviously, this can also be used for iteration; you will have to test which is faster in your use case.
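One way to iterate over the coo data without building the Nx3 array first is to zip the three attribute arrays. This is a sketch on a made-up matrix, not a benchmark; `seen` is a hypothetical stand-in for whatever `do_something` would do with each entry.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x3 matrix standing in for your_sparse_matrix.
dense = np.array([[0.0, 0.0, 1.5],
                  [0.0, 0.0, 0.0],
                  [2.5, 0.0, 3.5]])
coo = sparse.csr_matrix(dense).tocoo()

# coo.row, coo.col and coo.data are parallel 1-D arrays of equal length.
seen = []
for i, j, v in zip(coo.row, coo.col, coo.data):
    seen.append((int(i), int(j), float(v)))

seen.sort()
```

This keeps the indices as integers, unlike the stacked float array. Timing both loops on your own data (for example with `timeit`) is the only reliable way to pick between them.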