Vector-matrix product differences between sparse and dense matrices

Question

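In a simple vector-matrix multiplication, I get a different result and output format when I use a scipy.sparse matrix instead of a dense matrix. As an example, I use the following dense matrix and vector: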

import numpy as np
from scipy import sparse
mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)

For the vector-matrix product I get the following expected output:

vec.dot(mat)   # array([ 1,  5, 13, 25, 16])
mat.T.dot(vec) # array([ 1,  5, 13, 25, 16])
mat.T.dot(vec.T) # array([ 1,  5, 13, 25, 16])

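I accept that transposing the vector makes no difference here. But when I replace the matrix mat with a sparse matrix mat_sparse, I get an array of sparse 4x5 matrices, each one being the sparse matrix multiplied by one vector component, i.e. [1 x mat_sparse, 2 x mat_sparse, ...]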

mat_sparse = sparse.lil_matrix(mat)
vec.dot(mat_sparse)  # array([ <4x5 sparse matrix of type '<type 'numpy.int64'>' with 8 stored elements in LInked List format>, ...], dtype=object)

Using the transposed-matrix trick I get the expected result:

mat_sparse.T.dot(vec.T)  # array([ 1,  5, 13, 25, 16])

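Can someone explain why this behavior is expected or wanted? Replacing the matrix mat (which is actually a 2-D array) with an instance of np.matrix(mat) does not change the result.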

Answer 1

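Operations on sparse matrices generally return sparse matrices as well.

If you want to convert back to a dense matrix, you have to ask for it explicitly by calling the .toarray() method on the result.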
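As a minimal sketch of that point, the snippet below multiplies two sparse matrices and then converts the result to a dense array. It reuses mat from the question; the choice of csr_matrix and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
mat_sparse = sparse.csr_matrix(mat)

# Sparse times sparse stays sparse (a 4x4 sparse result here).
product = mat_sparse.dot(mat_sparse.T)
is_sparse = sparse.issparse(product)

# The dense form has to be requested explicitly.
dense_product = product.toarray()
```

Note that a sparse matrix times a dense 1-D vector, as in the question, already comes back as a dense array, so .toarray() is only needed when the result itself is sparse.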

Answer 2

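As a general rule, don't count on numpy functions and methods to handle sparse matrices correctly. It is best to use the sparse methods and functions. Regular numpy code knows nothing about sparse matrices.

For matrices (sparse or np.matrix), * is matrix multiplication.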

In [2150]: vec*smat    # smat=csr_matrix(mat)
Out[2150]: array([ 1,  5, 13, 25, 16], dtype=int32)

In this context the sparse matrix definition of * takes priority.

In [2151]: vec.dot(smat)
Out[2151]:...
array([ <4x5 sparse matrix of type '<class 'numpy.int32'>'
    with 8 stored elements in Compressed Sparse Row format>,
    ...
    with 8 stored elements in Compressed Sparse Row format>], dtype=object)

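In this expression, vec.dot knows nothing about sparse matrices. Offhand it looks like it is performing the dot separately for each row of smat, but I'd have to dig further.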
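One possible explanation, offered as an assumption rather than something confirmed in this answer: numpy cannot turn a scipy sparse matrix into a numeric array, so it wraps the whole matrix in a 0-d object array, and the dot then multiplies that single object by each element of vec. A minimal sketch of the wrapping step only (variable names are hypothetical):

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
smat = sparse.csr_matrix(mat)

# numpy treats the sparse matrix as one opaque Python object,
# not as a 4x5 grid of numbers.
wrapped = np.asarray(smat)
wrapped_shape = wrapped.shape   # no dimensions: a 0-d array
wrapped_dtype = wrapped.dtype   # object dtype, not an integer dtype
```

That would be consistent with the dtype=object array of four scaled sparse matrices shown above, one per element of vec.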

The following works because it uses the sparse definition of dot, which is the same as its *:

In [2163]: smat.T.dot(vec)
Out[2163]: array([ 1,  5, 13, 25, 16], dtype=int32)
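The same product can be written so that the sparse matrix is always the object doing the work. The sketch below is one way to do that, assuming a csr_matrix built from the question's mat; the @ operator form is an addition not shown in the original session.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# Sparse method: dot() is called on the sparse matrix, not on the ndarray.
via_dot = smat.T.dot(vec)

# Operator form: the sparse matrix's own matrix-multiplication handles it.
via_matmul = smat.T @ vec

# Dense reference result from the question.
expected = vec.dot(mat)
```

Both forms return a plain 1-D ndarray equal to the dense result, because the left operand is sparse and its own multiplication code is used.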

np.dot has limited knowledge of sparse matrices. For example, it works if both arguments are sparse: np.dot(smat, smat.T) works (the same as np.dot(mat, mat.T)).

In [2177]: np.dot(smat.T,sparse.csr_matrix(vec).T).A
Out[2177]: 
array([[ 1],
       [ 5],
       [13],
       [25],
       [16]], dtype=int32)

It may help to understand how sparse matrices are created and how they store their data. They are not a subclass of np.ndarray.
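To make that concrete, here is a small sketch that inspects a CSR matrix built from the question's mat. The data, indices, and indptr attributes are the standard CSR storage arrays; the variable names are chosen for illustration.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
smat = sparse.csr_matrix(mat)

# A sparse matrix is its own class, not an ndarray subclass.
is_ndarray = isinstance(smat, np.ndarray)

# CSR stores only the nonzero values plus two index arrays.
values = smat.data        # the 8 stored nonzero values, row by row
col_idx = smat.indices    # column index of each stored value
row_ptr = smat.indptr     # where each row starts and ends in `values`
```

Because the numbers live in these three 1-D arrays rather than in a 2-D buffer, numpy code that expects an ndarray has nothing it can index directly.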
