np.linalg.norm does not work for CSR matrix

I have a 220,000 x 34 matrix stored as a SciPy CSR matrix. When I try to take the row-wise norm of the matrix, I get an exception:

>>> np.linalg.norm(csr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\IBM_ADMIN\AppData\Local\Programs\Python\Python37\lib\site-packages\numpy\linalg\linalg.py", line 2450, in norm
    sqnorm = dot(x, x)
  File "C:\Users\IBM_ADMIN\AppData\Local\Programs\Python\Python37\lib\site-packages\scipy\sparse\base.py", line 480, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch
>>> csr
<3x2 sparse matrix of type '<class 'numpy.int32'>'
        with 6 stored elements in Compressed Sparse Row format>

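Is there a limitation on using NumPy methods/functions with CSR matrices?

In desperation, I tried to work around this by multiplying the matrix by itself element-wise and then summing along the rows, but that raised an exception as well.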

You need:

np.linalg.norm(csr.toarray())

Example:

import numpy as np
from scipy.sparse import csr_matrix
csr = csr_matrix((3, 4), dtype=np.int8).toarray()  # dense ndarray after .toarray()
np.linalg.norm(csr)
# 0.0
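
Note that without an axis argument, np.linalg.norm returns a single Frobenius norm for the whole matrix. For the row-wise norms the question asks about, passing axis=1 to the dense call is one option. A minimal sketch, with a small made-up matrix standing in for the real data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative matrix (values are made up for this example).
csr = csr_matrix(np.array([[3, 4], [0, 0], [5, 12]], dtype=np.int32))

dense = csr.toarray()                      # densify: fine here, costly for very large matrices
frobenius = np.linalg.norm(dense)          # one number for the whole matrix: sqrt(194)
row_norms = np.linalg.norm(dense, axis=1)  # one Euclidean norm per row: 5, 0, 13

print(frobenius)
print(row_norms)
```

Densifying a 220,000 x 34 matrix is usually affordable, but the memory cost grows with the full shape rather than with the number of stored elements.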

NumPy functions not working on sparse matrices is the rule, not the exception.
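
A likely explanation for the traceback in the question: for scipy.sparse matrix classes such as csr_matrix, the `*` operator means matrix multiplication, so the dot(x, x) inside np.linalg.norm ends up attempting a (3x2) times (3x2) product. The sketch below, which assumes a small made-up matrix, shows the element-wise route using the sparse matrix's own methods instead:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative matrix (values are made up for this example).
csr = csr_matrix(np.array([[3, 4], [0, 0], [5, 12]], dtype=np.int32))

# Element-wise square via .multiply (not `*`, which is a matrix product
# for csr_matrix), then sum along each row.
squared = csr.multiply(csr)
row_sums = np.asarray(squared.sum(axis=1)).ravel()  # flatten the (n, 1) result to 1-D
row_norms = np.sqrt(row_sums)                       # 5, 0, 13

print(row_norms)
```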

Here is a workaround that operates directly on the CSR representation:

import numpy as np
from timeit import timeit
from scipy.sparse import random

A = random(1000,500,format="csr")

def sparse_row_norm(A):
    out = np.zeros(A.shape[0])
    # ufunc.reduceat only works properly for strictly increasing points
    # as a workaround we filter out empty rows
    nz, = np.diff(A.indptr).nonzero()
    out[nz] = np.sqrt(np.add.reduceat(np.square(A.data),A.indptr[nz]))
    return out

# compare to brute force (convert to dense array) method
np.allclose(sparse_row_norm(A),np.linalg.norm(A.toarray(),axis=1))
# True

# results are the same but speed is much better
timeit(lambda:sparse_row_norm(A),number=1000)
# 0.04653145093470812
timeit(lambda:np.linalg.norm(A.toarray(),axis=1),number=1000)
# 1.6365239119622856
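
To see what the reduceat call is doing, here is a small walk-through on a hand-made 4x3 matrix with one empty row (the matrix and variable names are illustrative only). indptr marks where each row's values start in data; an empty row repeats the previous offset, which is why such rows have to be filtered out first.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 1 is deliberately left empty.
M = csr_matrix(np.array([[3.0, 0.0, 4.0],
                         [0.0, 0.0, 0.0],
                         [0.0, 5.0, 0.0],
                         [1.0, 2.0, 2.0]]))

print(M.data)    # stored values, row by row: 3, 4, 5, 1, 2, 2
print(M.indptr)  # row start offsets: 0, 2, 2, 3, 6

nz, = np.diff(M.indptr).nonzero()  # rows holding at least one value: 0, 2, 3
starts = M.indptr[nz]              # strictly increasing offsets: 0, 2, 3
sums = np.add.reduceat(np.square(M.data), starts)  # per-row sums of squares: 25, 25, 9

norms = np.zeros(M.shape[0])
norms[nz] = np.sqrt(sums)          # 5, 0, 5, 3 (the empty row stays at 0)
print(norms)
```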

Here is a function snorm that works for both; snorm( sparsematrix, axis=1 ) returns the row norms:

import numpy as np    
from scipy import sparse
from scipy.sparse import linalg as sslin

def snorm( A, ord=None, axis=None ):
    """ norm of either scipy sparse matrix or numpy dense ndarray
        ord=None: sqrt( sum Aij^2 ), Frobenius norm
        ord=np.inf: max |Aij| along axis (with axis=None: max row sum of |Aij|)
        axis=None: all,  axis=0: columns,  axis=1: rows
    """
    if sparse.issparse( A ):
        return sslin.norm( A, ord=ord, axis=axis )
        # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.norm.html
    else:
        return np.linalg.norm( A, ord=ord, axis=axis )


#...............................................................................
# test snorm --
for A in [np.array([[ 1, 2 ], [3, 4] ]),
        np.eye( 0 )]:
    print( "\nA: \n", A )
    S = sparse.csr_matrix( A )
    for axis in [None, 0, 1]:
        npnorm = snorm( A, axis=axis )
        sparsenorm = snorm( S, axis=axis )
        print( "snorm axis %s: %s  %s " % (axis, npnorm, sparsenorm ))
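
For the row-wise case in the question, scipy.sparse.linalg.norm can also be called directly. A minimal sketch, assuming a random matrix of the question's shape (220,000 x 34) as stand-in data; the density and seed are arbitrary choices:

```python
import numpy as np
from scipy import sparse
from scipy.sparse import linalg as sslin

# Stand-in data: random CSR matrix with the shape from the question.
A = sparse.random(220_000, 34, density=0.1, format="csr", random_state=0)

row_norms = sslin.norm(A, axis=1)  # Euclidean norm of each row, without densifying

# Cross-check against the element-wise square-and-sum route.
check = np.sqrt(np.asarray(A.multiply(A).sum(axis=1)).ravel())

print(row_norms.shape)
print(np.allclose(row_norms, check))
```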