Scipy Sparse Matrix Loop Taking Forever - Need to make more efficient
What is the most time- and memory-efficient way to write this loop with a sparse matrix (currently using csc_matrix)?
for j in range(0, reducedsize):
    xs = sum(X[:, j])
    X[:, j] = X[:, j] / xs.data[0]
Example:
reducedsize (integer) - 2500
X (csc_matrix) - 908x2500
The loop does iterate, but it takes a long time compared to just using numpy.
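For reference, a minimal sketch of the dense NumPy route the timing is presumably being compared against (an assumption about the setup; it requires converting X to a dense array, and columns that sum to zero would give a warning and nan here):

import numpy as np

Xd = X.toarray()              # 908x2500 dense copy
col_sums = Xd.sum(axis=0)     # shape (2500,)
Xd_norm = Xd / col_sums       # broadcasts over rows; zero-sum columns -> nan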
In [388]: from scipy import sparse
Make a sample matrix:
In [390]: M = sparse.random(10,8,.2, 'csc')
The column sums:
In [393]: M.sum(axis=0)
Out[393]:
matrix([[1.95018736, 0.90924629, 1.93427113, 2.38816133, 1.08713479,
0. , 2.45435481, 0. ]])
Those zeros produce a warning when dividing, and nan in the result:
In [394]: M/_
/usr/local/lib/python3.6/dist-packages/scipy/sparse/base.py:599: RuntimeWarning: invalid value encountered in true_divide
return np.true_divide(self.todense(), other)
Out[394]:
matrix([[0. , 0. , 0. , 0. , 0.27079623,
nan, 0.13752665, nan],
[0. , 0. , 0. , 0. , 0. ,
nan, 0.32825122, nan],
[0. , 0. , 0. , 0. , 0. ,
nan, 0. , nan],
...
nan, 0. , nan]])
The zeros also give your approach problems:
In [395]: for i in range(8):
...: xs = sum(M[:,i])
...: M[:,i] = M[:,i]/xs.data[0]
...:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-395-0195298ead19> in <module>
1 for i in range(8):
2 xs = sum(M[:,i])
----> 3 M[:,i] = M[:,i]/xs.data[0]
4
IndexError: index 0 is out of bounds for axis 0 with size 0
But if we compare the columns without zeros, the values match:
In [401]: Out[394][:,:5]
Out[401]:
matrix([[0. , 0. , 0. , 0. , 0.27079623],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0.49648886, 0.25626608, 0. , 0.19162678, 0.72920377],
[0. , 0. , 0.30200765, 0. , 0. ],
[0.50351114, 0. , 0.30445113, 0.41129367, 0. ],
[0. , 0.74373392, 0. , 0. , 0. ],
[0. , 0. , 0.39354122, 0. , 0. ],
[0. , 0. , 0. , 0.39707955, 0. ]])
In [402]: M.A[:,:5]
Out[402]:
array([[0. , 0. , 0. , 0. , 0.27079623],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0.49648886, 0.25626608, 0. , 0.19162678, 0.72920377],
[0. , 0. , 0.30200765, 0. , 0. ],
[0.50351114, 0. , 0.30445113, 0.41129367, 0. ],
[0. , 0.74373392, 0. , 0. , 0. ],
[0. , 0. , 0.39354122, 0. , 0. ],
[0. , 0. , 0. , 0.39707955, 0. ]])
Back in [394] I should have made the matrix sum sparse first, so the result would also be sparse. Sparse doesn't do element-wise division, so I take the reciprocal of the dense sums first. The zeros are still a pain.
In [409]: M.multiply(sparse.csr_matrix(1/Out[393]))
...
Out[409]:
<10x8 sparse matrix of type '<class 'numpy.float64'>'
with 16 stored elements in Compressed Sparse Column format>
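A possible workaround (a sketch, not part of the answer above) is to replace the zero sums with 1 before taking the reciprocal, so the empty columns are simply left empty and no warning is raised:

import numpy as np

s = np.asarray(M.sum(axis=0)).ravel()    # column sums as a 1-D array
s[s == 0] = 1                            # empty columns: divide by 1, i.e. leave them alone
M_norm = M.multiply(sparse.csr_matrix(1 / s))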
If you want to do this (in place) without any memory overhead, always think about how the data is actually stored.
A small example on a csc matrix:
shape=(5,5)
X=sparse.random(shape[0], shape[1], density=0.5, format='csc')
print(X.todense())
[[0.12146814 0. 0. 0.04075121 0.28749552]
[0. 0.92208639 0. 0.44279661 0. ]
[0.63509196 0.42334964 0. 0. 0.99160443]
[0. 0. 0.25941113 0.44669367 0.00389409]
[0. 0. 0. 0. 0.83226886]]
i=0 #first column
print(X.data[X.indptr[i]:X.indptr[i+1]])
[0.12146814 0.63509196]
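To make the layout fully explicit, here is a small sketch that also prints the column boundaries; the indptr values shown are read off the dense printout above, and X.indices holds the corresponding row indices:

print(X.indptr)
#[ 0  2  4  5  8 12]  -> column i occupies data[indptr[i]:indptr[i+1]]
print(X.indices[X.indptr[i]:X.indptr[i+1]])
#[0 2]  -> row indices of the nonzeros in the first column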
Numpy solution
So the only thing we have to do here is to modify the nonzero entries column by column. This can easily be done with a partly vectorized numpy solution: data is simply the array containing all the nonzero values, and indptr stores the information where each column begins and ends.
import numpy as np

def Numpy_csc_norm(data,indptr):
    for i in range(indptr.shape[0]-1):
        xs = np.sum(data[indptr[i]:indptr[i+1]])
        #Normalize the nonzeros of column i by modifying the view in place
        data[indptr[i]:indptr[i+1]] /= xs
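A quick usage sketch (the matrix is just a stand-in with the shape from the question and assumes from scipy import sparse; after the call every non-empty column sums to 1, and empty columns are left untouched):

X = sparse.random(908, 2500, density=0.01, format='csc')
Numpy_csc_norm(X.data, X.indptr)          # modifies X.data in place
s = np.asarray(X.sum(axis=0)).ravel()
print(np.allclose(s[s > 0], 1.0))         # True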
Performance-wise this in-place solution is already not bad. If you want to improve performance further, you could use Cython/Numba or some other compiled code that can be wrapped in Python more or less easily.
Numba solution
import numba as nb

@nb.njit(fastmath=True,error_model="numpy",parallel=True)
def Numba_csc_norm(data,indptr):
    for i in nb.prange(indptr.shape[0]-1):
        #Sum the nonzeros of column i, then divide them by that sum
        acc=0
        for j in range(indptr[i],indptr[i+1]):
            acc+=data[j]
        for j in range(indptr[i],indptr[i+1]):
            data[j]/=acc
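It is called exactly like the NumPy version. A small sketch comparing the two in-place routines on copies of the same random matrix (an illustration only; fastmath may introduce tiny rounding differences, hence the allclose):

A = sparse.random(908, 2500, density=0.01, format='csc')
B = A.copy()
Numpy_csc_norm(A.data, A.indptr)
Numba_csc_norm(B.data, B.indptr)
print(np.allclose(A.data, B.data))        # True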
Performance
#Create a not too small example matrix
shape=(50_000,10_000)
X=sparse.random(shape[0], shape[1], density=0.001, format='csc')
#Not in-place; from hpaulj's answer
def hpaulj(X):
    acc=X.sum(axis=0)
    return X.multiply(sparse.csr_matrix(1./acc))
%timeit X2=hpaulj(X)
#6.54 ms ± 67.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Both variants below are in-place,
#but this shouldn't have an influence on the timings
%timeit Numpy_csc_norm(X.data,X.indptr)
#79.2 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#parallel=False -> faster on tiny matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#626 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
#parallel=True -> faster on larger matrices
%timeit Numba_csc_norm(X.data,X.indptr)
#185 µs ± 5.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)