What is the fastest way to compute a sparse Gram matrix in Python?
A Gram matrix is a matrix of the form X @ X.T and is, of course, symmetric. When working with dense matrices, the numpy.dot implementation is smart enough to recognize the self-multiplication and exploit the symmetry, which speeds up the computation (see ). However, no such effect can be observed with scipy.sparse matrices:
import scipy.sparse
from numpy import prod, random   # the snippet relies on numpy's random and prod

random.seed(0)
X = random.randn(5, 50)
X[X < 1.5] = 0                   # keep only values >= 1.5 to make X sparse
X = scipy.sparse.csr_matrix(X)
print(f'sparsity of X: {100 * (1 - X.count_nonzero() / prod(X.shape)):5.2f} %')
# sparsity of X: 92.00 %
%timeit X @ X.T
# 248 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
X2 = X.copy()
%timeit X @ X2.T
# 251 µs ± 9.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
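For reference, the dense-case behaviour described above can be checked directly; the following is just a minimal sketch (the matrix size is chosen arbitrarily, and the actual timings depend on the BLAS that numpy links against):

import numpy as np

A = np.random.randn(2000, 2000)
B = A.copy()                     # same values, but a separate buffer

# Self-product: numpy.dot can detect that the second operand is a transposed
# view sharing memory with the first and dispatch to a symmetric rank-k
# update (syrk) instead of a general matrix product (gemm).
%timeit np.dot(A, A.T)

# Product with the copy: no aliasing, so the general gemm path is taken,
# which is typically noticeably slower.
%timeit np.dot(A, B.T)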
So I wonder: what is the fastest way to compute a sparse Gram matrix in Python? Notably, it is sufficient to compute only the lower (or, equivalently, the upper) triangle.
I have read several times that the skyline format is very efficient for symmetric matrices; however, scipy doesn't support the skyline format. Instead, people were pointing towards pysparse, but pysparse seems to have been discontinued a long time ago and doesn't support Python 3. At least, my Anaconda refuses to install pysparse because of compatibility issues with Python 3.
Thanks to the comment by user CJR, I worked out a satisfying solution. In fact, I found a library on GitHub, sparse_dot_mkl, which wraps the MKL routine mkl_sparse_spmm for Python. This routine is for fast multiplication of two sparse matrices. So all I had to do was to extend the library and provide a similar wrapper for mkl_sparse_syrk, and this is exactly what I did.
I still have to add some comments, after which I will submit a pull request to the original project.
In the meantime, here are the performance results, which are quite impressive:
import scipy.sparse
import sparse_dot_mkl
from numpy import allclose, prod, random, tril_indices

random.seed(0)
X = random.randn(500, 5000)
X[X < 0.8] = 0
X = scipy.sparse.csr_matrix(X)
print(f'X sparsity: {100 * (1 - X.count_nonzero() / prod(X.shape)):5.2f} %')
# X sparsity: 78.80 %

# Reference result computed with scipy; the upper-triangular copy is what
# the syrk-based routine is expected to return.
expected_result = (X @ X.T).toarray()
expected_result_triu = expected_result.copy()
expected_result_triu[tril_indices(expected_result.shape[0], k=-1)] = 0

mkl_result1 = sparse_dot_mkl.dot_product_mkl(X, X.T)
allclose(mkl_result1.toarray(), expected_result)
# True

mkl_result2 = sparse_dot_mkl.dot_product_transpose_mkl(X)
allclose(mkl_result2.toarray(), expected_result_triu)
# True

%timeit X @ X.T
# 197 ms ± 5.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sparse_dot_mkl.dot_product_mkl(X, X.T)
# 70.6 ms ± 593 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sparse_dot_mkl.dot_product_transpose_mkl(X)
# 34.2 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using the generic dot product from MKL instead of scipy's dot product implementation yields a speed-up of 279%. Using the dedicated routine for the Gram matrix computation yields a speed-up of 576%. That's huge.
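If the full symmetric Gram matrix is needed rather than only its upper triangle, it can be reconstructed cheaply from the triangular result by mirroring the strictly-upper part; here is a minimal sketch using the variables from the benchmark above (the names gram_upper and gram_full are mine):

from scipy.sparse import triu

# mkl_result2 contains the upper triangle including the diagonal, so only the
# strictly-upper part (k=1) has to be mirrored below the diagonal.
gram_upper = mkl_result2.tocsr()
gram_full = gram_upper + triu(gram_upper, k=1).T

allclose(gram_full.toarray(), expected_result)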