What is the fastest way to multiply two 2D numpy arrays?

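We need to find a way to multiply a 2D array X of shape (7403, 33) by its transpose.

That is, the matrix product X @ X.T.

The solution should be 2.5 times faster than np.dot(X, X.T). I have tried everything I can think of: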

%timeit np.dot(X,X.T)
%timeit np.matmul(X,X.T)
%timeit X@X.T
%timeit np.einsum("ij, jk -> ik",X,X.T)

The best I get is only about 1.5 times faster than np.dot:

3.17 s ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.03 s ± 6.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.01 s ± 6.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.02 s ± 6.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

OK, I found a solution using scipy:

from scipy import linalg  # provides linalg.blas.dgemm

%timeit np.dot(X,X.T)
%timeit np.matmul(X,X.T)
%timeit X@X.T
%timeit np.einsum("ij, jk -> ik",X,X.T)
%timeit linalg.blas.dgemm(alpha=1.0, a=X, b=X.T)

This gives:

3.07 s ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.02 s ± 37.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.99 s ± 9.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2 s ± 5.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
306 ms ± 6.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
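
Before relying on the timing, it may be worth checking that dgemm returns the same product as np.dot. The following is only a minimal sketch: the array here is a random float64 stand-in with a smaller, assumed shape of (500, 33), not the original X, and the `trans_b=True` variant is an alternative way to express the same product rather than something from the measurements above.

```python
import numpy as np
from scipy.linalg import blas

# Stand-in data (assumption): random float64 values, smaller than the
# (7403, 33) array in the question so the check runs quickly.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 33))

# Reference result from NumPy.
ref = np.dot(X, X.T)

# Same call as in the timing above: computes alpha * a @ b.
out = blas.dgemm(alpha=1.0, a=X, b=X.T)

# Equivalent form that lets BLAS apply the transpose to b itself.
out_t = blas.dgemm(alpha=1.0, a=X, b=X, trans_b=True)

# dgemm works in double precision, so the result is a float64 array.
same = np.allclose(ref, out) and np.allclose(ref, out_t)
print(out.shape, out.dtype, same)
```

If the check passes on the real X as well, the speedup is not coming at the cost of a different result. It can also help to print X.dtype, since dgemm always computes in float64 while the NumPy functions keep the dtype of their input.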