ufunc memory consumption in arithmetic expressions

What is the memory consumption of an arithmetic numpy expression such as

vec ** 3 + vec ** 2 + vec

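(where vec is a numpy.ndarray)? Is an array stored for each intermediate operation? Can such a compound expression use several times the memory of the underlying ndarray?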

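You are right: a new array is allocated for each intermediate result. Fortunately, the numexpr package is designed to solve exactly this problem. From its description: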

The main reason why NumExpr achieves better performance than NumPy is that it avoids allocating memory for intermediate results. This results in better cache utilization and reduces memory access in general. Due to this, NumExpr works best with large arrays.
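One way to observe the intermediate allocations in plain NumPy is to record the peak traced memory while the expression runs. The following is a minimal sketch, assuming a NumPy build that reports its array buffers to Python's `tracemalloc` module; the helper name `peak_bytes` is made up for illustration. The exact peak depends on the NumPy version, because newer releases can reuse some temporaries, but the results of `vec ** 3` and `vec ** 2` must both exist before they can be added.

```python
import tracemalloc

import numpy as np


def peak_bytes(func, *args):
    """Return the peak number of traced bytes while func(*args) runs.

    Hypothetical helper for illustration, not part of NumPy.
    """
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


vec = np.random.rand(1_000_000)  # float64, about 8 MB

# vec ** 3 and vec ** 2 are both alive when the first addition starts,
# so the peak should be at least two array-sized buffers.
peak = peak_bytes(lambda v: v ** 3 + v ** 2 + v, vec)

print(f"array size:             {vec.nbytes / 1e6:.1f} MB")
print(f"peak during expression: {peak / 1e6:.1f} MB")
```

On a typical setup the reported peak is a small multiple of the array size, which matches the concern raised in the question.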

Example:

In [97]: xs = np.random.rand(1_000_000)

In [98]: %timeit xs ** 3 + xs ** 2 + xs
26.8 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [99]: %timeit numexpr.evaluate('xs ** 3 + xs ** 2 + xs')
1.43 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
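The call above lets numexpr look up `xs` in the calling frame. A slightly more explicit variant, sketched below, passes the operand through `local_dict` and writes into a preallocated array through `out`, so the result buffer can be reused across calls. The wrapper name `poly` and the fallback to plain NumPy when numexpr is not installed are assumptions made for this example only.

```python
import numpy as np

try:
    import numexpr as ne  # optional third-party dependency
except ImportError:
    ne = None


def poly(xs, out=None):
    """Evaluate xs**3 + xs**2 + xs, using numexpr when it is available."""
    if ne is None:
        # Plain NumPy fallback: allocates temporaries for the intermediates.
        return xs ** 3 + xs ** 2 + xs
    if out is None:
        out = np.empty_like(xs)
    # local_dict names the operand explicitly; out= receives the result.
    ne.evaluate("xs ** 3 + xs ** 2 + xs", local_dict={"xs": xs}, out=out)
    return out


xs = np.random.rand(1_000_000)
result = poly(xs)
```

Checking the result against the plain NumPy expression with `np.allclose` is a cheap sanity test before relying on the faster path.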

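Thanks to @max9111 for pointing out that numexpr simplifies powers into multiplications. It seems that most of the difference in the benchmark is explained by the optimization of xs ** 3:
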
In [421]: %timeit xs * xs
1.62 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [422]: %timeit xs ** 2
1.63 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [423]: %timeit xs ** 3
22.8 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [424]: %timeit xs * xs * xs
2.52 ms ± 58.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
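The timings above suggest that plain NumPy can recover part of the gap by avoiding `** 3` altogether. A minimal sketch, assuming a floating-point input: rewrite the polynomial in Horner form, `((vec + 1) * vec + 1) * vec`, and use in-place operators so that only one new array is allocated. The function name `poly_horner` is hypothetical.

```python
import numpy as np


def poly_horner(vec):
    """Evaluate vec**3 + vec**2 + vec as ((vec + 1) * vec + 1) * vec.

    Assumes a floating-point array; the input is left unmodified.
    """
    result = vec + 1.0  # the only new array: vec + 1
    result *= vec       # (vec + 1) * vec        = vec**2 + vec
    result += 1.0       # vec**2 + vec + 1
    result *= vec       # (vec**2 + vec + 1) * vec
    return result


vec = np.random.rand(1_000_000)
original = vec.copy()
horner = poly_horner(vec)
```

This form uses only additions and multiplications, so it sidesteps the slow general power routine, and its memory footprint is a single result array on top of the input.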