Octave: How to sum(A .* B, 3) without expanding A .* B?

Question

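Consider the following scenario: for A of size [k, 1, m] and B of size [1, n, m], how can I obtain the same result as

C = sum(A .* B, 3);

without expanding A .* B, since that would take too much memory? Something like the loop below, but native: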

C = zeros(k,n);
for idx = 1:m
    C += A(:,1,idx) * B(1,:,idx);
end

I suppose I could also ask whether there is a function like bsxfun that has a "reduce"-like behavior. Something like:

C = bsxfun_accumulate(@(a, b) a * b, A, B);

Note: by native I mean a C/CUDA code path, an OpenCL code path, x86 SSE, or plain x86 instructions. Anything, really.

Answer 1

You can actually solve your problem by reshaping the variables A and B and applying a matrix multiply:

C = reshape(A, [], m)*(reshape(B, [], m).');

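Essentially, summing the results of m outer products, each a k-by-1 column vector times a 1-by-n row vector, is equivalent to multiplying a k-by-m matrix by an m-by-n matrix. The reshape calls only reinterpret the existing data as 2-D matrices, so the k-by-n-by-m intermediate array is never formed.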
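As a quick sanity check, the sketch below compares the reshape approach against both the broadcasted sum and the loop from the question. Entry by entry, all three compute C(i,j) as the sum over idx of A(i,1,idx) * B(1,j,idx). The sizes and the random test data are arbitrary assumptions chosen only for illustration, and the script relies on Octave's automatic broadcasting for the reference result.

```octave
% Small illustrative sizes (arbitrary; not from the original question)
k = 4; n = 5; m = 6;
A = rand(k, 1, m);
B = rand(1, n, m);

% Reference 1: broadcasted product, which materializes a k-by-n-by-m array
C_ref = sum(A .* B, 3);

% Reference 2: the accumulation loop from the question
C_loop = zeros(k, n);
for idx = 1:m
    C_loop += A(:, 1, idx) * B(1, :, idx);
end

% Reshape approach: (k-by-m) times (m-by-n), no 3-D intermediate
C = reshape(A, [], m) * (reshape(B, [], m).');

% Largest absolute deviations; these should be at round-off level
max_err_ref  = max(abs(C(:) - C_ref(:)));
max_err_loop = max(abs(C(:) - C_loop(:)));
printf("max deviation vs. sum(A .* B, 3): %g\n", max_err_ref);
printf("max deviation vs. loop:           %g\n", max_err_loop);
```

The results may differ in the last few bits because the matrix multiply can accumulate the m terms in a different order than sum or the loop does.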
