Summation based on unique entries of two arrays | Speed Issue

Question

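I have 3 arrays of size 803500*1 with the following details:

Rid: can contain any number.
RidID: contains the elements 1 to 184 in random order. Each element occurs multiple times.
r: contains the elements 0,1,2,...,12. Every element except zero occurs roughly 3400 to 3700 times, at random indices in this array.

The following may be useful for generating sample data: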

Rid = rand(803500,1);
RidID = randi(184,803500,1);
r = randi(13,803500,1)-1;  %This may not be a good sample for r as per previously mentioned details?

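What do I want to do? For each positive unique entry of r and each unique entry of RidID, I want to calculate the sum of the corresponding entries of Rid. The code I wrote for this problem may make it clearer: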

RNum = numel(unique(RidID));
RSum = ones(RNum,12); %Preallocating for better speed
for i=1:12
    RperM = r ==i;
    for j = 1:RNum 
        RSum(j,i)  = sum(Rid(RperM & (RidID==j)));
    end
end

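Problem: My code works, but it takes 5 seconds on average on my computer, and I have to run this calculation nearly a thousand times. I would be happy if the time dropped from 5 seconds to half of that or less. How can I optimize this? I don't mind whether the improvement comes from vectorization or from a better-written loop.

I am using MATLAB R2017b.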

Answer 1

You can use accumarray:

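u = unique(RidID);                 % RidID values that actually occur
A = accumarray([RidID r+1], Rid);  % Sum Rid per (RidID, r+1) pair; r+1 makes r = 0 a valid subscript
RSum = A(u, 2:13);                 % Keep the occurring RidID rows; drop column 1 (r = 0)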
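
To see how the two-column subscript form of accumarray groups the values, here is a minimal sketch on toy data. The values below are made up purely for illustration and are not from the question.

```matlab
% Toy data (hypothetical values, chosen only for illustration)
RidID = [1; 2; 1; 2; 1];
r     = [1; 0; 1; 2; 2];
Rid   = [10; 20; 30; 40; 50];

% Each row of the subscript matrix is a (row, column) target in the output.
% r+1 shifts r = 0 into column 1 so that it becomes a valid subscript.
A = accumarray([RidID r+1], Rid);   % A = [0 40 50; 20 0 40]

% Drop the first column (r == 0) to keep only the positive r values.
RSum = A(:, 2:end);
disp(RSum)                          % [40 50; 0 40]
```

Row 1 of RSum holds the sums for RidID = 1: the entries 10 and 30 share r = 1 and add up to 40, while 50 is the only entry with r = 2. Combinations that never occur are filled with zero.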

Answer 2

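This is slower than the accumarray approach suggested by rahnema, but using findgroups and splitapply can save memory.

In your example, the resulting matrix may contain thousands of zero-valued elements, one for every combination of RidID and r that never occurs. In that case a stacked result is more memory efficient, like so: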

RidID    | r    | Rid_sum
-------------------------
1        | 1    | 100
2        | 1    | 200
4        | 2    | 85
...

This can be achieved with the following code:

[ID, rn, RidIDn] = findgroups(r,RidID); % Get unique combo ID for 'r' and 'RidID'
RSum = splitapply( @sum, Rid, ID );     % Sum for each ID
output = table( RidIDn, rn, RSum );     % Nicely formatted table output
% Get rid of elements where r == 0
output( output.rn == 0, : ) = [];

You can convert this to the same output as the accumarray method, but this is already the slower approach...

% Convert to 'unstacked' 2D matrix (optional)
% Rows are indexed by RidID and columns by r (rows with r == 0 were removed above)
RSum = full( sparse( output.RidIDn, output.rn, output.RSum ) );
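
As a sanity check, the loop from the question and the two vectorized approaches can be compared on a smaller random sample. This is only a sketch: the sample size n, the fixed seed, the explicit output sizes, and the names RLoop, RAcc, RGrp and maxDiff are assumptions introduced here, not part of the original posts.

```matlab
rng(0);                      % fixed seed so the check is repeatable (assumption)
n     = 5000;                % much smaller than 803500, for a quick check
Rid   = rand(n,1);
RidID = randi(184,n,1);
r     = randi(13,n,1)-1;
RNum  = max(RidID);

% Reference result: the double loop from the question
RLoop = zeros(RNum,12);
for i = 1:12
    for j = 1:RNum
        RLoop(j,i) = sum(Rid(r==i & RidID==j));
    end
end

% accumarray, with an explicit output size so the shape does not depend
% on which values happen to occur in the sample
A    = accumarray([RidID r+1], Rid, [RNum 13]);
RAcc = A(:, 2:13);

% findgroups / splitapply, then unstacked into the same RNum-by-12 layout
[ID, rn, RidIDn] = findgroups(r, RidID);
S    = splitapply(@sum, Rid, ID);
keep = rn > 0;               % discard the r == 0 groups
RGrp = full(sparse(RidIDn(keep), rn(keep), S(keep), RNum, 12));

% Largest absolute deviation from the loop result across both methods
maxDiff = max(abs([RAcc(:) - RLoop(:); RGrp(:) - RLoop(:)]));
fprintf('max abs difference: %g\n', maxDiff);
```

The comparison uses a small tolerance rather than exact equality, because the methods may add the values in a different order and floating-point sums can then differ in the last bits.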
