如何提高嵌套循环的速度/寻找更高效的代码

Question

我有以下代码，现在太慢了，我正在寻找更好的方法来编写它。

此代码是包含 for 循环 t=1:T 的更长代码的一部分，因此以下代码针对每个 t 运行。

我有 F 家生产 Yd 的公司。在每个时期 t 购买一定数量的 K 台机床，其生产率由 A 给出；因此，在 t 购买的机床的生产能力等于 AK=A.*K。企业的总生产能力等于 sum(AK)。由于 sum(AK) 可能大于他们想要生产的 Yd，公司必须确定他们将使用他们拥有的每台机器“v”的哪一部分 (omega)，考虑到他们首先开始使用生产力最高的机床（那些具有最高的 A)。因此，从生产率最高的机器开始，如果某台机器的生产能力低于Yd，则该机器的omega将设置为1，如果生产能力已经达到Yd，则该机器的omega将设置为0 ，否则 omega 将被设置为所需的分数。我该如何编码？

这就是我所做的，但绝对太慢了：

T=100;                     
F=50;

A=rand(T,F);                  %initialized randomly (not as in my real code)
K=rand(T,F);                  %idem 
AK=A.*K;                      %productivity of machine tools
Yd=rand(1,F);                 %initialized randomly 
omega=zeros(T,F);             %needs to be computed                                     
OAK(:,:)=zeros(T,F);          %needs to be computed                                       

   for f=1:F
  [A_sorted_value,A_sorted_index]=sort(A(:,f));                            %sort machines according to their productivity level

        while sum(A_sorted_value)>0 && sum(OAK(1:t,f))<Yd(f)               %apply rule as long as there are machines available and desired production has not been achieved
            v=A_sorted_index(end);                                         %pick the best machine of capital, i.e. highest productivity A
            if AK(v,f)+sum(OAK(1:t,f))<=Yd(f)                              %assign omega=1 if existing capacity is below desired production
                omega(v,f)=1;
                OAK(v,f)=omega(v,f).*AK(v,f);                              %update existing capacity

            elseif AK(v,f)+sum(OAK(1:t,f))>Yd(f) && Yd(f)>sum(OAK(1:t,f))  %assign omega so that machine v can fill desired production
                omega(v,f)=(Yd(f)-sum(OAK(1:t,f)))/AK(v,f);
                OAK(v,f)=omega(v,f).*AK(v,f);

            end
            A_sorted_index(end)=[];                                        %remove machine v from the list and pick the next one if condition "while" is satisfied
            A_sorted_value(end)=[];                                        %otherwise go to the next firm
        end
    end

希望清楚，提前致谢！

Answer 1

看来你是运行某种模拟。这意味着 （如果结果取决于之前的迭代），您的 t 和 f 循环无法优化掉。因此，唯一可用的优化目标是内部 while 循环。我建议您考虑是否可以预先计算或矢量化任何东西（检查纸上的数学）。

但由于我无法真正帮助您，这里有一些可能会有所帮助的事情。

我可以立即提供一个容易实现的成果，您可以使用 sort 函数按某个设定方向对整个矩阵进行排序，这样您就可以将其移到 f 循环之外。

% Sort the entire A array, column-wise.
[AVals, AIdxs] = sort(A,1);
for f = 1:F
    A_sorted_value = AVal(:,f);
    A_sorted_index = AIdx(:,f);
    ...
end

接下来，我建议您使用 for 循环而不是 while 循环。

因为您已经知道您将遍历整个排序的 A 列表，或者直到满足另一个条件 (sum(OAK(1:t,f))<Yd(f))，这可以通过 for环形。结合上面的推荐，你可能会得到这样的结果：

% Sort A in descending order instead, so first element in the array is now highest in value
[AVal, AIdx] = sort(A,1,'descend');
for f = 1:F
    A_sorted_value = AVal(:,f); % Appears to be unused?
    A_sorted_index = AIdx(:,f);

    % Iterate through entire A_sorted_index array, eliminates rewriting of A_sorted_index
    for i = 1:length(A_sorted_index)
        v = A_sorted_index(i);
        
        % Loop Exit Conditions
        if (sum(OAK(1:t,f))<Yd(f)) || (sum(A_sorted_value(1:i)) <= 0)
            break
        end

        % Your code below
        %assign omega=1 if existing capacity is below desired production
        if AK(v,f)+sum(OAK(1:t,f)) <= Yd(f)
            omega(v,f)=1;
            %update existing capacity
            OAK(v,f)=omega(v,f).*AK(v,f);

        %assign omega so that machine v can fill desired production
        elseif AK(v,f)+sum(OAK(1:t,f))>Yd(f) && Yd(f)>sum(OAK(1:t,f))
            omega(v,f) = (Yd(f)-sum(OAK(1:t,f)))/AK(v,f);
            OAK(v,f) = omega(v,f).*AK(v,f);
        end
    end
end

现在，A_sorted_value 似乎未被使用。我不能假设 A 中元素的值是非零的正数。所以我不得不包括检查条件sum(A_sorted_value(1:i)) <= 0。但是，如果您确实只有非零正 A 元素，则可以删除该片段以进一步加快循环速度。

我根据旧代码对这段代码进行了概要分析，在 T=200、F=100、运行循环 50 次。

new = 6.835 secs
old = 65.634 secs

大约减少了 90% 的时间。

减少的大量时间来自删除这些非常浪费的代码行。

A_sorted_index(end)=[];
A_sorted_value(end)=[];

我确定还有很多优化空间，但这对我来说已经足够了。

请验证我提供的代码是否正确。

Answer 2

这是一个看起来更快的版本，它存储了一些多次使用的结果。除了 momocha 的建议之外，您应该还可以实施这些更改。

为了大幅提高速度，您应该转置所有矩阵，以便 f 按列而不是行访问数组，或者交换 F 和 [=13= 的顺序] for 循环。一次访问矩阵的单个列比处理单个行更有效，因此始终先遍历列，然后遍历该列中的行。

% precompute sorted array
[A_sorted,A_sorted_I]=sort(A);

for f = 1:F
    % get sorted vectors from percomputed array
    A_sorted_value = A_sorted(:,f);
    A_sorted_index  = A_sorted_I(:,f);

    % use nnz() instead of sum() (this may not make a difference)
    while nnz(A_sorted_value)>0 && sum(OAK(1:t,f))<Yd(f)
        % precompute a few things that are used several times
        v = A_sorted_index(end);
        ak_ = AK(v,f);
        sOAK = sum(OAK(1:t,f));
        if ak_+sOAK<=Yd(f)
            omega(v,f)=1;
            OAK(v,f)=ak_; % don't multiply by omega(v,f), we know it's one, this saves memory lookup
        else % doesn't need to be an else if, condition will always be true
            ex = Yd(f)-sOAK;
            omega(v,f)=ex/ak_;
            OAK(v,f)=ex;
        end

        % this is a faster way to de-allocate
        A_sorted_index=A_sorted_index(1:end-1);
        A_sorted_value=A_sorted_value(1:end-1);
    end
end

如何提高嵌套循环的速度/寻找更高效的代码

How to improve speed of nested loop/ Looking for more efficient codes

performance

matlab

loops

nested-loops