Python: how to get rid of nested loops?

Question

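I have 2 for loops, one inside the other, and I would like to get rid of them somehow to speed up my code. My pandas dataframe looks like this (the headers are different companies, the rows are different users, and 1 means the user accessed that company, 0 otherwise):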

   100  200  300  400
0    1    1    0    1
1    1    1    1    0
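To reproduce the example, the frame above can be built directly. This is a sketch that assumes integer column labels for the company IDs; `list_of_companies` reuses the name from the question's code.

```python
import pandas as pd

# Rows are users, columns are company IDs; 1 means the user accessed the company
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)
# The list of company IDs that the loops iterate over
list_of_companies = list(df_matrix.columns)
print(df_matrix)
```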

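I want to compare every pair of companies in my dataset, so I created a list containing all the company IDs. The code goes through the list, takes the first company (the base), and pairs it with every other company (the peer), hence the second "for" loop. My code is as follows: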

def calculate_scores():
    df_matrix = create_the_matrix(df)  # builds the user x company 0/1 matrix (helper defined elsewhere)
    print(df_matrix)
    for base in list_of_companies:
        for peer in list_of_companies:
            if base == peer:
                continue  # do not pair a company with itself
            # Calculate the denominator first, since we slice the big matrix
            # down to the users who accessed the base firm
            denominator_df = df_matrix.loc[(df_matrix[base] == 1)]
            denominator = denominator_df.sum(axis=1).values.tolist()
            denominator = sum(denominator) - len(denominator)

            # Calculate the numerator. This is done afterwards because we slice
            # the dataframe above further, keeping only the records that were
            # accessed by both the base and the peer firm
            numerator_df = denominator_df.loc[(denominator_df[base] == 1) & (denominator_df[peer] == 1)]
            numerator = len(numerator_df.index)
            annual_search_fraction = numerator / denominator
            print("Base: {} and Peer: {} ==> {}".format(base, peer, annual_search_fraction))
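In the loop above, `denominator_df` and `denominator` depend only on `base`, yet they are recomputed for every `peer`. A minimal sketch that hoists that work into the outer loop, using the sample data from the question; the function signature and the returned `scores` dict are illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data from the question (rows = users, columns = company IDs)
df_matrix = pd.DataFrame([[1, 1, 0, 1], [1, 1, 1, 0]], columns=[100, 200, 300, 400])
list_of_companies = list(df_matrix.columns)

def calculate_scores(df_matrix, list_of_companies):
    scores = {}
    for base in list_of_companies:
        # Work that depends only on `base` is done once per base,
        # not once per (base, peer) pair
        base_rows = df_matrix[df_matrix[base] == 1]
        # Each user contributes (companies accessed - 1) links to the base firm
        denominator = int((base_rows.sum(axis=1) - 1).sum())
        for peer in list_of_companies:
            if peer == base:
                continue  # skip pairing a company with itself
            # Rows in base_rows already accessed `base`, so only `peer` is checked
            numerator = int((base_rows[peer] == 1).sum())
            scores[(base, peer)] = numerator / denominator
    return scores

for (base, peer), score in calculate_scores(df_matrix, list_of_companies).items():
    print("Base: {} and Peer: {} ==> {}".format(base, peer, score))
```

Like the original, this raises `ZeroDivisionError` when a base firm's denominator is 0 (for example, when no user accessed it, or its users accessed no other firm).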

Edit 1 (added an explanation of the code):

The metric works as follows:

1) The metric I am trying to calculate tells me how many times 2 companies are searched together, relative to all other searches.

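2) The code first selects all users who accessed the base firm (denominator_df = df_matrix.loc[(df_matrix[base] == 1)]). It then calculates the denominator: the number of unique combinations between the base firm and any other firm those users searched. Because I can count the number of firms each user accessed, I can subtract 1 from that count to get the number of unique links between the base firm and the other firms.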

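3) Next, the code filters the previous denominator_df further, selecting only the rows in which both the base and the peer firm were accessed. Since I need the number of users who accessed both the base and the peer firm, I use numerator = len(numerator_df.index) to count the rows, which gives me the numerator.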

The expected output for the dataframe at the top is:

Base: 100 and Peer: 200 ==> 0.5
Base: 100 and Peer: 300 ==> 0.25
Base: 100 and Peer: 400 ==> 0.25
Base: 200 and Peer: 100 ==> 0.5
Base: 200 and Peer: 300 ==> 0.25
Base: 200 and Peer: 400 ==> 0.25
Base: 300 and Peer: 100 ==> 0.5
Base: 300 and Peer: 200 ==> 0.5
Base: 300 and Peer: 400 ==> 0.0
Base: 400 and Peer: 100 ==> 0.5
Base: 400 and Peer: 200 ==> 0.5
Base: 400 and Peer: 300 ==> 0.0

4) To check that the code gives the correct solution: the metrics between 1 base firm and all of its peer firms must sum to 1. They do in the code I posted.
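Both loops can be removed entirely with matrix products. This is a swapped-in NumPy technique, not the asker's code, and it assumes the matrix holds only 0/1 values. With X as the users x companies matrix, `X.T @ X` counts, for every pair (b, p), the users who accessed both firms (the numerator from step 3), and `X.T @ (row sums - 1)` gives each base firm's denominator from step 2. The function name `calculate_scores_vectorized` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question (rows = users, columns = company IDs)
df_matrix = pd.DataFrame([[1, 1, 0, 1], [1, 1, 1, 0]], columns=[100, 200, 300, 400])

def calculate_scores_vectorized(df_matrix):
    X = df_matrix.to_numpy()
    # co[b, p] = number of users who accessed both company b and company p
    co = X.T @ X
    # Links each user contributes: (companies accessed - 1)
    links_per_user = X.sum(axis=1) - 1
    # denom[b] = total links of the users who accessed company b
    denom = X.T @ links_per_user
    # Divide each row b by denom[b]
    scores = co / denom[:, None]
    # A company is not compared with itself
    np.fill_diagonal(scores, np.nan)
    return pd.DataFrame(scores, index=df_matrix.columns, columns=df_matrix.columns)

scores = calculate_scores_vectorized(df_matrix)
print(scores)
# Check from point 4: each row (base firm) should sum to 1; pandas skips NaN
print(scores.sum(axis=1))
```

Row b of the result holds the scores for base b against every peer. Unlike the loop, a zero denominator produces NaN or inf values with a NumPy `RuntimeWarning` instead of raising an exception.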

Any suggestions or hints on how to move forward would be greatly appreciated!

Answer 1

You are probably looking for itertools.product(). Here is an example similar to what you seem to be trying to do:

import itertools

a = [ 'one', 'two', 'three' ]

for b in itertools.product( a, a ):
    print( b )

The output of the snippet above is:

('one', 'one')
('one', 'two')
('one', 'three')
('two', 'one')
('two', 'two')
('two', 'three')
('three', 'one')
('three', 'two')
('three', 'three')

Or you can do this:

for u,v in itertools.product( a, a ):
    print( "%s %s"%(u, v) )

Then the output is:

one one
one two
one three
two one
two two
two three
three one
three two
three three

If you want a list, you can do this:

alist = list( itertools.product( a, a ) )

print( alist )

The output is:

[('one', 'one'), ('one', 'two'), ('one', 'three'), ('two', 'one'), ('two', 'two'), ('two', 'three'), ('three', 'one'), ('three', 'two'), ('three', 'three')]
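Because the question skips pairs where base == peer, the self-pairs that product() produces need to be filtered out; itertools.permutations(iterable, 2) yields the same ordered pairs directly, assuming the company IDs are unique. A short sketch using the company IDs from the question:

```python
import itertools

list_of_companies = [100, 200, 300, 400]

# product() yields every ordered pair, including (c, c); filter those out
pairs_from_product = [(b, p) for b, p in itertools.product(list_of_companies, repeat=2) if b != p]

# permutations(..., 2) yields ordered pairs of distinct elements directly
pairs = list(itertools.permutations(list_of_companies, 2))

print(pairs == pairs_from_product)
print(len(pairs))
```

Either form replaces the two nested for statements with a single loop, but the number of (base, peer) pairs processed stays the same, so the pandas work done per pair is unchanged.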
