优化 Python 中的函数以处理大量数据

Question

我尝试在 HackerRank 上解决一个名为 'Apple and orange' 的问题，代码如下：

def countApplesAndOranges(s, t, a, b, apples, oranges):
    count_apples = 0
    count_oranges = 0
    x = [x for x in range(s, t+1)]
    pos_apple = [apple + a for apple in apples]
    pos_orange = [orange + b for orange in oranges]
    for i in x:
        for j in pos_apple:
            if j == i:
                count_apples +=1
        for l in pos_orange:
            if l == i:
                count_oranges += 1
    print(count_apples)
    print(count_oranges)

代码有效。但是，当我尝试提交它时，它通过了前 3 次测试并没有通过其余的测试，但 'Terminated due to timeout' 除外。我检查了其中一项测试的输入，需要处理大量数据，您可以在此处查看数据： https://hr-testcases-us-east-1.s3.amazonaws.com/25220/input03.txt?AWSAccessKeyId=AKIAR6O7GJNX5DNFO3PV&Expires=1642016820&Signature=J4ypdP0YzRxcOWp%2By5XaD5ITeMw%3D&response-content-type=text%2Fplain

它失败了，因为通过我的 IDE 处理具有相同输入的代码大约需要 2 分钟，但 HackerRank 测试限制为 10 秒。

我需要你的帮助来优化代码并使其更快运行。

嵌套循环似乎是这里最大的问题，但我不知道应该用什么替换。

Answer 1

根据范围内的每个坐标检查每个 apple/orange 将代码的运行时间复杂度变为 O(n * a + n * o)，其中 n 是房子的长度，a 是苹果的数量， o 是橙子的数量。理想情况下，您的代码必须运行 in O(a + o).

这是您的解决方案的重构版本：

def countApplesAndOranges(s, t, a, b, apples, oranges):
    count_apples = 0
    count_oranges = 0
    for apple in pos_apple:
        if s <= apple + a <= t:
            count_apples +=1
    for orange in pos_orange:
        if s <= orange + b <= t:
            count_oranges += 1
    print(count_apples)
    print(count_oranges)

Answer 2

如果可以，不要建立中间列表。如果确实需要（这个问题不是这种情况），请改用生成器。

尽量不要重复自己，尽可能用函数分解。

def countApplesAndOranges(home_start, home_end, tree_apple, tree_orange, apples, oranges):
    def count_fruits(home_start, home_end, tree, fruits): 
        count = 0
        for fruit in fruits:
            if home_start <= tree + fruit <= home_end:
                count +=1
        return count
    
    print(count_fruits(home_start, home_end, tree_apple, apples))
    print(count_fruits(home_start, home_end, tree_orange, oranges))

Answer 3

我想我会 sum() 几个列表理解作为起点：

apple_hits = sum(
    1 for apple in apples
    if house_min <= apple + apple_tree_origin <= house_max
)

虽然向我指出的一件事是从该测试的两侧减去 apple_tree_origin 不应该改变它：

apple_hits = sum(
    1 for apple in apples
    if house_min - apple_tree_origin <= apple <= house_max - apple_tree_origin
)

现在我们可能会观察到 house_min - apple_tree_origin 可以预先计算，其余的我的答案是：

def countApplesAndOranges1(s, t, a, b, apples, oranges):
    house_min = s
    house_max = t

    apple_tree_origin = a
    orange_tree_origin = b

    house_min_apples = house_min - apple_tree_origin
    house_max_apples = house_max - apple_tree_origin

    house_min_oranges = house_min - orange_tree_origin
    house_max_oranges = house_max - orange_tree_origin

    apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
    orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
    return apple_hits, orange_hits

根据您提供的测试数据，我得到 (18409, 19582)。希望这是正确的。

随时timeit反对其他解决方案：

import timeit

setup = """
with open("apples_oranges.txt") as file_in:
    s,t = list(map(int, file_in.readline().split()))
    a,b = list(map(int, file_in.readline().split()))
    m,n = list(map(int, file_in.readline().split()))
    apples = list(map(int, file_in.readline().split()))
    oranges = list(map(int, file_in.readline().split()))

def countApplesAndOranges_jonsg(s, t, a, b, apples, oranges):
    house_min = s
    house_max = t

    apple_tree_origin = a
    orange_tree_origin = b

    house_min_apples = house_min - apple_tree_origin
    house_max_apples = house_max - apple_tree_origin

    house_min_oranges = house_min - orange_tree_origin
    house_max_oranges = house_max - orange_tree_origin

    apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
    orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
    return apple_hits, orange_hits
"""

print(timeit.timeit("countApplesAndOranges_jonsg(s, t, a, b, apples, oranges)", setup=setup, number=1000))

Answer 4

谢谢大家！

我想通了，用了

if s <= apple + a <= t:

在我的代码中，因为它是摆脱嵌套循环的最简单解决方案。效果太好了，让我想知道我怎么没想到它。

我非常喜欢您的所有解决方案，感谢您的帮助！

优化 Python 中的函数以处理大量数据

Optimizing function in Python to handle a big chunk od data

python

algorithm

optimization

performance

function