优化 Python 中的函数以处理大量数据
Optimizing function in Python to handle a big chunk od data
我尝试在 HackerRank 上解决一个名为 'Apple and orange' 的问题,代码如下:
def countApplesAndOranges(s, t, a, b, apples, oranges):
count_apples = 0
count_oranges = 0
x = [x for x in range(s, t+1)]
pos_apple = [apple + a for apple in apples]
pos_orange = [orange + b for orange in oranges]
for i in x:
for j in pos_apple:
if j == i:
count_apples +=1
for l in pos_orange:
if l == i:
count_oranges += 1
print(count_apples)
print(count_oranges)
代码有效。
但是,当我尝试提交它时,它通过了前 3 次测试并没有通过其余的测试,但 'Terminated due to timeout' 除外。我检查了其中一项测试的输入,需要处理大量数据,您可以在此处查看数据:
https://hr-testcases-us-east-1.s3.amazonaws.com/25220/input03.txt?AWSAccessKeyId=AKIAR6O7GJNX5DNFO3PV&Expires=1642016820&Signature=J4ypdP0YzRxcOWp%2By5XaD5ITeMw%3D&response-content-type=text%2Fplain
它失败了,因为通过我的 IDE 处理具有相同输入的代码大约需要 2 分钟,但 HackerRank 测试限制为 10 秒。
我需要你的帮助来优化代码并使其更快 运行。
嵌套循环似乎是这里最大的问题,但我不知道应该用什么替换。
根据范围内的每个坐标检查每个 apple/orange 将代码的 运行 时间复杂度变为 O(n * a + n * o)
,其中 n
是房子的长度,a
是苹果的数量, o
是橙子的数量。理想情况下,您的代码必须 运行 in O(a + o)
.
这是您的解决方案的重构版本:
def countApplesAndOranges(s, t, a, b, apples, oranges):
count_apples = 0
count_oranges = 0
for apple in pos_apple:
if s <= apple + a <= t:
count_apples +=1
for orange in pos_orange:
if s <= orange + b <= t:
count_oranges += 1
print(count_apples)
print(count_oranges)
如果可以,不要建立中间列表。如果确实需要(这个问题不是这种情况),请改用生成器。
尽量不要重复自己,尽可能用函数分解。
def countApplesAndOranges(home_start, home_end, tree_apple, tree_orange, apples, oranges):
def count_fruits(home_start, home_end, tree, fruits):
count = 0
for fruit in fruits:
if home_start <= tree + fruit <= home_end:
count +=1
return count
print(count_fruits(home_start, home_end, tree_apple, apples))
print(count_fruits(home_start, home_end, tree_orange, oranges))
我想我会 sum()
几个列表理解作为起点:
apple_hits = sum(
1 for apple in apples
if house_min <= apple + apple_tree_origin <= house_max
)
虽然向我指出的一件事是从该测试的两侧减去 apple_tree_origin 不应该改变它:
apple_hits = sum(
1 for apple in apples
if house_min - apple_tree_origin <= apple <= house_max - apple_tree_origin
)
现在我们可能会观察到 house_min - apple_tree_origin
可以预先计算,其余的我的答案是:
def countApplesAndOranges1(s, t, a, b, apples, oranges):
house_min = s
house_max = t
apple_tree_origin = a
orange_tree_origin = b
house_min_apples = house_min - apple_tree_origin
house_max_apples = house_max - apple_tree_origin
house_min_oranges = house_min - orange_tree_origin
house_max_oranges = house_max - orange_tree_origin
apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
return apple_hits, orange_hits
根据您提供的测试数据,我得到 (18409, 19582)
。希望这是正确的。
随时timeit
反对其他解决方案:
import timeit
setup = """
with open("apples_oranges.txt") as file_in:
s,t = list(map(int, file_in.readline().split()))
a,b = list(map(int, file_in.readline().split()))
m,n = list(map(int, file_in.readline().split()))
apples = list(map(int, file_in.readline().split()))
oranges = list(map(int, file_in.readline().split()))
def countApplesAndOranges_jonsg(s, t, a, b, apples, oranges):
house_min = s
house_max = t
apple_tree_origin = a
orange_tree_origin = b
house_min_apples = house_min - apple_tree_origin
house_max_apples = house_max - apple_tree_origin
house_min_oranges = house_min - orange_tree_origin
house_max_oranges = house_max - orange_tree_origin
apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
return apple_hits, orange_hits
"""
print(timeit.timeit("countApplesAndOranges_jonsg(s, t, a, b, apples, oranges)", setup=setup, number=1000))
谢谢大家!
我想通了,用了
if s <= apple + a <= t:
在我的代码中,因为它是摆脱嵌套循环的最简单解决方案。效果太好了,让我想知道我怎么没想到它。
我非常喜欢您的所有解决方案,感谢您的帮助!
我尝试在 HackerRank 上解决一个名为 'Apple and orange' 的问题,代码如下:
def countApplesAndOranges(s, t, a, b, apples, oranges):
count_apples = 0
count_oranges = 0
x = [x for x in range(s, t+1)]
pos_apple = [apple + a for apple in apples]
pos_orange = [orange + b for orange in oranges]
for i in x:
for j in pos_apple:
if j == i:
count_apples +=1
for l in pos_orange:
if l == i:
count_oranges += 1
print(count_apples)
print(count_oranges)
代码有效。 但是,当我尝试提交它时,它通过了前 3 次测试并没有通过其余的测试,但 'Terminated due to timeout' 除外。我检查了其中一项测试的输入,需要处理大量数据,您可以在此处查看数据: https://hr-testcases-us-east-1.s3.amazonaws.com/25220/input03.txt?AWSAccessKeyId=AKIAR6O7GJNX5DNFO3PV&Expires=1642016820&Signature=J4ypdP0YzRxcOWp%2By5XaD5ITeMw%3D&response-content-type=text%2Fplain
它失败了,因为通过我的 IDE 处理具有相同输入的代码大约需要 2 分钟,但 HackerRank 测试限制为 10 秒。
我需要你的帮助来优化代码并使其更快 运行。
嵌套循环似乎是这里最大的问题,但我不知道应该用什么替换。
根据范围内的每个坐标检查每个 apple/orange 将代码的 运行 时间复杂度变为 O(n * a + n * o)
,其中 n
是房子的长度,a
是苹果的数量, o
是橙子的数量。理想情况下,您的代码必须 运行 in O(a + o)
.
这是您的解决方案的重构版本:
def countApplesAndOranges(s, t, a, b, apples, oranges):
count_apples = 0
count_oranges = 0
for apple in pos_apple:
if s <= apple + a <= t:
count_apples +=1
for orange in pos_orange:
if s <= orange + b <= t:
count_oranges += 1
print(count_apples)
print(count_oranges)
如果可以,不要建立中间列表。如果确实需要(这个问题不是这种情况),请改用生成器。
尽量不要重复自己,尽可能用函数分解。
def countApplesAndOranges(home_start, home_end, tree_apple, tree_orange, apples, oranges):
def count_fruits(home_start, home_end, tree, fruits):
count = 0
for fruit in fruits:
if home_start <= tree + fruit <= home_end:
count +=1
return count
print(count_fruits(home_start, home_end, tree_apple, apples))
print(count_fruits(home_start, home_end, tree_orange, oranges))
我想我会 sum()
几个列表理解作为起点:
apple_hits = sum(
1 for apple in apples
if house_min <= apple + apple_tree_origin <= house_max
)
虽然向我指出的一件事是从该测试的两侧减去 apple_tree_origin 不应该改变它:
apple_hits = sum(
1 for apple in apples
if house_min - apple_tree_origin <= apple <= house_max - apple_tree_origin
)
现在我们可能会观察到 house_min - apple_tree_origin
可以预先计算,其余的我的答案是:
def countApplesAndOranges1(s, t, a, b, apples, oranges):
house_min = s
house_max = t
apple_tree_origin = a
orange_tree_origin = b
house_min_apples = house_min - apple_tree_origin
house_max_apples = house_max - apple_tree_origin
house_min_oranges = house_min - orange_tree_origin
house_max_oranges = house_max - orange_tree_origin
apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
return apple_hits, orange_hits
根据您提供的测试数据,我得到 (18409, 19582)
。希望这是正确的。
随时timeit
反对其他解决方案:
import timeit
setup = """
with open("apples_oranges.txt") as file_in:
s,t = list(map(int, file_in.readline().split()))
a,b = list(map(int, file_in.readline().split()))
m,n = list(map(int, file_in.readline().split()))
apples = list(map(int, file_in.readline().split()))
oranges = list(map(int, file_in.readline().split()))
def countApplesAndOranges_jonsg(s, t, a, b, apples, oranges):
house_min = s
house_max = t
apple_tree_origin = a
orange_tree_origin = b
house_min_apples = house_min - apple_tree_origin
house_max_apples = house_max - apple_tree_origin
house_min_oranges = house_min - orange_tree_origin
house_max_oranges = house_max - orange_tree_origin
apple_hits = sum(1 for apple in apples if house_min_apples <= apple <= house_max_apples)
orange_hits = sum(1 for orange in oranges if house_min_oranges <= orange <= house_max_oranges)
return apple_hits, orange_hits
"""
print(timeit.timeit("countApplesAndOranges_jonsg(s, t, a, b, apples, oranges)", setup=setup, number=1000))
谢谢大家!
我想通了,用了
if s <= apple + a <= t:
在我的代码中,因为它是摆脱嵌套循环的最简单解决方案。效果太好了,让我想知道我怎么没想到它。
我非常喜欢您的所有解决方案,感谢您的帮助!