Speed up a nested Python loop while updating a dictionary
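I have the following nested Python loop and I am trying to reduce its execution time. I have tried a few optimizations, but they did not help much. Can anyone offer some tips, or is there a more Pythonic way to do this?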
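def init_dict(n, s):
    # Assumed helper (described below): maps key n to value s in a new dict
    return {n: s}

def build_dict(input_list, A, B, threshold):  # function name added; the original snippet omitted it
    a_dict = {}
    idx = 0
    for sc, nb in zip(A, B):
        b_dict = {}
        # Keep only the (n, s) pairs whose score s reaches the threshold
        for s, n in zip(sc, nb):
            if s >= threshold:
                b_dict.update(init_dict(n, s))
        a_dict[input_list[idx]] = b_dict
        idx += 1
    return a_dict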
Both A and B are numpy.ndarray objects.
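For example, one optimization I tried was to skip the call to init_dict(n, s) and update b_dict directly, instead of calling a function that builds another dictionary, returns it, and then merging that into b_dict. That helped a little. But are there further optimizations that avoid the two loops altogether, for example using multiprocessing or threading?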
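A minimal sketch of the inlined version described above, assuming init_dict(n, s) simply returns {n: s}; the name build_dict_inlined is hypothetical. Assigning b_dict[n] = s avoids creating and merging a throwaway one-entry dict for every element, but both loops are still there.

```python
def build_dict_inlined(input_list, A, B, threshold):
    """Same result as the original loop, with init_dict() inlined (hypothetical name)."""
    a_dict = {}
    # Zipping input_list in directly replaces the manual idx counter
    for key, sc, nb in zip(input_list, A, B):
        b_dict = {}
        for s, n in zip(sc, nb):
            if s >= threshold:
                # Direct assignment instead of b_dict.update(init_dict(n, s))
                b_dict[n] = s
        a_dict[key] = b_dict
    return a_dict

result = build_dict_inlined(['x', 'y'],
                            [[0.50, 0.55, 0.70], [0.40, 0.41, 0.42]],
                            [[123, 456, 789], [90, 80, 70]],
                            0.55)
print(result)  # {'x': {456: 0.55, 789: 0.7}, 'y': {}}
```

With ndarray inputs the same code works, but the keys and values stored in the dicts will be NumPy scalars rather than plain Python ints and floats.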
A looks like this:
[[0.8921996 0.91602445 0.92908716 0.9417222 0.96200365]
[0.4753568 0.6385271 0.6559716 0.67830306 0.7077361 ]
[0.700236 0.75287104 0.7589616 0.7638799 0.77096677]
....
]
B is:
[[682506892 693571174 668887658 303551993 27694382]
[ 15028940 14862639 54801234 14711873 15136693]
[567664619 217092797 399261625 124879790 349055820]
....
]
The returned value (a_dict) looks like this:
{
'147840198': {
'567664619': 0.7002360224723816, '217092797': 0.752871036529541,
'399261625': 0.7589616179466248, '124879790': 0.7638798952102661,
'349055820': 0.7709667682647705
},
'485045174': {
'627320584': 0.24876028299331665, '297801439': 0.3101433217525482,
'166126424': 0.3392677307128906, '579653715': 0.3781401515007019,
'880315906': 0.40654435753822327
},
'39703998': {
'273891679': 0.667972981929779, '972073794': 0.8249127864837646,
'17236820': 0.8573702573776245, '675493278': 0.8575121164321899,
'163042687': 0.8683345317840576
},
'55375077': {
'14914733': 0.7121858596801758, '28645587': 0.7306985259056091,
'14914719': 0.7347514629364014, '15991986': 0.7463902831077576,
'14914756': 0.7500130534172058
},
.....
}
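init_dict(n, s) is a function that takes n and s as the key and the value, respectively, and returns a dictionary. As I mentioned earlier, that step is not needed: we can use n and s directly as a key-value pair of b_dict.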
threshold can be any number between 0 and 1, and input_list is a list of strings, for example:
['147840198', '485045174', '39703998', '55375077', ....]
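OK, given that the sublists in A are sorted in ascending order (as they are in your sample), this simplifies quickly. Whenever you are looking for a threshold in a sorted list, looping over every element is a BAD idea; binary search is usually the weapon of choice.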
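To make the binary-search step concrete: for an ascending row, bisect.bisect_left(row, threshold) returns the first index i with row[i] >= threshold, so everything from i onward passes the original s >= threshold test and everything before it fails. A minimal sketch, assuming each row is sorted ascending (if it is not, the result is wrong); the helper name first_at_or_above is hypothetical.

```python
from bisect import bisect_left

def first_at_or_above(row, threshold):
    """Index of the first element >= threshold in an ascending row (hypothetical helper)."""
    # O(log n) comparisons instead of testing every element
    return bisect_left(row, threshold)

row = [0.50, 0.55, 0.70, 0.80]
i = first_at_or_above(row, 0.55)
# bisect_left keeps values equal to the threshold, matching s >= threshold
print(i, row[i:])  # 1 [0.55, 0.7, 0.8]

# The slice agrees with the linear filter from the question
assert row[i:] == [s for s in row if s >= 0.55]
```

bisect_right would instead correspond to a strict s > threshold test, dropping values exactly equal to the threshold.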
Here are a few (progressively better) variants of your code. chopper3() reduces it to a one-liner using a dict comprehension.
from bisect import bisect_left

def chopper(output_keys, A, B, threshold):
    a_dict = {}
    for idx, (sc, nb) in enumerate(zip(A, B)):
        # First index whose score is >= threshold (row must be sorted ascending)
        chop_idx = bisect_left(sc, threshold)
        a_dict[output_keys[idx]] = {k: v for k, v in zip(nb[chop_idx:], sc[chop_idx:])}
    return a_dict

def chopper2(output_keys, A, B, threshold):
    # Compute every cut point up front, then build the result in one comprehension
    chop_idxs = [bisect_left(a, threshold) for a in A]
    res = {output_key: dict(zip(k[chop_idx:], v[chop_idx:]))
           for output_key, v, k, chop_idx in zip(output_keys, A, B, chop_idxs)}
    return res

def chopper3(output_keys, A, B, threshold):
    # "for chop_idx in (...,)" binds chop_idx once per row inside the comprehension
    return {output_key: dict(zip(k[chop_idx:], v[chop_idx:]))
            for output_key, v, k in zip(output_keys, A, B)
            for chop_idx in (bisect_left(v, threshold),)}

A = [[0.50, 0.55, 0.70, 0.80],
     [0.61, 0.71, 0.81, 0.91],
     [0.40, 0.41, 0.42, 0.43]]
B = [[123, 456, 789, 1011],
     [202, 505, 30, 400],
     [90, 80, 70, 600]]
output_keys = list('ABC')
print(chopper(output_keys, A, B, 0.55))
print(chopper2(output_keys, A, B, 0.55))
print(chopper3(output_keys, A, B, 0.55))
Output:
{'A': {456: 0.55, 789: 0.7, 1011: 0.8}, 'B': {202: 0.61, 505: 0.71, 30: 0.81, 400: 0.91}, 'C': {}}
{'A': {456: 0.55, 789: 0.7, 1011: 0.8}, 'B': {202: 0.61, 505: 0.71, 30: 0.81, 400: 0.91}, 'C': {}}
{'A': {456: 0.55, 789: 0.7, 1011: 0.8}, 'B': {202: 0.61, 505: 0.71, 30: 0.81, 400: 0.91}, 'C': {}}
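Since A and B are NumPy arrays in the question, the cut points can also be computed for all rows at once: for an ascending row, the number of entries below the threshold equals the index bisect_left would return. A sketch under the same sorted-rows assumption; the name chopper_np is hypothetical, and its speed relative to the variants above is not measured here, so time it on your real data (for example with timeit). Calling .tolist() on the sliced rows gives plain Python ints and floats in the resulting dicts instead of NumPy scalars.

```python
import numpy as np

def chopper_np(output_keys, A, B, threshold):
    """Hypothetical NumPy variant; assumes every row of A is sorted ascending."""
    A = np.asarray(A)
    B = np.asarray(B)
    # For an ascending row, the count of entries below threshold equals the index
    # bisect_left would return, so all cut points come from one vectorized pass.
    cuts = (A < threshold).sum(axis=1)
    # tolist() converts to native Python ints/floats instead of NumPy scalars
    return {key: dict(zip(ids[cut:].tolist(), scores[cut:].tolist()))
            for key, scores, ids, cut in zip(output_keys, A, B, cuts)}

A = [[0.50, 0.55, 0.70, 0.80],
     [0.61, 0.71, 0.81, 0.91],
     [0.40, 0.41, 0.42, 0.43]]
B = [[123, 456, 789, 1011],
     [202, 505, 30, 400],
     [90, 80, 70, 600]]
print(chopper_np(list('ABC'), A, B, 0.55))
# {'A': {456: 0.55, 789: 0.7, 1011: 0.8}, 'B': {202: 0.61, 505: 0.71, 30: 0.81, 400: 0.91}, 'C': {}}
```

The comparison A < threshold scans every element, although in C; for very long rows, calling np.searchsorted(row, threshold, side='left') per row keeps the O(log n) lookup that bisect_left gives.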