Iterating Through List of Lists, Maintaining List Structure
Suppose I have the following list of names:
names = [['Matt', 'Matt', 'Paul'], ['Matt']]
I only want to return the "Matts" in the list, but I also want to keep the list-of-lists structure. So I want to return:
[['Matt', 'Matt'], ['Matt']]
I have something like this, but it collects everything into one big flat list:
matts = [name for namelist in names for name in namelist if name=="Matt"]
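I know something like the following works, but I would like to avoid explicitly looping over the lists and appending. Is that possible?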
names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = []
for namelist in names:
    matts_namelist = []
    for name in namelist:
        if name=="Matt":
            matts_namelist.append(name)
        else:
            pass
    matts.append(matts_namelist)
Use a nested list comprehension, as follows:
names = [['Matt', 'Matt', 'Paul'], ['Matt']]
res = [[name for name in lst if name == "Matt"] for lst in names]
print(res)
Output
[['Matt', 'Matt'], ['Matt']]
The nested list comprehension above is equivalent to the following for loop:
res = []
for lst in names:
    res.append([name for name in lst if name == "Matt"])
print(res)
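The same pattern can be wrapped in a small helper that accepts any predicate. This is only a sketch: `filter_nested` and `keep` are hypothetical names chosen for illustration and do not come from the answer above.

```python
def filter_nested(nested, keep):
    """Keep only the items for which keep(item) is true.

    The outer structure is preserved: the result has one inner list per
    input list, even when that inner list ends up empty.
    """
    return [[item for item in inner if keep(item)] for inner in nested]


names = [['Matt', 'Matt', 'Paul'], [], ['Paul'], ['Matt']]
result = filter_nested(names, lambda name: name == 'Matt')
print(result)  # [['Matt', 'Matt'], [], [], ['Matt']]
```

Because the outer comprehension always emits one inner list per input list, lists without a match show up as `[]` rather than disappearing.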
A third, functional alternative using filter and partial is:
from operator import eq
from functools import partial
names = [['Matt', 'Matt', 'Paul'], ['Matt']]
eq_matt = partial(eq, "Matt")
res = [[*filter(eq_matt, lst)] for lst in names]
print(res)
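For readers unfamiliar with `partial`, the sketch below shows what `eq_matt` does on its own and why the result of `filter` has to be unpacked. It uses only the standard library; the variable names `lazy` and `as_list` are illustrative.

```python
from functools import partial
from operator import eq

eq_matt = partial(eq, "Matt")

# partial pre-fills the first positional argument,
# so eq_matt(x) is the same call as eq("Matt", x).
print(eq_matt("Matt"))  # True
print(eq_matt("Paul"))  # False

# filter() returns a lazy iterator; unpacking it with [*...]
# (or passing it to list()) materialises it as a list.
lazy = filter(eq_matt, ["Matt", "Matt", "Paul"])
as_list = [*lazy]
print(as_list)  # ['Matt', 'Matt']
```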
Micro-benchmark
%timeit [[*filter(eq_matt, lst)] for lst in names]
56.3 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [[name for name in lst if "Matt" == name] for lst in names]
26.9 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Setup (micro-benchmark)
import random
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
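`%timeit` is an IPython magic, so the measurements above cannot be run in a plain Python script as written. A rough equivalent using the standard-library `timeit` module might look like the sketch below. The seed, the `number` and `repeat` values, and the function names `with_filter` and `with_comprehension` are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import random
import timeit
from functools import partial
from operator import eq

random.seed(0)  # arbitrary seed, only to make the input reproducible
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
eq_matt = partial(eq, "Matt")


def with_filter():
    return [[*filter(eq_matt, lst)] for lst in names]


def with_comprehension():
    return [[name for name in lst if "Matt" == name] for lst in names]


# Both candidates should agree before their timings are compared.
assert with_filter() == with_comprehension()

for func in (with_filter, with_comprehension):
    # repeat() returns one total time per run; the minimum is usually
    # the least noisy estimate.
    best = min(timeit.repeat(func, number=200, repeat=5))
    print(f"{func.__name__}: {best / 200 * 1e6:.1f} us per call")
```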
Full benchmark
Candidates
def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]

def functional_approach(names, needle="Matt"):
    eq_matt = partial(eq, needle)
    return [[*filter(eq_matt, lst)] for lst in names]

def count_approach(names, needle="Matt"):
    return [[needle] * name.count(needle) for name in names]
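Plot
The results above were obtained for lists of 100 to 1000 elements, where each element is a list of 10 strings chosen at random from a population of 14 strings (names). The code to reproduce the results can be found here.
As the plot shows, the best-performing solution is the one from @rv.kvetch.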
IIUC, you can do this with a nested list comprehension like so:
>>> names = [['Matt', 'Matt', 'Paul'], ['Matt']]
>>> [[name for name in lst_name if name=='Matt'] for lst_name in names]
[['Matt', 'Matt'], ['Matt']]
Using the filter function:
matts = [list(filter(lambda x: x=='Matt', namelist)) for namelist in names]
An alternative using list.count:
>>> names = [['Matt', 'Matt', 'Paul'], [], ['Matt']]
>>> [name.count('Matt') * ['Matt'] for name in names]
[['Matt', 'Matt'], [], ['Matt']]
You could also try itertools.repeat:
>>> import itertools
>>> [[*itertools.repeat('Matt', name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]
Finally, as suggested by @DaniMensejo, you can also use a range iterator inside the nested list comprehension:
>>> [['Matt' for _ in range(name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]
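One thing worth noting about the count-based variants: they rebuild each inner list from the needle itself instead of keeping the original elements. For plain strings such as 'Matt' this makes no visible difference, but for values that compare equal without being the same kind of object it can. The sketch below illustrates this with numbers; the data and the names `values`, `kept`, and `rebuilt` are made up for the example.

```python
values = [[1, 1.0, True, 2], [2]]
needle = 1

# Filtering keeps the original elements that compare equal to the needle.
kept = [[v for v in inner if v == needle] for inner in values]

# Counting rebuilds each inner list out of copies of the needle.
rebuilt = [[needle] * inner.count(needle) for inner in values]

print(kept)     # [[1, 1.0, True], []]
print(rebuilt)  # [[1, 1, 1], []]
```

If the filtered items must be the original objects (for example, instances of a class with a custom `__eq__`), the comprehension or `filter` versions are the safer choice.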