Iterating Through List of Lists, Maintaining List Structure

Suppose I have the following list of names:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

I want to return only the "Matt" entries, but I also want to keep the list-of-lists structure. So I want to return:

[['Matt', 'Matt'], ['Matt']]

I have something like this, but it flattens everything into one big list:

matts = [name for namelist in names for name in namelist if name=="Matt"]

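I know that something like the following works, but I would like to avoid looping over the lists and appending by hand. Is that possible?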

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = []
for namelist in names:
    matts_namelist = []
    for name in namelist:
        if name=="Matt":
            matts_namelist.append(name)
    matts.append(matts_namelist)

Use a nested list comprehension, as follows:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
res = [[name for name in lst if name == "Matt"] for lst in names]
print(res)

Output

[['Matt', 'Matt'], ['Matt']]

The nested list comprehension above is equivalent to the following for loop:

res = []
for lst in names:
    res.append([name for name in lst if name == "Matt"])
print(res)

A third, functional alternative using filter and partial is:

from operator import eq
from functools import partial

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

eq_matt = partial(eq, "Matt")
res = [[*filter(eq_matt, lst)] for lst in names]
print(res)
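
For readers who have not seen this idiom before, here is a minimal sketch of what the two pieces do on a single sublist. The names sample and kept are illustrative only and are not part of the answer above.

```python
from functools import partial
from operator import eq

# partial(eq, "Matt") fixes the first argument of eq, so calling
# eq_matt(x) is the same as eq("Matt", x), i.e. "Matt" == x.
eq_matt = partial(eq, "Matt")

sample = ["Matt", "Matt", "Paul"]  # illustrative input

# filter() returns a lazy iterator; unpacking it with [*...] (or calling
# list(...)) materialises the surviving items into a list.
kept = [*filter(eq_matt, sample)]

print(eq_matt("Matt"), eq_matt("Paul"))  # True False
print(kept)                              # ['Matt', 'Matt']
```

Wrapping that expression in an outer comprehension over names is what preserves the list-of-lists structure.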

Micro-benchmark

%timeit [[*filter(eq_matt, lst)] for lst in names]
56.3 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [[name for name in lst if "Matt" == name] for lst in names]
26.9 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Setup (micro-benchmark)

import random
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
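
%timeit is an IPython magic, so the timings above cannot be reproduced in a plain Python script as written. A rough equivalent using the standard timeit module might look like the sketch below. The helper names with_filter and with_comprehension are made up for illustration, it reports the best of several repeats rather than mean ± std. dev., and absolute numbers will differ between machines.

```python
import random
import timeit
from functools import partial
from operator import eq

population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
eq_matt = partial(eq, "Matt")


def with_filter():
    # Hypothetical wrapper around the filter/partial expression
    return [[*filter(eq_matt, lst)] for lst in names]


def with_comprehension():
    # Hypothetical wrapper around the nested list comprehension
    return [[name for name in lst if "Matt" == name] for lst in names]


# Best of 5 repeats of 1000 calls each, reported per call in microseconds
for func in (with_filter, with_comprehension):
    best = min(timeit.repeat(func, number=1000, repeat=5)) / 1000
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```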

Full benchmark

Candidates

def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_matt = partial(eq, needle)
    return [[*filter(eq_matt, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    return [[needle] * lst.count(needle) for lst in names]

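Plot

The results above were obtained for lists of 100 to 1000 elements, where each element is a list of 10 strings chosen at random from a pool of 14 strings (names). The code to reproduce the results can be found here. As the plot shows, the best-performing solution is the one from @rv.kvetch.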
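
The benchmark code itself is not reproduced in this post, so the following is only a sketch of how such a comparison could be set up with the standard library. The 14-name pool, the helper names make_names and benchmark, and the repeat counts are all assumptions, not the original benchmark script.

```python
import random
import timeit
from functools import partial
from operator import eq


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_needle = partial(eq, needle)
    return [[*filter(eq_needle, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    return [[needle] * lst.count(needle) for lst in names]


# Hypothetical pool of 14 names; the pool used for the plot is not shown.
POPULATION = ["Matt", "James", "William", "Charles", "Paul", "John", "Anna",
              "Maria", "Lucy", "Emma", "Peter", "Oscar", "Nina", "Ruth"]


def make_names(n, k=10):
    """Build n sublists of k names drawn at random from POPULATION."""
    return [random.choices(POPULATION, k=k) for _ in range(n)]


def benchmark(sizes=range(100, 1001, 100), number=20, repeat=3):
    """Return {(function name, size): best seconds per call}."""
    candidates = (nested_list_comprehension, functional_approach, count_approach)
    results = {}
    for n in sizes:
        names = make_names(n)
        expected = nested_list_comprehension(names)
        for func in candidates:
            # Every candidate must produce the same output before it is timed
            assert func(names) == expected
            timings = timeit.repeat(partial(func, names), number=number, repeat=repeat)
            results[(func.__name__, n)] = min(timings) / number
    return results


if __name__ == "__main__":
    for (name, n), seconds in benchmark().items():
        print(f"{name:28s} n={n:4d}  {seconds * 1e6:8.1f} µs per call")
```

Taking the minimum over repeats is one common way to reduce noise from other processes; a plotting step would be needed on top of this to recreate a figure.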

IIUC, you can do this with a nested list comprehension like this:

>>> names = [['Matt', 'Matt', 'Paul'], ['Matt']]
>>> [[name for name in lst_name if name=='Matt'] for lst_name in names]
[['Matt', 'Matt'], ['Matt']]

Using the filter function:

matts = [list(filter(lambda x: x=='Matt', namelist)) for namelist in names]

An alternative using list.count:

>>> names = [['Matt', 'Matt', 'Paul'], [], ['Matt']]
>>> [name.count('Matt') * ['Matt'] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

You could also try itertools.repeat:

>>> import itertools
>>> [[*itertools.repeat('Matt', name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

Finally, as @DaniMensejo suggested, you can also use a range iterator inside a nested list comprehension:

>>> [['Matt' for _ in range(name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]