Python all()/any()-like method for a portion/part of a list?

Question

What is the most elegant/pythonic way to implement: "if x% of the total values in a list are greater than y, return true"? I have currently implemented this function:

def check(listItems, val):
   '''A method to check all elements of a list against a given value.
   Returns true if all items of list are greater than value.'''
   return all(x>val for x in listItems)

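But for my use case, waiting for this particular condition is costly and somewhat pointless. I want to proceed if about 80% of the items in the list are greater than the given value. One approach that came to mind is to sort the list in descending order, create another list, copy 80% of the elements into the new list, and then run the function on that new list. However, I am hoping there is a more elegant way to do this. Any suggestions?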

Answer 1

How about this:

def check(listItems, val, threshold=0.8):
    return sum(x > val for x in listItems) > len(listItems) * threshold

It states: check returns True if more than threshold (a fraction, 0.8 by default) of the elements in listItems are greater than val.
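
As a small usage sketch (the sample lists are made up for illustration), note that the comparison is strict, so a list where exactly the threshold fraction passes does not count:

```python
def check(listItems, val, threshold=0.8):
    # Booleans sum as 0/1, so this counts the items greater than val.
    return sum(x > val for x in listItems) > len(listItems) * threshold

# Illustrative data, not from the question.
exactly_half = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]    # 5 of 10 items exceed 1
just_over_half = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]  # 6 of 10 items exceed 1

result_half = check(exactly_half, 1, threshold=0.5)    # 5 > 5.0 is False
result_over = check(just_over_half, 1, threshold=0.5)  # 6 > 5.0 is True
print(result_half, result_over)
```

If "at least 80%" should pass, changing `>` to `>=` would be the natural adjustment.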

Answer 2

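Check each item in order.

If you reach a point where the requirement is already satisfied, return True early.
If you reach a point where it can never be satisfied, even if every later item passes the test, return False early.
Otherwise, keep going (in case the later items help you satisfy the requirement).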

This is the same idea as FatihAkici's in the comments above, but optimized further.

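def check(list_items, ratio, val):
    passing = 0
    # Number of passing items needed to satisfy the ratio.
    satisfied = ratio * len(list_items)
    for index, item in enumerate(list_items):
        if item > val:
            passing += 1
        # Enough items have passed: stop early.
        if passing >= satisfied:
            return True
        remaining_items = len(list_items) - index - 1
        # Even if every remaining item passed, the target is out of reach.
        if passing + remaining_items < satisfied:
            return False
    # Only reached for an empty list (the loop body never runs).
    return passing >= satisfied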
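
To make both early exits visible, here is a rough sketch of the same loop that also reports how many items it examined. The name check_with_count and the sample lists are hypothetical, added only for illustration:

```python
def check_with_count(list_items, ratio, val):
    """Same early-exit logic as above, plus the number of items examined."""
    passing = 0
    satisfied = ratio * len(list_items)
    for index, item in enumerate(list_items):
        if item > val:
            passing += 1
        if passing >= satisfied:
            return True, index + 1
        remaining_items = len(list_items) - index - 1
        if passing + remaining_items < satisfied:
            return False, index + 1
    # Empty list: nothing was examined.
    return passing >= satisfied, 0

# Needs 4 of 8 passing; the first four items already provide that.
early_true = check_with_count([5, 5, 5, 5, 0, 0, 0, 0], 0.5, 1)
# Needs 6 of 8 passing; after three failures only 5 items remain.
early_false = check_with_count([0, 0, 0, 5, 5, 5, 5, 5], 0.75, 1)
print(early_true, early_false)
```

In both cases the loop stops before reaching the end of the list.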

Answer 3

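It sounds like you are dealing with long lists, which is why this is costly. It would be nice if you could exit as soon as the condition is met. any() will do this, but you will want to avoid reading the whole list before passing it to any(). One option might be to use itertools.accumulate to keep a running total of True values and pass that to any. Something like: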

from itertools import accumulate

a = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]

# True if at least 50% are greater than 1
goal = .5 * len(a)  # at least 5 out of 10
any(x >= goal for x in accumulate(n > 1 for n in a))

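accumulate doesn't need to read the whole list; it just starts passing along the number of True values seen up to that point. any should short-circuit as soon as it finds a true value, which in the above example is at index 5.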
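
One way to check the short-circuit claim is to wrap the list in a generator that counts how many items were actually pulled. This is only a sketch: the counting helper is hypothetical, and the comparison uses >= so that "at least 5 out of 10" is what gets tested:

```python
from itertools import accumulate

def counting(iterable, counter):
    """Yield items unchanged while recording how many were consumed."""
    for item in iterable:
        counter[0] += 1
        yield item

a = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
goal = 0.5 * len(a)  # at least 5 out of 10
seen = [0]           # one-element list used as a mutable counter

result = any(x >= goal for x in accumulate(n > 1 for n in counting(a, seen)))
print(result, seen[0])  # True 6: items at indexes 0 through 5 were read
```

The remaining four items are never touched, which is the saving that matters for long lists.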

Answer 4

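I don't want to take credit away from Mark Meyer's answer, since he came up with the idea of using accumulate and any, and his version is more pythonic/readable. But if you are looking for the "fastest" approach, modifying his approach to use map instead of generator expressions is faster.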

any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))
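
For readers unfamiliar with the bound dunder methods, here is a step-by-step sketch of what that one-liner computes. The variable names and sample data are illustrative. It assumes val and the list items are the same numeric type, because calling a dunder directly skips Python's reflected-operand fallback:

```python
from itertools import accumulate

list_items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]
val = 1
goal = 0.5 * len(list_items)

# val.__lt__(x) evaluates val < x, i.e. "x is greater than val".
flags = list(map(val.__lt__, list_items))

# Running count of items greater than val seen so far.
running = list(accumulate(flags))

# goal.__le__(n) evaluates goal <= n, i.e. "the count has reached goal".
result = any(map(goal.__le__, running))
print(result)

# Caveat: with mixed types the direct call does not fall back to
# float.__gt__, so an int compared against a float gives NotImplemented.
mixed = (1).__lt__(2.5)
print(mixed)
```

The intermediate lists are built here only for inspection; the one-liner keeps everything lazy so any can stop early.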

Just to test:

from timeit import timeit
from itertools import accumulate

def check1(listItems, val):
    goal = len(listItems)*0.8
    return any(x >= goal for x in accumulate(n > val for n in listItems))

def check2(listItems, val):
    goal = len(listItems)*0.8
    return any(map(goal.__le__, accumulate(map(val.__lt__, listItems))))

items = [1, 2, 2, 3, 4, 2, 4, 1, 1, 1]

for t in (check1, check2):
    print(timeit(lambda: t(items, 1)))

The results were:

3.2596251670038328
2.0594907909980975

Answer 5

You can use filter for this. It is by far the fastest method here. See my other answer for comparison, as this is faster than the methods in it.

def check(listItems, val, goal=0.8):
    return len((*filter(val.__lt__, listItems),)) >= len(listItems) * goal

Running the same timing test on this as on the methods in my other answer gives:

1.684135717988247
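
A brief usage sketch of the filter version, with made-up sample data. Unlike the accumulate approaches, it always walks the entire list, and its comparison is inclusive (>=):

```python
def check(listItems, val, goal=0.8):
    # Unpack the filter object into a tuple so len() can count the matches.
    passing = (*filter(val.__lt__, listItems),)
    return len(passing) >= len(listItems) * goal

# Illustrative data: exactly 5 of 10 items exceed 1.
items = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]

result_half = check(items, 1, goal=0.5)      # 5 >= 5.0 is True
result_strict = check(items, 1, goal=0.75)   # 5 >= 7.5 is False
print(result_half, result_strict)
```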
