Consecutive slices of iterable

Suppose I have an iterator

numbers = iter(range(100))

I want to calculate the consecutive mean values and store them in an iterable with elements

0., 0.5, ..., 49., 49.5

This can be done by converting the iterable to a list/tuple and computing its slices like
from statistics import mean

# in cases with large or potentially infinite amounts of data
# this conversion will fail
numbers_list = list(numbers)
numbers_slices = (numbers_list[:end + 1] for end in range(len(numbers_list)))
mean_values = map(mean, numbers_slices)

(more on the mean function can be found in the docs)

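So my question is more general: is there any way to get consecutive slices of an iterable using the standard library, without wrapping it in a list/tuple?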


We could write a utility function like this

def get_slices(iterable):
    elements = []
    for element in iterable:
        elements.append(element)
        yield elements

and then

numbers_slices = get_slices(numbers)
mean_values = map(mean, numbers_slices)

but it looks awful too.


P.S.: I know that it would be better to calculate the consecutive mean values like
def get_mean_values(numbers):
    numbers_sum = 0
    for numbers_count, number in enumerate(numbers, start=1):
        numbers_sum += number
        yield numbers_sum / numbers_count

but that's not my point.
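For comparison, the running-mean generator from the P.S. can also be assembled from standard-library pieces: `itertools.accumulate` produces the running sums and `itertools.count(1)` supplies the counts. This is only a minimal sketch; the name `running_means` is hypothetical.

```python
from itertools import accumulate, count


def running_means(iterable):
    # accumulate() yields running sums; divide each by its 1-based position
    return (total / n for n, total in zip(count(1), accumulate(iterable)))


print(list(running_means(iter(range(6)))))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

It consumes the iterator lazily and keeps no list of elements, so it also works on very long or unbounded input.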

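You could have a generator that yields the means directly, with local variables holding the running total and the count. (In fact, you get the count for free by iterating over enumerate(iterable) and adding 1 to the index.) Is that enough of a hint?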

Have a look at itertools.islice

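import itertools

def get_slices(sequence):
    # Works only for sequences (list, tuple, ...): an iterator has no len(), and
    # slices ends 1..len so the first slice is [x0] and the last is the whole sequence
    return map(lambda end: itertools.islice(sequence, end), range(1, len(sequence) + 1))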
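The islice approach needs a real sequence because of how islice interacts with its input. In this small sketch, slicing a list always starts from the beginning, whereas repeated islice calls on one shared iterator continue where the previous call stopped, so they produce consecutive chunks rather than prefixes.

```python
from itertools import islice

data = list(range(6))
# islice over a list creates a fresh iterator each time, so every slice is a prefix
prefixes = [list(islice(data, end)) for end in range(1, 4)]
print(prefixes)  # [[0], [0, 1], [0, 1, 2]]

shared = iter(range(6))
# islice over the same iterator consumes it, so the slices do not overlap
chunks = [list(islice(shared, end)) for end in range(1, 4)]
print(chunks)  # [[0], [1, 2], [3, 4, 5]]
```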

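If you don't know the length, here is a version using reduce, which is very memory-inefficient. (Note that this mean lambda averages the previous result with the next element, so it yields an exponentially weighted average rather than the cumulative mean: the third value is 1.25, whereas the mean of 0, 1, 2 is 1.)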

from functools import reduce
numbers = (number for number in range(1,100))
mean = lambda x, y: (x+y)/float(2)
reduce(lambda x, y: x + [mean(x[-1], y)], numbers, [0])
[0.0, 0.5, 1.25, 2.125, 3.0625, 4.03125, 5.015625, 6.0078125, 7.00390625, 8.001953125, 9.0009765625, 10.00048828125, 11.000244140625, 12.0001220703125, 13.00006103515625, 14.000030517578125, 15.000015258789062, 16.00000762939453, 17.000003814697266, 18.000001907348633, 19.000000953674316, 20.000000476837158, 21.00000023841858, 22.00000011920929, 23.000000059604645, 24.000000029802322, 25.00000001490116, 26.00000000745058, 27.00000000372529, 28.000000001862645, 29.000000000931323, 30.00000000046566, 31.00000000023283, 32.000000000116415, 33.00000000005821, 34.000000000029104, 35.00000000001455, 36.000000000007276, 37.00000000000364, 38.00000000000182, 39.00000000000091, 40.000000000000455, 41.00000000000023, 42.000000000000114, 43.00000000000006, 44.00000000000003, 45.000000000000014, 46.00000000000001, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0]

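So in the end, what we are doing is almost the same as your code, so you should either use it, or use a list instead of a generator and then take slices of the list (itertools.islice).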
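If the goal is the actual cumulative mean rather than averaging the previous result with each new element, the same reduce shape works with the incremental update m_k = m_{k-1} + (x_k - m_{k-1}) / k, using the length of the accumulated list as k. This is a sketch with the hypothetical name `cumulative_means`; it still copies the list on every step, so it is just as memory-hungry as the version above.

```python
from functools import reduce


def cumulative_means(iterable):
    def step(means, x):
        # means holds the results so far, so its length is the count of elements seen
        if not means:
            return [float(x)]
        k = len(means) + 1
        # incremental update: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        return means + [means[-1] + (x - means[-1]) / k]

    return reduce(step, iterable, [])


print(cumulative_means(range(5)))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```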

Edit: I kept thinking about this problem. It is easy to solve with Haskell's scanl, so I generalized the concept and got a nice result:

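def scanl(f, iterable):
    # Like Haskell's scanl1: yield the first element, then each successive f(acc, element)
    it = iter(iterable)  # also accept lists/tuples, not only iterators
    try:
        n = next(it)
    except StopIteration:  # empty input: yield nothing
        return
    yield n
    for e in it:
        n = f(n, e)
        yield n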

list(scanl(mean, numbers))
[0, 0.5, 1.25, 2.125, 3.0625, 4.03125, 5.015625, 6.0078125, 7.00390625, 8.001953125, 9.0009765625, 10.00048828125, 11.000244140625, 12.0001220703125, 13.00006103515625, 14.000030517578125, 15.000015258789062, 16.00000762939453, 17.000003814697266, 18.000001907348633, 19.000000953674316, 20.000000476837158, 21.00000023841858, 22.00000011920929, 23.000000059604645, 24.000000029802322, 25.00000001490116, 26.00000000745058, 27.00000000372529, 28.000000001862645, 29.000000000931323, 30.00000000046566, 31.00000000023283, 32.000000000116415, 33.00000000005821, 34.000000000029104, 35.00000000001455, 36.000000000007276, 37.00000000000364, 38.00000000000182, 39.00000000000091, 40.000000000000455, 41.00000000000023, 42.000000000000114, 43.00000000000006, 44.00000000000003, 45.000000000000014, 46.00000000000001, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0]
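The scanl generator above is essentially Haskell's scanl1 (there is no separate initial value), and the standard library already provides the same behaviour: `itertools.accumulate(iterable, func)` yields the first element and then each successive `func(accumulated, element)`. This sketch reproduces the start of the output above; `pairwise_mean` is a hypothetical stand-in for the answer's `mean` lambda.

```python
from itertools import accumulate


def pairwise_mean(x, y):
    # same as the answer's lambda: average of the previous result and the next element
    return (x + y) / 2


print(list(accumulate(range(5), pairwise_mean)))  # [0, 0.5, 1.25, 2.125, 3.0625]
```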

There seems to be no standard way to get consecutive slices of an iterable (iterator/list/tuple/etc.).

The better way I found is to use a slightly modified version of the utility function from the original question:

def consecutive_slices(iterable):
    elements = []
    for element in iterable:
        elements.append(element)
        yield list(elements)

Modifications:

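  • added copying of elements (by the way, there are many ways of doing that), because with the previous version, wrapping the result in a list like

    >>> numbers_slices = list(get_slices(numbers))

    would give us a list containing N references to the same elements list, which holds all the numbers (N equals 100 in the example):

    >>> numbers_slices == [list(range(100))] * 100
    True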
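A minimal variant of the function above, assuming immutable snapshots are acceptable: yielding `tuple(elements)` also copies, and tuples prevent a consumer from mutating a slice by accident. Feeding the slices to `statistics.mean` gives the consecutive means from the question.

```python
from statistics import mean


def consecutive_slices(iterable):
    elements = []
    for element in iterable:
        elements.append(element)
        # tuple() takes a snapshot; elements[:], elements.copy() or list(elements) also copy
        yield tuple(elements)


print(list(consecutive_slices(iter(range(4)))))  # [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
print(list(map(mean, consecutive_slices(iter(range(4))))))  # [0, 0.5, 1, 1.5]
```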
    

Functional approach

After some more thought, I realized this can also be done with the itertools module, like

from itertools import (accumulate,
                       chain)


def consecutive_slices(iterable):
    def collect_elements(previous_elements, element):
        return previous_elements + [element]

    return accumulate(chain(([],), iterable), collect_elements)
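On Python 3.8 and later, `itertools.accumulate` accepts an `initial` keyword argument that is yielded before anything else. This sketch of the same function uses it instead of the `chain(([],), iterable)` trick.

```python
from itertools import accumulate


def consecutive_slices(iterable):
    # initial=[] (Python 3.8+) is yielded first, then each previous slice + [element]
    return accumulate(iterable, lambda previous, element: previous + [element], initial=[])


print(list(consecutive_slices(range(3))))  # [[], [0], [0, 1], [0, 1, 2]]
```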

Here we use chain to prepend an empty list as the initial slice; it can be skipped in the result with islice like
from itertools import islice
...
islice(consecutive_slices(range(10)), 1, None)

but keeping it seems legitimate, since after all an empty slice is still a slice.
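If the leading empty slice is not wanted at all, another option (an alternative, not the approach above) is to wrap each element in a one-element list and rely on `accumulate`'s default operation, addition, which concatenates lists. No chain or islice is needed in this sketch.

```python
from itertools import accumulate


def consecutive_slices(iterable):
    # accumulate() adds by default, and list + list builds a new, longer list,
    # so each yielded value is a fresh prefix (the first is the one-element list itself)
    return accumulate([element] for element in iterable)


print(list(consecutive_slices(iter(range(3)))))  # [[0], [0, 1], [0, 1, 2]]
```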

Compared with the previous solution, this is still a 4-line function that does almost the same thing, but it is less "spaghetti" IMO.