为什么 itertools.product 运行在初始化时遍历所有元素？

Question

我假设 itertools.product 一次生成一个元素。我现在注意到这不是真的。简单的概念证明：

Class A:
  def __init__(self, n):
    self.source = iter(range(n))

  def __iter__(self):
    return self

  def __next__(self):
    val = next(self.source)
    print("I am at:", val)
    return val

现在如果我这样做：

from itertools import product
l = product(A(3), A(3))
print("Here")
next(l)

我希望输出：

>'Here'
>'I am at 0'
>'I am at 0'

但是我有

>'I am at 0'
>'I am at 1'
>'I am at 2'
>'I am at 0'
>'I am at 1'
>'I am at 2'
>'Here'

我是不是漏掉了什么？

Answer 1

要回答您的问题，我们需要查看 itertools.product:

的实现

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

here你找到了真正的C实现，但是要回答这个问题，参考python就足够了（见底部的EXTRA段落）。

关注这行代码：

pools = [tuple(pool) for pool in args] * repeat

这样两个迭代器的所有元素（取入input）都转化为一个元组列表（只有第一次调用next()），这时候才真正创建。

回到您的代码，当您第一次调用 next(l) 时，将创建迭代器的所有元素。在您的示例中，将创建包含以下元素的 polls 列表：

# pools: [(0, 1, 2), (0, 1, 2)]

这就是你得到这些输出的原因。

至于 print("Here")，要了解为什么要先打印它，您需要了解生成器的工作原理：

itertool.product() returns 生成器对象。生成器直到被第一个next()刺激后才会执行函数代码。随后，每次调用 next() 允许您计算下一个元素，只执行一次包含关键字 yield.

的循环

Here 你会找到很好的资源来更好地理解 python 生成器的工作原理。

为什么 'itertools' 选择将元组列表保存在内存中？

因为笛卡尔积必须多次计算同一个元素，而迭代器不能只使用一次。

额外

在 C 中创建的元组列表 pools 等同于 python，正如您从这段代码中看到的那样，它们是急切求值的。每个可迭代参数首先转换为一个元组：

pools = PyTuple_New(npools);
if (pools == NULL)
    goto error;

for (i=0; i < nargs ; ++i) {
    PyObject *item = PyTuple_GET_ITEM(args, i);
    PyObject *pool = PySequence_Tuple(item);
    if (pool == NULL)
        goto error;
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}
for ( ; i < npools; ++i) {
    PyObject *pool = PyTuple_GET_ITEM(pools, i - nargs);
    Py_INCREF(pool);
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}

Answer 2

我想指出的是，虽然 class A 的两个实例都会详尽地调用 __next__ 方法（直到遇到 StopIteration），itertools.product对 next 的后续调用仍然对迭代器进行惰性求值。请注意：

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

> 'Here'

只是首先为第一个传递的实例彻底调用 next，然后为第二个实例调用的结果。这在调用 product(A(2), A(3)) 时更容易看到，结果是：

> 'I am at 0'

> 'I am at 1'

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

combinations 和 permutations 观察到相同的行为。事实上，用“Does itertools.product evaluate its arguments lazily?”搜索如此明智的问题带我到这也回答了你的问题。参数不会被延迟评估：

since product sometimes needs to go over an iterable more than once, which is not possible if the arguments were left as iterators that can only be consumed once.

为什么 itertools.product 运行在初始化时遍历所有元素？

Why does itertools.product run through all elements at initialization?

iteration

generator

python-3.7

为什么 'itertools' 选择将元组列表保存在内存中？

额外

为什么 itertools.product 运行 在初始化时遍历所有元素？

Why does itertools.product run through all elements at initialization?

iteration

generator

python-3.7

为什么 'itertools' 选择将元组列表保存在内存中？

额外

为什么 itertools.product 运行在初始化时遍历所有元素？