Generators and files

When I write:

lines = (line.strip() for line in open('a_file'))

Is the file opened immediately, or is the file system only accessed once I start consuming the generator expression?

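It is opened immediately. You can verify this by using a filename that does not exist: an exception is raised right away, which shows that Python really does try to open the file at that point.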
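A minimal sketch of that check, assuming a temporary directory and a made-up file name (`no_such_file.txt` is hypothetical, chosen only so the path cannot exist). The generator expression is built but never iterated:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path inside a fresh, empty directory, so it cannot exist.
    missing = os.path.join(tmp, "no_such_file.txt")

    raised_at_creation = False
    try:
        # Only the generator expression is created here; nothing iterates it.
        lines = (line.strip() for line in open(missing))
    except FileNotFoundError:
        # open() ran while the expression was being built.
        raised_at_creation = True

print(raised_at_creation)
```

If the file were opened lazily, the `try` block would complete without an error, because nothing ever calls `next()` on `lines`.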

You can also use a function that gives more feedback, to see that the call is executed even before the generator is iterated:

def somefunction(filename):
    print(filename)
    return open(filename)

lines = (line.strip() for line in somefunction('a_file'))  # prints a_file right away

However, if you use a generator function instead of a generator expression, the file is only opened once you start iterating:

def somefunction(filename):
    print(filename)
    for line in open(filename):
        yield line.strip()

lines = somefunction('a_file')  # no print!

list(lines)                     # prints, because list() iterates over the generator.
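One way to convince yourself of the lazy behaviour is to create the generator before the file even exists. The sketch below assumes a temporary directory and a hypothetical helper named `stripped_lines`; it also uses a `with` block so the file is closed once iteration finishes, which the original snippet does not do:

```python
import os
import tempfile


def stripped_lines(filename):
    # Nothing in this body runs until the first next() call on the generator.
    with open(filename) as f:
        for line in f:
            yield line.strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a_file")

    # The file does not exist yet, but no error occurs: open() has not run.
    gen = stripped_lines(path)

    # Create the file only after the generator object already exists.
    with open(path, "w") as out:
        out.write("first \n second\n")

    # Iteration starts here, so open() is called now and succeeds.
    result = list(gen)

print(result)
```

With a generator expression in place of `stripped_lines(path)`, the same ordering would fail at creation time, since the outermost iterable is evaluated immediately.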

It is opened immediately.

Example:

def func():
    print('x')
    return [1, 2, 3]

g = (x for x in func())

Output:

x

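The function needs to return an iterable, and open() returns an open file object, which is iterable. The file is therefore opened at the moment you define the generator expression.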

open() is called immediately when the generator is constructed, regardless of when, or whether, you consume it.
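Because the file is open from the moment the expression is built, it can help to keep your own handle on it so you control when it is closed. This is only a sketch of one possible arrangement, with a temporary sample file created purely for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a_file")  # hypothetical sample file
    with open(path, "w") as out:
        out.write("alpha\nbeta\n")

    with open(path) as f:
        # f is already open; the generator expression only iterates over it.
        lines = (line.strip() for line in f)
        first = next(lines)

    # Leaving the with block closes the file, even though the generator
    # was only partially consumed.
    closed_after_block = f.closed

print(first, closed_after_block)
```

The generator must be consumed inside the `with` block; iterating it afterwards would fail because the underlying file is closed.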

The relevant specification is PEP 289:

Early Binding versus Late Binding

After much discussion, it was decided that the first (outermost) for-expression should be evaluated immediately and that the remaining expressions be evaluated when the generator is executed.

Asked to summarize the reasoning for binding the first expression, Guido offered [5]:

Consider sum(x for x in foo()). Now suppose there's a bug in foo() that raises an exception, and a bug in sum() that raises an exception before it starts iterating over its argument. Which exception would you expect to see? I'd be surprised if the one in sum() was raised rather the one in foo(), since the call to foo() is part of the argument to sum(), and I expect arguments to be processed before the function is called.

OTOH, in sum(bar(x) for x in foo()), where sum() and foo() are bugfree, but bar() raises an exception, we have no choice but to delay the call to bar() until sum() starts iterating -- that's part of the contract of generators. (They do nothing until their next() method is first called.)

See the rest of that section for further discussion.
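The rule quoted from PEP 289 can be observed directly by recording the order of calls. In this sketch, `foo`, `bar`, and `inner` are stand-in functions invented for the demonstration; `foo()` supplies the outermost iterable, while `inner()` and `bar()` appear in the deferred parts of the expression:

```python
calls = []


def foo():
    # Outermost iterable: evaluated when the generator expression is created.
    calls.append("foo")
    return [1, 2]


def inner():
    # Iterable of the second for-clause: evaluated during iteration.
    calls.append("inner")
    return [0]


def bar(x):
    # Result expression: evaluated during iteration.
    calls.append(f"bar({x})")
    return x * 10


g = (bar(x) for x in foo() for _ in inner())
after_creation = list(calls)   # snapshot before any iteration

values = list(g)
after_iteration = list(calls)

print(after_creation)
print(values)
print(after_iteration)
```

Only `foo` appears in the first snapshot; `inner` and `bar` are recorded only after `list(g)` drives the generator.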