How exactly is a DEDENT token generated in Python?

I am reading the documentation on the lexical analysis of Python, which describes how INDENT and DEDENT tokens are generated. I am posting the description here.

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.

I tried to understand the DEDENT part but could not. Can anyone give a better explanation than the quoted one?

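Suppose we have a source file that uses 4 spaces per indentation level, and that we are currently at the third level of indentation. The indentation stack will contain [0, 4, 8, 12]: the initial zero, plus each new indentation level as it was first encountered. Now consider the number of leading spaces on the next line of code...

  • If it is 12 (matching the current top of the stack), there is no indentation change and nothing special happens.
  • If it is greater than 12, an INDENT token is generated and the new value is pushed onto the stack.
  • If it is 8, one DEDENT token is generated and 12 is popped off the stack.
  • If it is 4, you get two DEDENTs, and both 12 and 8 are popped.
  • If it is 0, or the source file ends at this point, you get three DEDENTs, and 12, 8, and 4 are popped.
  • If it is any other value less than 12, an "inconsistent indentation" error is raised, because there is no way to tell which previous level of the code you have dedented to.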
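The cases above can be sketched as a single function. This is only an illustration under the assumptions of this example: `step` is a hypothetical helper name, not part of Python's tokenizer, and indentation is given directly as a count of leading spaces.

```python
def step(stack, indent):
    """Return the tokens produced for a line with `indent` leading spaces.

    `stack` is modified in place. Hypothetical helper, for illustration only.
    """
    tokens = []
    if indent > stack[-1]:
        # Deeper than the current level: push it and emit one INDENT.
        stack.append(indent)
        tokens.append("INDENT")
    else:
        # Pop every level that is deeper than the new line, one DEDENT each.
        while indent < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        # The new indentation must now match a level that was on the stack.
        if indent != stack[-1]:
            raise IndentationError("inconsistent dedent")
    return tokens


for indent in (12, 16, 8, 4, 0):
    stack = [0, 4, 8, 12]
    print(indent, step(stack, indent), stack)
```

Each iteration starts from the same stack [0, 4, 8, 12], so the printed lines correspond to the first five bullets in order. Calling `step([0, 4, 8, 12], 6)` raises `IndentationError`, which is the last bullet.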

Note that only lines containing actual code are considered: if a line contains only whitespace or a comment, its number of leading spaces is irrelevant.

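The point of this process is that exactly one DEDENT is generated for each INDENT, at the point where the indentation level returns to (or drops below) the amount that existed before the corresponding INDENT.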
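One way to check this pairing is to ask the standard library's `tokenize` module for the tokens of a small snippet and count them. The snippet and variable names below are made up for this example.

```python
import io
import tokenize

# A small sample with two nested blocks (made up for this example).
source = (
    "for i in range(3):\n"
    "    if i % 2 == 0:\n"
    "        print(i)\n"
    "print('done')\n"
)

# tokenize.generate_tokens takes a readline callable that returns str lines.
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

indents = names.count("INDENT")
dedents = names.count("DEDENT")
print(indents, dedents)  # prints: 2 2
```

Both DEDENT tokens here are emitted at the start of the final `print('done')` line, where the indentation drops from 8 straight back to 0.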

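Since Python is sometimes easier than English, here is a rough translation of the description into Python. You can see a real-world parser (one I wrote myself) that works this way here.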

import re
code = """
for i in range(10):
   if i % 2 == 0:
     print(i)
   print("Next number")
print("That's all")

for i in range(10):
   if i % 2 == 0:
       print(i)
print("That's all again)

for i in range(10):
   if i % 2 == 0:
      print(i)
  print("That's all")
"""
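def get_indent(s) -> int:
    # Count the leading spaces of the line (tabs are not handled here).
    m = re.match(r' *', s)
    return len(m.group(0))

def add_token(token):
    # Stand-in for a real token stream: just print the token name.
    print(token)

INDENT = "indent"
DEDENT = "dedent"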
indent_stack = [0]
# Before the first line of the file is read, a single zero is pushed on the stack
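for line in code.splitlines():
    print("processing line:", line)
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        # Blank and comment-only lines do not affect indentation.
        continue
    indent = get_indent(line)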
    # At the beginning of each logical line, the line’s 
    # indentation level is compared to the top of the stack. 
    if indent > indent_stack[-1]:
        # If it is larger, it is pushed on the stack, 
        # and one INDENT token is generated.
        add_token(INDENT)
        indent_stack.append(indent)
    elif indent < indent_stack[-1]:
        while indent < indent_stack[-1]:
            #  If it is smaller, ...
            # all numbers on the stack that are larger are popped off,
            # and for each number popped off a DEDENT token is generated.
            add_token(DEDENT)
            indent_stack.pop()
        if indent != indent_stack[-1]:
            # it must be one of the numbers occurring on the stack; 
            raise IndentationError
while indent_stack[-1] > 0:
    # At the end of the file, a DEDENT token is generated for each number
    # remaining on the stack that is larger than zero.
    add_token(DEDENT)
    indent_stack.pop()

Here is the output:

processing line: 
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:      print(i)
indent
processing line:    print("Next number")
dedent
processing line: print("That's all")
dedent
processing line: 
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:        print(i)
indent
processing line: print("That's all again)
dedent
dedent
processing line: 
processing line: for i in range(10):
processing line:    if i % 2 == 0:
indent
processing line:       print(i)
indent
processing line:   print("That's all")
dedent
dedent
  File "<string>", line unknown
IndentationError
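As a cross-check, the third sample can be handed to Python's built-in `compile`, which should reject it for the same reason: the last line's indentation of 2 matches none of the levels 0, 3, or 6 on the stack. This is a minimal sketch; the names `bad_source` and `message` are made up for the example, and the exact error text is what CPython reports, which may differ in other implementations.

```python
# The third sample from the code above: indentation levels 0, 3, 6, then 2.
bad_source = (
    "for i in range(10):\n"
    "   if i % 2 == 0:\n"
    "      print(i)\n"
    "  print(\"That's all\")\n"
)

message = None
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as exc:
    # IndentationError is a subclass of SyntaxError, so it carries .msg.
    message = exc.msg
    print(message)
```

On CPython this prints "unindent does not match any outer indentation level", which is the interpreter's wording for the inconsistent dedent case.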