Construction of Lexical Analyzer, String Index Out Of Range in While Loop

Question

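Introduction: I am learning how to create my own language and the steps needed to get there. I am trying to implement a lexical analyzer, but I get an error even though my logic looks correct to me. I want the program to skip comments.

Problem: When I iterate over the characters and look for '\n' in a comment line, I get the error "string index out of range".

Python code: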

comment = ['//', '/*', '*/']
keyw = ["main", "void"]
br = ['(', ')', '{', '}']
lineCount = 1
temp = ''
flag = False
f = open('Program.C', 'r')
Program = f.read()
#print(Program)

for c in range(len(Program)):
    if Program[c] == ' ':
        continue
    if Program[c] == '\n':
        lineCount = lineCount + 1
        continue
    if Program[c] == '/':
        c = c + 1
        if Program[c] == '/':
            c = c + 1
            while Program[c] != '\n':
                c = c +1
    if Program[c] in br:
        print(lineCount, "Brackets", Program[c])
    else:
        temp = temp + Program[c]
        print(temp)
        if temp in keyw:
            print(lineCount, "Keyword", temp)
            temp = ''
    print(Program[c])

Output:

while Program[c] != '\n':
IndexError: string index out of range
 S
S
 Sa
a
 Saa
a
 Saad
d
Process finished with exit code 1

Sample input file:

// Saad
// Bhai

Answer 1

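Besides answering your actual question, I would also like to offer some suggestions for improving your Python code.

Your actual problem: your second line does not end with `\n`

The answer to your actual problem is that your file does not end with a newline character, \n.

While it is natural to assume that every line ends with one, there is an exception: the last line of the file. When parsing the second line of the file, your while loop keeps searching for a \n character but never finds one, because your file simply ends after the i.

You can confirm this by printing every character that was read: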

>>> f = open('Program.C')
>>> print(list(f.read()))
['/', '/', ' ', 'S', 'a', 'a', 'd', '\n', '/', '/', ' ', 'B', 'h', 'a', 'i']
                                    ^^^^                                    ^^^^
                               endline here                         but not here!

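So instead of finding a \n character, your while loop keeps incrementing c until it runs past the length of the input, which causes the IndexError: string index out of range you encountered.

The simple fix is to change the while loop to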

while c < len(Program) and Program[c] != '\n':
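
To illustrate the bounded scan in isolation, here is a minimal sketch. The helper name `skip_line_comment` and the inline `sample` string are assumptions made for this example; they are not part of the code in the question.

```python
def skip_line_comment(text, start):
    """Return the index of the newline that ends the comment starting at
    `start`, or len(text) if the input ends before a newline is found."""
    c = start
    # The bound is checked first: `and` short-circuits, so text[c] is
    # never evaluated once c has reached len(text).
    while c < len(text) and text[c] != '\n':
        c += 1
    return c


# Same content as the sample input file: no newline after the last line.
sample = "// Saad\n// Bhai"
first = skip_line_comment(sample, 0)
second = skip_line_comment(sample, first + 1)
print(first, second)  # 7 15
```

The second call returns 15, which equals len(sample), so the caller can tell that the input ended inside the comment instead of raising an IndexError.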

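Improving your Python

Naming conventions

Names that start with an uppercase letter are usually reserved for classes, so Program should be program. CamelCase is generally avoided for variables as well, so lineCount becomes line_count.

Opening files in Python: `with open(file) as f:`

When you open a file yourself in Python, you should also close it. Because that is tedious, Python has the with statement, which closes the file automatically as soon as you leave the block: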

with open(filename) as f:
    # file I/O

# file itself no longer needed
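
As a small runnable sketch of that behaviour: the temporary directory and the file content below are assumptions chosen so the example is self-contained; only the `with open(...)` pattern comes from the text above.

```python
import os
import tempfile

# Create a throwaway copy of the sample input so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "Program.C")
with open(path, "w") as f:
    f.write("// Saad\n// Bhai")

with open(path) as f:
    program = f.read()

# Outside the block the file object still exists, but it has been closed.
print(f.closed)  # True
```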

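`for` loops in Python

Any sequence-like type in Python has built-in iteration support. There is no need to index manually; you can access the items you want directly. Compare the following, with my_list = [1, 4, 9]: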

for i in range(len(my_list)):
    print(my_list[i])

and

for item in my_list:
    print(item)

If you also need the index, you can use enumerate:

for i, item in enumerate(my_list):
    print(i, item)
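
One related caveat that seems relevant to the code in the question: reassigning the loop variable inside a `for ... in range(...)` loop does not skip ahead, because the next value is taken from the range on every iteration. The sketch below uses made-up numbers purely to show the difference from a while loop with a manually advanced index.

```python
visited = []
for c in range(5):
    visited.append(c)
    c = c + 2  # overwritten on the next iteration; nothing is skipped
print(visited)  # [0, 1, 2, 3, 4]

skipped = []
c = 0
while c < 5:
    skipped.append(c)
    c += 3  # here the index really does jump ahead
print(skipped)  # [0, 3]
```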

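Iterating over files

Instead of reading the whole file and iterating over every character in the string, Python also supports iterating over a file line by line: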

with open(filename) as file:
    # making use of enumerate()
    for line_num, line in enumerate(file, start=1):
        print(line_num, line)
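
As a quick check of what line-by-line iteration yields, the sketch below uses `io.StringIO` as a stand-in for the opened file (an assumption made so the example needs no file on disk). Each line keeps its trailing '\n', except a last line that has none.

```python
import io

# Stand-in for open('Program.C'): same content as the sample input file.
fake_file = io.StringIO("// Saad\n// Bhai")

lines = [(line_num, line) for line_num, line in enumerate(fake_file, start=1)]
print(lines)  # [(1, '// Saad\n'), (2, '// Bhai')]
```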

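My version

This is what I made of the code you posted, although it may not be the best solution as your analyzer grows (in fact, it probably is not). It may still be a useful reference as a 'pythonic' version of the code you posted.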

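br = ['(', ')', '{', '}']
keywords = ["main", "void"]
temp = ''

with open('Program.C', 'r') as file:
    for line_count, line in enumerate(file, start=1):
        line = line.lstrip(' ')

        # skip lines that consist only of a // comment
        if line.startswith('//'):
            continue

        for character in line.rstrip('\n'):
            # ignore spaces, as the loop in the question does
            if character == ' ':
                continue
            if character in br:
                print(line_count, "Brackets", character)
            else:
                temp += character
                print(temp)
                if temp in keywords:
                    print(line_count, "Keyword", temp)
                    temp = ''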

compiler-construction

lexical-analysis

python-3.x
