How to print sections of a .txt file based on a character

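I am writing a program that reads questions from a .txt file. Each question is surrounded by lines of dashes ("-----"), followed by four possible answers and then the correct answer. I can't figure out how to print these parts separately. The file is over 10,000 lines long, and I need to be able to print one question at a time, but I'm not sure how to specify that only the text between the dashes should be read.

Sample text: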

---------------------------------------------------------------------------
  #1010 How tall is the actor Verne Troyer, famous for his role as Mini-me
        in the Austin Powers films?
---------------------------------------------------------------------------
 *36 inches
 *32 inches
 *24 inches
 *35 inches

Answer: 32 inches

---------------------------------------------------------------------------
  #1011 Who auditioned for the role of James Bond in 1969 but was turned
        down for being too tall?
---------------------------------------------------------------------------
 *John Cleese
 *Peter Snow
 *Simon Dee
 *Christopher Lee

Answer: Peter Snow

This will split it into chunks:

import re
with open('myfile.txt', 'r') as f:
    # split on any run of 5+ dashes, so a full 75-dash line counts as one separator
    data = re.split(r'-{5,}', f.read())

I'd need more detail or an example to do more than that.

Edit (with an example):

qa_dict = {}
with open('myfile.txt', 'r') as f:
    spacer = '-' * 75
    last_q = ''
    # walk the blocks one at a time; a block is either a question or its answers
    for block in f.read().split(spacer):
        # a question block contains both '#' and '?'
        if '#' in block and '?' in block:
            if last_q != '':
                print('something went wrong around:', block)
            # collapse the wrapped question onto a single line
            last_q = ' '.join(block.split())
        # a block that follows a question holds its choices and answer
        elif last_q != '':
            # drop blank lines and surrounding whitespace
            choices_ans = [line.strip() for line in block.split('\n') if line.strip()]
            # build a {question: (choices, answer)} dictionary
            qa_dict[last_q] = (choices_ans[:-1], choices_ans[-1])
            last_q = ''
        elif block.strip() != '':
            print('received answer without question', block)
# print out everything
for question, (choices, ans) in qa_dict.items():
    print(question)
    print('\n'.join(choices))
    print(ans)
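
Since the file is large and only one question needs to be shown at a time, a streaming variant may also be worth considering. The following is a minimal sketch, not part of the answer above: iter_questions, SEP and SAMPLE are hypothetical names, and it assumes that separators are lines starting with at least five dashes, that choices start with '*', and that the answer line starts with 'Answer:'.

```python
import io

SEP = '-' * 75

# Hypothetical in-memory stand-in for the real file, built from the sample text.
SAMPLE = f"""{SEP}
  #1010 How tall is the actor Verne Troyer, famous for his role as Mini-me
        in the Austin Powers films?
{SEP}
 *36 inches
 *32 inches
 *24 inches
 *35 inches

Answer: 32 inches

{SEP}
  #1011 Who auditioned for the role of James Bond in 1969 but was turned
        down for being too tall?
{SEP}
 *John Cleese
 *Peter Snow
 *Simon Dee
 *Christopher Lee

Answer: Peter Snow
"""


def iter_questions(lines):
    """Yield (question, choices, answer) tuples, one record at a time."""
    question, choices, answer = [], [], None
    in_question = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith('-----'):
            if in_question:
                in_question = False       # closing separator of a question
            else:
                if question:              # a new record starts: emit the previous one
                    yield ' '.join(question), choices, answer
                question, choices, answer = [], [], None
                in_question = True        # opening separator of a question
        elif in_question:
            question.append(line)
        elif line.startswith('*'):
            choices.append(line[1:].strip())
        elif line.startswith('Answer:'):
            answer = line[len('Answer:'):].strip()
    if question:                          # emit the final record
        yield ' '.join(question), choices, answer


# Print only the first question; the rest of the input is not consumed yet.
records = iter_questions(io.StringIO(SAMPLE))
question, choices, answer = next(records)
print(question)
for choice in choices:
    print(' *' + choice)
print('Answer:', answer)
```

With a real file, the open file object can be passed in place of the StringIO; each call to next() then reads only as far as the start of the following record.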

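You can use a regular expression to parse the contents of the file.

This code makes the following assumptions:

  • You are using Python 3.6+ (f-strings); the code can easily be adapted for older versions
  • The separator consists of 75 dashes
  • The question text is prefixed with some whitespace, a '#' character and some digits, followed by a single whitespace character
  • The answers are delimited by '*' characters
  • The correct answer is prefixed with the string "Answer: "
  • At least one newline separates the correct answer from the separator of the next question.

If an individual question in your text file is malformed and doesn't follow this format, no exception is raised; the regular expression simply won't capture it.

The code below uses named capture groups, but that's just for convenience. I've also chosen to use collections.namedtuple to represent a Question object. We iterate over all regex matches to build a list of Question objects: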

def main():
    from collections import namedtuple
    import re as regex

    Question = namedtuple("Question", ["text", "answers", "correct_answer"])

    with open("questions.txt", "r") as file:
        content = file.read()

    sep = "-" * 75

    # raw f-string, so that \s and \d reach the regex engine as written
    pattern = rf"{sep}\s+#\d+\s(?P<question>.+?(?={sep})){sep}\s+(?P<answers>.+?(?=Answer: ))Answer: (?P<correct_answer>[^\n]+)"

    questions = []

    for result in regex.finditer(pattern, content, flags=regex.DOTALL):
        question_text = " ".join(result.group("question").split())
        answers = list(filter(None, map(str.strip, result.group("answers").split("*"))))
        correct_answer = result.group("correct_answer").strip()

        questions.append(Question(question_text, answers, correct_answer))

    print(questions)

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

[Question(text='How tall is the actor Verne Troyer, famous for his role as Mini-me in the Austin Powers films?', answers=['36 inches', '32 inches', '24 inches', '35 inches'], correct_answer='32 inches'), Question(text='Who auditioned for the role of James Bond in 1969 but was turned down for being too tall?', answers=['John Cleese', 'Peter Snow', 'Simon Dee', 'Christopher Lee'], correct_answer='Peter Snow')]
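
To illustrate the earlier point about malformed entries, here is a small sketch that runs the same pattern over a hypothetical three-record sample in which the second question lacks its "#<number>" prefix. SAMPLE and its contents are made up for the demonstration. It only covers this one kind of malformation; other defects (for instance a missing "Answer: " line) may behave differently, because the lazy answers group keeps scanning until it finds the next "Answer: ".

```python
import re

SEP = "-" * 75

# Same pattern as above, compiled once with DOTALL so "." also matches newlines.
PATTERN = re.compile(
    rf"{SEP}\s+#\d+\s(?P<question>.+?(?={SEP})){SEP}\s+"
    rf"(?P<answers>.+?(?=Answer: ))Answer: (?P<correct_answer>[^\n]+)",
    flags=re.DOTALL,
)

# Hypothetical sample: the second record has no "#<number>" prefix.
SAMPLE = "\n".join([
    SEP, "  #1 Good question?", SEP, " *a", " *b", "", "Answer: a", "",
    SEP, "  Missing number question?", SEP, " *c", " *d", "", "Answer: c", "",
    SEP, "  #3 Another good one?", SEP, " *e", " *f", "", "Answer: e", "",
])

# Collect the normalized question text of every record the pattern matched.
parsed = [" ".join(m.group("question").split()) for m in PATTERN.finditer(SAMPLE)]
print(parsed)
```

In this sample the record without a number is skipped silently and the other two are parsed as usual.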

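Once you have the Question objects, setting up the rest of the quiz is straightforward. For example, picking a random question is as simple as question = random.choice(questions). That said, you are better off using questions = random.sample(questions, k=10) to pick a set of random questions (10 in this case). Calling random.choice repeatedly allows the same question to be picked more than once, although that is unlikely in a file with many questions.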
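
A minimal sketch of that selection step follows. The questions list here is hypothetical stand-in data rather than the parsed file, and pick_quiz is an illustrative helper name. It caps k at the number of available questions, since random.sample raises ValueError when asked for more items than the list holds.

```python
import random
from collections import namedtuple

Question = namedtuple("Question", ["text", "answers", "correct_answer"])

# Hypothetical stand-in data; in practice this would be the parsed list.
questions = [
    Question(f"Question {i}?", ["a", "b", "c", "d"], "a") for i in range(50)
]


def pick_quiz(questions, k=10, seed=None):
    """Return up to k distinct questions in random order."""
    rng = random.Random(seed)      # pass a seed for reproducible quizzes
    k = min(k, len(questions))     # random.sample raises ValueError if k is too large
    return rng.sample(questions, k)


quiz = pick_quiz(questions, k=10)
for q in quiz:
    print(q.text)
```

Because random.sample draws without replacement, no question appears twice within one quiz.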