How to print sections of a .txt file based on a character
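I am writing a program that reads questions from a .txt file. Each question is enclosed by lines of dashes ("-----"), followed by four possible answers and then the correct answer. I don't know how to print these parts separately. The file is over 10,000 lines long and I need to be able to print one question at a time, but I'm not sure how to specify that only the text between the dashes should be read.
Sample text: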
---------------------------------------------------------------------------
#1010 How tall is the actor Verne Troyer, famous for his role as Mini-me
in the Austin Powers films?
---------------------------------------------------------------------------
*36 inches
*32 inches
*24 inches
*35 inches
Answer: 32 inches
---------------------------------------------------------------------------
#1011 Who auditioned for the role of James Bond in 1969 but was turned
down for being too tall?
---------------------------------------------------------------------------
*John Cleese
*Peter Snow
*Simon Dee
*Christopher Lee
Answer: Peter Snow
This will split it into chunks:
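with open('myfile.txt', 'r') as f:
    # split on the full 75-dash separator line; a shorter run such as
    # '-' * 5 would also split inside each separator and leave empty chunks
    data = f.read().split('-' * 75)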
I'd need more details or examples to do more than that.
Edit (with an example):
qa_dict = dict()
with open('myfile.txt', 'r') as f:
    spacer = '-' * 75
    last_q = ''
    # grab each block, one at a time, where a block is a question or an answer
    for block in f.read().split(spacer):
        # if block is a question
        if '#' in block and '?' in block:
            if last_q != '':
                print('something went wrong around:', block)
            last_q = ' '.join(block.split())
        # if block is an answer
        elif last_q != '':
            # remove messy whitespace, throw choices and answer into a list
            choices_ans = list(filter(lambda let: len(let.strip()) > 0, block.split('\n')))
            # build a {question: (choices, answer)} dictionary
            qa_dict[last_q] = (choices_ans[:-1], choices_ans[-1])
            last_q = ''
        elif block.strip() != '':
            print('received answer without question', block)

# print out everything
for question in qa_dict.keys():
    choices, ans = qa_dict[question]
    print(question)
    print('\n'.join(choices))
    print(ans)
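Because the file is large and only one question is needed at a time, the same block logic could also be applied lazily, line by line, instead of calling f.read(). The following is only a minimal sketch under some assumptions, not part of the answer above: the names is_separator, iter_blocks and iter_questions are hypothetical, a separator is taken to be any line made up solely of five or more dashes, and a question block is taken to be one whose first line starts with '#'. The sample is fed in through io.StringIO so that the sketch runs without a file.

```python
import io

SEP = '-' * 75
SAMPLE = '\n'.join([
    SEP,
    '#1010 How tall is the actor Verne Troyer, famous for his role as Mini-me',
    'in the Austin Powers films?',
    SEP,
    '*36 inches', '*32 inches', '*24 inches', '*35 inches',
    'Answer: 32 inches',
    SEP,
    '#1011 Who auditioned for the role of James Bond in 1969 but was turned',
    'down for being too tall?',
    SEP,
    '*John Cleese', '*Peter Snow', '*Simon Dee', '*Christopher Lee',
    'Answer: Peter Snow',
    '',
])


def is_separator(line):
    """Assumption: a separator is a line consisting only of 5 or more dashes."""
    stripped = line.strip()
    return len(stripped) >= 5 and set(stripped) == {'-'}


def iter_blocks(lines):
    """Yield each non-empty group of stripped lines found between separators."""
    current = []
    for line in lines:
        if is_separator(line):
            if current:
                yield current
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:  # last block has no trailing separator
        yield current


def iter_questions(lines):
    """Yield (question, choices, answer) tuples one at a time."""
    question = None
    for block in iter_blocks(lines):
        if block[0].startswith('#'):
            # question block: join wrapped lines into one string
            question = ' '.join(block)
        elif question is not None:
            # answer block: every line but the last is a choice
            choices = [choice.lstrip('*') for choice in block[:-1]]
            yield question, choices, block[-1]
            question = None


quiz = iter_questions(io.StringIO(SAMPLE))
question, choices, answer = next(quiz)  # reads only as far as the first question
print(question)
print('\n'.join(choices))
print(answer)
```

With a real file, an open file object could be passed in place of the io.StringIO wrapper, since iterating over a file also yields one line at a time; each call to next() would then read only as far as the end of the next question.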
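You can use a regular expression to parse the contents of the file.
This code makes the following assumptions:
- You are using Python 3.6+ (for f-strings); the code can easily be adapted to older versions
- The separator consists of 75 dashes
- The question text is prefixed by some whitespace, a '#' character and some digits, followed by a single whitespace character
- The answers are delimited by '*' characters
- The correct answer is prefixed by the string "Answer: "
- At least one newline separates the correct answer from the next question's separator
If a question in your text file is malformed and does not follow this format, the regular expression simply will not capture it, and no exception is raised.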
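To make that last point concrete, here is a small sketch that runs the same pattern over an inline sample. Everything around the pattern is an assumption made for the demo: the helper entry is hypothetical, and the middle entry is a deliberately broken copy of question #1011 with its '#1011' prefix removed. Only this one kind of malformation is shown. Other kinds, such as a missing "Answer: " line in the middle of the file, can instead let the lazy match run on into the following entry, so it is worth sanity-checking the parsed results.

```python
import re

SEP = "-" * 75

# Same pattern as in this answer, written as raw strings.
PATTERN = re.compile(
    rf"{SEP}\s+#\d+\s(?P<question>.+?(?={SEP})){SEP}\s+"
    r"(?P<answers>.+?(?=Answer: ))Answer: (?P<correct_answer>[^\n]+)",
    flags=re.DOTALL,
)


def entry(question_lines, choices, answer):
    """Build one entry in the file's layout (hypothetical helper for this demo)."""
    lines = [SEP, *question_lines, SEP, *("*" + c for c in choices), "Answer: " + answer]
    return "\n".join(lines) + "\n"


bond_choices = ["John Cleese", "Peter Snow", "Simon Dee", "Christopher Lee"]
sample = (
    entry(
        ["#1010 How tall is the actor Verne Troyer, famous for his role as Mini-me",
         "in the Austin Powers films?"],
        ["36 inches", "32 inches", "24 inches", "35 inches"],
        "32 inches",
    )
    # malformed on purpose: the '#1011 ' prefix is missing
    + entry(
        ["Who auditioned for the role of James Bond in 1969 but was turned",
         "down for being too tall?"],
        bond_choices,
        "Peter Snow",
    )
    # well-formed version of the same question
    + entry(
        ["#1011 Who auditioned for the role of James Bond in 1969 but was turned",
         "down for being too tall?"],
        bond_choices,
        "Peter Snow",
    )
)

matches = list(PATTERN.finditer(sample))
texts = [" ".join(m.group("question").split()) for m in matches]

# Three entries went in, but the one without '#<digits>' is skipped silently.
print(len(matches))
for text in texts:
    print(text)
```

The entry without the number prefix produces no match and no error, and matching resumes at the next well-formed separator.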
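This code makes use of named capture groups, but that is only for convenience. I have also chosen to use collections.namedtuple to represent Question objects. We iterate over all regex matches to build a list of Question objects: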
def main():
    from collections import namedtuple
    import re as regex

    Question = namedtuple("Question", ["text", "answers", "correct_answer"])

    with open("questions.txt", "r") as file:
        content = file.read()

    sep = "-" * 75
    # raw f-string, so that \s and \d reach the regex engine unchanged
    pattern = rf"{sep}\s+#\d+\s(?P<question>.+?(?={sep})){sep}\s+(?P<answers>.+?(?=Answer: ))Answer: (?P<correct_answer>[^\n]+)"

    questions = []
    # DOTALL lets "." match newlines, so a question may span several lines
    for result in regex.finditer(pattern, content, flags=regex.DOTALL):
        # collapse line breaks and repeated spaces in the question text
        question_text = " ".join(result.group("question").split())
        # split the choices on "*" and drop empty pieces
        answers = list(filter(None, map(str.strip, result.group("answers").split("*"))))
        correct_answer = result.group("correct_answer").strip()
        questions.append(Question(question_text, answers, correct_answer))

    print(questions)
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
[Question(text='How tall is the actor Verne Troyer, famous for his role as Mini-me in the Austin Powers films?', answers=['36 inches', '32 inches', '24 inches', '35 inches'], correct_answer='32 inches'), Question(text='Who auditioned for the role of James Bond in 1969 but was turned down for being too tall?', answers=['John Cleese', 'Peter Snow', 'Simon Dee', 'Christopher Lee'], correct_answer='Peter Snow')]
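Once you have the Question objects, setting up the rest of the quiz is straightforward. For example, picking a random question is as simple as question = random.choice(questions). That said, you are better off using questions = random.sample(questions, k=10) to pick the questions at random (10 of them in this case). Calling random.choice repeatedly would allow the same question to be picked more than once, although that is unlikely in a file containing many questions.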
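As a rough sketch of that last step, the snippet below draws a round of distinct questions and formats each one for display. The helper names pick_round and format_question and the optional seed parameter are hypothetical, and the two Question objects are just the ones from the output above. Since random.sample raises a ValueError when asked for more items than the pool contains, the sketch caps k at the pool size.

```python
import random
from collections import namedtuple

Question = namedtuple("Question", ["text", "answers", "correct_answer"])

# The two questions from the sample output, standing in for the parsed file.
questions = [
    Question(
        "How tall is the actor Verne Troyer, famous for his role as Mini-me in the Austin Powers films?",
        ["36 inches", "32 inches", "24 inches", "35 inches"],
        "32 inches",
    ),
    Question(
        "Who auditioned for the role of James Bond in 1969 but was turned down for being too tall?",
        ["John Cleese", "Peter Snow", "Simon Dee", "Christopher Lee"],
        "Peter Snow",
    ),
]


def pick_round(pool, k, seed=None):
    """Return up to k distinct questions in random order (hypothetical helper)."""
    rng = random.Random(seed)
    # sample() never repeats an item; cap k so a small pool does not raise ValueError
    return rng.sample(pool, k=min(k, len(pool)))


def format_question(question):
    """Render the question text followed by its numbered choices."""
    lines = [question.text]
    lines += [f"  {i}. {answer}" for i, answer in enumerate(question.answers, start=1)]
    return "\n".join(lines)


quiz = pick_round(questions, k=10)
for question in quiz:
    print(format_question(question))
    print()
```

Passing a fixed seed would make the order reproducible, which can be handy while testing the quiz.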