Getting Everything Between Two Characters Across New Lines
Here is a sample of the text I am working with.
6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?
A) providing long-distance cab fares at a higher rate than
competitors; servicing a larger area than competitors
B) providing long-distance cab fares at a lower rate than competitors;
servicing a smaller area than competitors
C) providing long-distance cab fares at a higher rate than
competitors; servicing the same area as competitors
D) providing long-distance cab fares at a lower rate than competitors;
servicing the same area as competitors
Answer: D
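I am trying to match the whole question, including the answer choices: everything from the question number through the word Answer.
This is my current regex: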
((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
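searchCounter is just a variable holding the current question number, 6 in this case. I think the problem has to do with searching across new lines.
Edit: full source code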
import re

searchCounter = 1
bookDict = {}
with open ('StratMasterKey.txt', 'rt') as myfile:
    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)
        result = question_pattern.search(line)
        if result != None:
            bookDict[searchCounter] = result[0]
            searchCounter +=1
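Your regex fails because you read the file line by line with `for line in myfile:`, while your pattern looks for a match inside a single multiline string. No single line contains both the question number and the word Answer, so `search` returns `None` for every line.
Replace `for line in myfile:` with `contents = myfile.read()`, then use `result = question_pattern.search(contents)` to get the first match, or `result = question_pattern.findall(contents)` to get multiple matches.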
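To make the difference concrete, here is a minimal sketch that runs the original pattern both ways. The sample text is a shortened, made-up stand-in for the file, and `io.StringIO` is only used here to imitate iterating over an open file.

```python
import io
import re

# Hypothetical sample standing in for the contents of the file.
sample = (
    "6) Which option is correct?\n"
    "A) first option\n"
    "B) second option\n"
    "Answer: B\n"
)

# The pattern from the question, with the question number fixed to 6.
pattern = re.compile(r"(?<=6\) ).*?(?=Answer).*", re.DOTALL)

# Line by line: no single line holds both "6) " and "Answer".
line_hits = [pattern.search(line) for line in io.StringIO(sample)]

# Whole text: the question number and "Answer" are in the same string.
whole_hit = pattern.search(sample)

print(any(line_hits))         # False
print(whole_hit is not None)  # True
```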
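A note on the regex: I did not fix the whole pattern because, as you mentioned, that is outside the scope of this question. But since the input is now one multiline string, you should drop `re.DOTALL`, use `[\s\S]` where the pattern has to match any character including a newline, and keep `.` for any character except a newline. Otherwise the trailing `.*` runs to the end of the file instead of stopping at the end of the Answer line. The lookahead is also redundant: you can safely replace `(?=Answer)` with `Answer`. Finally, to check for a match you can simply write `if result:` and then read the whole matched text with `result.group()`.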
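The effect of dropping `re.DOTALL` only shows up once the text holds more than one question. The sketch below uses a made-up two-question sample (not the real file) to compare the two patterns.

```python
import re

# Hypothetical two-question sample, not the real file contents.
text = (
    "6) First question?\n"
    "A) yes\n"
    "B) no\n"
    "Answer: A\n"
    "7) Second question?\n"
    "A) yes\n"
    "B) no\n"
    "Answer: B\n"
)

# Original pattern: with DOTALL the trailing .* also crosses newlines.
with_dotall = re.compile(r"(?<=6\) ).*?(?=Answer).*", re.DOTALL)
# Suggested pattern: only [\s\S]*? crosses newlines, .* stays on one line.
without_dotall = re.compile(r"(?<=6\) )[\s\S]*?Answer.*")

greedy = with_dotall.search(text).group()
scoped = without_dotall.search(text).group()

print(greedy.endswith("Answer: B\n"))  # True: ran on through question 7
print(scoped.endswith("Answer: A"))    # True: stopped at question 6's answer
```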
Full code snippet:
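import re

searchCounter = 6  # question number to look up, 6 in this example
with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # whole file as one multiline string
    # [\s\S]*? spans newlines up to "Answer"; .* stops at the end of that line
    question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
    result = question_pattern.search(contents)
    if result:
        print(result.group())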
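If the end goal is the `bookDict` from the question, one possible way to fill it in a single pass is sketched below. This is a variation of my own, not the pattern above: it captures the question number with a group instead of a lookbehind, and it assumes every question starts at the beginning of a line as `N) `. The helper name `parse_questions` and the sample text are made up for illustration.

```python
import re

def parse_questions(contents):
    """Map each question number to its text, up to the end of the Answer line."""
    # ^ with re.MULTILINE anchors at each line start, so "A) ..." option
    # lines are skipped because (\d+) only accepts digits.
    pattern = re.compile(r"^(\d+)\) ([\s\S]*?Answer.*)", re.MULTILINE)
    return {int(m.group(1)): m.group(2) for m in pattern.finditer(contents)}

# Hypothetical sample; in practice this would be myfile.read().
sample = (
    "6) First question?\n"
    "A) yes\n"
    "B) no\n"
    "Answer: A\n"
    "7) Second question?\n"
    "A) yes\n"
    "B) no\n"
    "Answer: B\n"
)

book_dict = parse_questions(sample)
print(sorted(book_dict))  # [6, 7]
print(book_dict[7])
```

Note that this sketch stops at the first occurrence of the word Answer, so a question whose own text contains that word would be cut short.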