Getting Everything Between Two Characters Across New Lines

Here is a sample of the text I am working with.

6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?

A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors

B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors

C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors

D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors

Answer: D

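I am trying to match the entire question, including the answer options: everything from the question number through the word Answer.

This is my current regex: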

((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)

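searchCounter is just a variable that corresponds to the current question, 6 in this example. I think the problem has to do with searching across new lines.

Edit: full source code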

import re

searchCounter = 1

bookDict = {}

with open('StratMasterKey.txt', 'rt') as myfile:

    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)

        result = question_pattern.search(line)
        if result != None:
            bookDict[searchCounter] = result[0]
            searchCounter += 1

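Your regex fails because you read the file line by line with for line in myfile:, while your pattern needs to find its match within a single multi-line string. Each line is searched in isolation, so the question number and the word Answer are never in the same input.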

Replace for line in myfile: with contents = myfile.read(), then use result = question_pattern.search(contents) to get the first match, or result = question_pattern.findall(contents) to get multiple matches.
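
The difference between the two reading strategies can be sketched as follows. This is a minimal illustration, not the asker's actual data: SAMPLE is a made-up stand-in for StratMasterKey.txt, io.StringIO replaces the real file so the snippet runs on its own, and the helper names are hypothetical.

```python
import io
import re

# Hypothetical sample standing in for the contents of StratMasterKey.txt.
SAMPLE = (
    "6) Which option is correct?\n"
    "\n"
    "A) first option\n"
    "\n"
    "B) second option\n"
    "\n"
    "Answer: B\n"
)

pattern = re.compile(r"(?<=6\) )[\s\S]*?Answer.*")


def search_line_by_line(text):
    """Mimic `for line in myfile:`; each line is searched in isolation."""
    for line in io.StringIO(text):
        match = pattern.search(line)
        if match:
            return match.group()
    return None


def search_whole_text(text):
    """Mimic `myfile.read()`; the pattern sees every line at once."""
    match = pattern.search(io.StringIO(text).read())
    return match.group() if match else None


line_result = search_line_by_line(SAMPLE)
whole_result = search_whole_text(SAMPLE)
print(line_result)   # None: no single line holds both "6) " and "Answer"
print(whole_result)  # the question text through "Answer: B"
```

No single line contains both the question number and the word Answer, so only the whole-text search can succeed.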

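A note on the regex: I did not fix the whole pattern because, as you mentioned, that is beyond the scope of this question. However, since the input is now a single multi-line string, you need to remove re.DOTALL, use [\s\S] where the pattern must match any character including line breaks, and use . where it should match any character except a line break. Otherwise the trailing .* runs on to the end of the file. Also, the lookahead is redundant: you can safely replace (?=Answer) with Answer. Finally, to check whether there is a match, you can simply use if result: and then get the whole match value by accessing result.group().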
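
To see why re.DOTALL has to go once the whole file is searched, here is a small sketch. TEXT is invented sample data holding two questions back to back, and the variable names are only illustrative.

```python
import re

# Hypothetical text holding two questions back to back.
TEXT = (
    "6) First question?\n"
    "A) yes\n"
    "Answer: A\n"
    "7) Second question?\n"
    "A) no\n"
    "Answer: A\n"
)

# With re.DOTALL the trailing .* also crosses newlines and runs to the end.
greedy = re.search(r"(?<=6\) ).*?Answer.*", TEXT, re.DOTALL).group()

# Without it, [\s\S]*? crosses lines but .* stops at the end of the Answer line.
bounded = re.search(r"(?<=6\) )[\s\S]*?Answer.*", TEXT).group()

print(repr(greedy))   # includes question 7 as well
print(repr(bounded))  # 'First question?\nA) yes\nAnswer: A'
```

The lazy [\s\S]*? handles the part that must span lines, while the plain .* keeps the match from spilling past the Answer line.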

Full code snippet:

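import re

searchCounter = 6  # question number to look up, as in the example above

with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # whole file as one multi-line string
    # [\s\S]*? crosses newlines; the final .* stops at the end of the Answer line
    question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
    result = question_pattern.search(contents)
    if result:
        print(result.group())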
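
If the goal is still to fill bookDict with every question, as in the original loop, one possible approach is sketched below. It is a variation rather than the code from the answer: collect_questions is a hypothetical helper, SAMPLE is made-up data, questions are assumed to be numbered consecutively from 1, and the lookbehind is swapped for a line-start anchor plus a capture group so that 1) cannot match inside 11).

```python
import re


def collect_questions(contents):
    """Return {question number: question text through its Answer line}."""
    book = {}
    counter = 1
    while True:
        # ^ with re.MULTILINE anchors the number at a line start, so "1) "
        # cannot match inside "11) "; this anchor is an added assumption.
        pattern = re.compile(rf"^{counter}\) ([\s\S]*?Answer.*)", re.MULTILINE)
        match = pattern.search(contents)
        if not match:
            break  # assumes numbering is consecutive; stop at the first gap
        book[counter] = match.group(1)
        counter += 1
    return book


# Hypothetical sample; in practice this would be myfile.read().
SAMPLE = (
    "1) First question?\nA) yes\nB) no\nAnswer: A\n"
    "2) Second question?\nA) yes\nB) no\nAnswer: B\n"
)
bookDict = collect_questions(SAMPLE)
print(bookDict[2])
```

Compiling one pattern per question number keeps the structure of the original loop; for a large file, a single pattern with a captured number and finditer would likely be cheaper.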