Getting Everything Between Two Characters Across New Lines

Question

Here is a sample of the text I am working with.

6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?

A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors

B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors

C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors

D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors

Answer: D

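I am trying to match the whole question, including the answer options: everything from the question number through the word Answer.

This is my current regex: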

((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)

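searchCounter is just a variable that holds the current question number, 6 in this example. I think the problem has to do with searching across new lines.

Edit: full source code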

searchCounter = 1

bookDict = {}

with open ('StratMasterKey.txt', 'rt') as myfile:

    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL) 

        result = question_pattern.search(line)
        if result != None: 
            bookDict[searchCounter] = result[0] 
            searchCounter +=1

Answer 1

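Your regex fails because you read the file line by line with for line in myfile:, while your pattern needs to find its match inside a single multiline string. No individual line contains both the question number and the word Answer.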

Replace for line in myfile: with contents = myfile.read(), then use result = question_pattern.search(contents) to get the first match, or result = question_pattern.findall(contents) to get all matches.
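To make the difference concrete, here is a minimal sketch that runs the question's original pattern both ways. The SAMPLE string is a shortened, hypothetical stand-in for the file, and io.StringIO is used only so the example runs without StratMasterKey.txt.

```python
import io
import re

# Hypothetical, shortened stand-in for the contents of the file.
SAMPLE = (
    "6) How did Jake's Taxi Service achieve this position?\n"
    "\n"
    "A) higher rate; larger area\n"
    "\n"
    "D) lower rate; same area\n"
    "\n"
    "Answer: D\n"
)

# The pattern from the question, with the question number filled in.
pattern = re.compile(r"(?<=6\) ).*?(?=Answer).*", re.DOTALL)

# Line by line: no single line contains both "6) " and "Answer".
line_hits = [line for line in io.StringIO(SAMPLE) if pattern.search(line)]

# Whole text at once: the match is free to span newlines.
whole_hit = pattern.search(SAMPLE)

print(line_hits)
print(whole_hit.group() if whole_hit else None)
```

The line-by-line search finds nothing, while the search over the full string returns the question together with its options and answer.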

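A note on the regex: I did not fix the whole pattern since, as you mentioned, that is beyond the scope of this question. However, because the input is now one multiline string, you need to remove re.DOTALL, use [\s\S] where the pattern must match any character including a newline, and use . where it must match any character except a newline. Otherwise the trailing .* runs on to the end of the file instead of stopping at the end of the Answer line. Also, the lookahead is redundant: you can safely replace (?=Answer) with Answer. Finally, to check whether there is a match you can simply write if result: and then get the whole matched value with result.group().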
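The effect of dropping re.DOTALL is easiest to see when the text holds more than one question. The sketch below uses a made-up two-question string (TEXT is an assumption, not the real file) and compares the original pattern with the revised one.

```python
import re

# Hypothetical two-question text, shortened for illustration.
TEXT = (
    "6) First question?\n"
    "A) one\n"
    "B) two\n"
    "Answer: B\n"
    "7) Second question?\n"
    "A) three\n"
    "Answer: A\n"
)

# Original pattern: under re.DOTALL the trailing .* also crosses newlines.
with_dotall = re.search(r"(?<=6\) ).*?(?=Answer).*", TEXT, re.DOTALL)

# Revised pattern: [\s\S]*? crosses newlines, the trailing .* does not.
without_dotall = re.search(r"(?<=6\) )[\s\S]*?Answer.*", TEXT)

print(repr(with_dotall.group()))     # runs on into question 7
print(repr(without_dotall.group()))  # stops at the end of the "Answer: B" line
```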

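Full code snippet:

import re

searchCounter = 6  # number of the question to extract

with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # read the whole file as one multiline string

# [\s\S]*? spans newlines; the trailing .* stops at the end of the Answer line
question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
result = question_pattern.search(contents)
if result:
    print(result.group())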
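If the goal is to fill bookDict with every question, as in the question's loop, one possible variation is to capture the number instead of looking behind for it. This is only a sketch under an assumption about the file layout: each question starts a line with "<number>) " and ends with an "Answer: X" line. QUESTION_RE, collect_questions and the sample string are hypothetical names introduced here, not part of the original code.

```python
import re
from typing import Dict

# Assumed layout: a question starts a line with "<number>) " and ends
# with a line beginning "Answer". Option lines start with a letter, so
# the ^(\d+)\) anchor does not match them.
QUESTION_RE = re.compile(r"^(\d+)\) ([\s\S]*?Answer.*)", re.MULTILINE)


def collect_questions(contents: str) -> Dict[int, str]:
    """Map each question number to its text, through the Answer line."""
    return {int(m.group(1)): m.group(2) for m in QUESTION_RE.finditer(contents)}


# Hypothetical two-question text, shortened for illustration.
sample = (
    "6) First question?\n"
    "A) one\n"
    "B) two\n"
    "Answer: B\n"
    "\n"
    "7) Second question?\n"
    "A) three\n"
    "Answer: A\n"
)

book_dict = collect_questions(sample)
for number, text in book_dict.items():
    print(number, repr(text))
```

Anchoring on the start of a line with re.MULTILINE also avoids a pitfall of the lookbehind version, where (?<=6\) ) would match after "16) " as well as after "6) ".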

regex

regex-group

regex-lookarounds