Match smallest possible sentence

Question

Text:

One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!

Regex: (?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])

Result: Another one here. This is O.N.E. example n. 1, a nice one to understand.

How can I get This is O.N.E. example n. 1, a nice one to understand. instead? (That is, the smallest possible sentence that matches the regex.)
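
For reference, a minimal reproduction of the behaviour described above. It assumes Python's standard re module and re.findall (as the tags suggest); the variable names are only illustrative. The engine reports the leftmost match, and the lazy .+? only shortens the right-hand end of the match, never its start, which is why the result begins at "Another".

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

pattern = r"(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])"

# The first position where the lookbehind and [A-Z] succeed is "Another",
# so the match starts there and runs on until "nice one" is reached.
matches = re.findall(pattern, text)
print(matches)
```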

Answer 1

Just insert a greedy .* in front of the expression:

.*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))
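
A minimal sketch of how this pattern might be used, assuming Python's re module (variable names are illustrative). The greedy .* first consumes the whole text and then backtracks, so the latest possible sentence start is tried first. Note that the lookbehind has become a consuming \.\s, so the sentence is in capture group 1 rather than in the full match.

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

# Greedy .* pushes the start of the captured sentence as far right as possible.
pattern = r".*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))"

m = re.search(pattern, text)
sentence = m.group(1) if m else None  # group 1 holds the sentence itself
print(sentence)
```

Two caveats under these assumptions: because \.\s must precede the captured part, a matching sentence at the very start of the text would likely be missed, and if several sentences contain "nice one", this picks the last one.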

Answer 2

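You can exclude dots from the match in general, and allow only a dot that follows an uppercase character, or a dot that is followed by a whitespace character and a digit.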

(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*\bnice one\b.+?(?=\s[A-Z])

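(?:(?<=\.\s)|^) Assert a . and a whitespace char to the left, or the start of the string
[A-Z][^.A-Z]* Match an uppercase char A-Z, then 0+ times any char except a dot or an uppercase char
(?: Non-capturing group
- (?:[A-Z]\.|\.\s\d) Match either A-Z followed by a ., or a . followed by a whitespace char and a digit
- [^.A-Z]* Optionally match any char except a . or an uppercase char
)* Close the group and optionally repeat it
\bnice one\b.+?(?=\s[A-Z]) Match nice one, then match as few chars as possible until a whitespace char and an uppercase char can be asserted to the right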
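
The breakdown above can be tried out with a short sketch like the following, assuming Python's re module (the pattern is split across several string literals only for readability, and the variable names are illustrative):

```python
import re

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

pattern = re.compile(
    r"(?:(?<=\.\s)|^)"                  # after ". " or at the start of the string
    r"[A-Z][^.A-Z]*"                    # uppercase char, then no dots or uppercase chars
    r"(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*"  # allow "X." and ". <digit>" inside the sentence
    r"\bnice one\b.+?(?=\s[A-Z])"       # the phrase, then up to the next sentence start
)

m = pattern.search(text)
sentence = m.group(0) if m else None
print(sentence)
```

Here the whole match is the sentence, so no capture group is needed. The attempts starting at "One" and "Another" fail because they hit a sentence-ending dot before reaching "nice one".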


Answer 3

Here is a different approach: just split the whole text into sentences, then filter out the one you want:

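import re
s = "One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!"
# Split after sentence-ending dots; \B skips dots after single-character tokens such as "O.N.E." and "n."
sentences = re.split(r'(?<=\B.\.)\s*', s)
# Keep the first sentence that contains the phrase
result = [x for x in sentences if 'nice one' in x][0]
print(result)  # This is O.N.E. example n. 1, a nice one to understand.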

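Not sure how many edge cases you have, but here I used re.split() with the following pattern: (?<=\B.\.)\s*. This means:

(?<=\B.\.) - Positive lookbehind asserting that the position is preceded by a position where \b (word boundary) does not apply, then any character other than a newline, then a literal dot.
\s* - 0+ whitespace characters.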

With the resulting array, it shouldn't be much of a problem to check which element contains the phrase "nice one".
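
As a sketch of that last filtering step, the check could be wrapped in a small helper. The function name shortest_sentence_with is hypothetical and not part of the original answer; it assumes Python 3.7 or later and the same split pattern as above. It returns None when no sentence contains the phrase, and the shortest one when several do.

```python
import re

def shortest_sentence_with(text, phrase):
    """Hypothetical helper: shortest split sentence containing phrase, or None."""
    # Same split pattern as in the answer: break after sentence-ending dots.
    sentences = re.split(r"(?<=\B.\.)\s*", text)
    candidates = [s for s in sentences if phrase in s]
    return min(candidates, key=len) if candidates else None

text = ("One sentence here, much wow. Another one here. "
        "This is O.N.E. example n. 1, a nice one to understand. "
        "Hope it's clear now!")

print(shortest_sentence_with(text, "nice one"))
```

Unlike indexing with [0], this does not raise an IndexError when the phrase is absent.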



Tags: python, regex, findall, python-re