Regex that stops at a line

Question

I'm trying to build a regex that stops when a line equals "--- admonition".

For example, I have:

??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition

The same format can appear multiple times on a page.

In each match, I want to retrieve in the first group:

Quels sont les deux types de bornages ?

And in the second group:

Il y en a deux :

Le bornage amiable.

Le bornage judiciaire.

test

I tried:

^\?{3} ad-question {1}(.+)\n*((?:\n(?:^[^#].{0,2}$|^[^#].{3}(?<!---).*))+)

or

^\?{3} ad-question {1}(.+)\n*((?:\n(?:^[^\n#].{0,2}$|^[^\n#](?<!----).*))+)

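But it didn't stop at "\n--- admonition", and it captured a line break between the two groups.

Can someone help me build this regex?

PS: There must be a line break between the two groups and between group 2 and "--- admonition", so these line breaks must be kept out of the groups.

Thanks for your help.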

Answer 1

Try this regex:

\?{3}\s*(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition

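Explanation:

\?{3} - matches ? 3 times
\s* - matches 0 or more whitespace characters
(.+) - matches 1 or more occurrences of any character except a newline and captures them in group 1
\s* - matches 0 or more whitespace characters
((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition - lazily matches 0 or more occurrences of any character (newlines included) that is not the start of --- admonition, and captures them in group 2. After those characters, it matches 0 or more whitespace characters followed by the literal --- admonition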
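
To check what the groups actually contain, here is a minimal Python sketch (Python is my choice here; the regex itself is not Python-specific). Note that group 1 of the pattern as posted also contains the "ad-question " prefix, because the pattern never matches that word literally. The second pattern, which adds a literal ad-question, is my own adaptation and not part of the answer above.

```python
import re

text = """??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition"""

# Pattern exactly as posted in the answer.
posted = re.compile(
    r"\?{3}\s*(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition"
)

# Adaptation (an assumption, not from the answer): match 'ad-question'
# literally so that group 1 holds only the question text.
adapted = re.compile(
    r"\?{3}\s*ad-question\s+(.+)\s*((?:(?!-{3} admonition)[\s\S])*?)\s*-{3} admonition"
)

posted_groups = posted.search(text).groups()
adapted_groups = adapted.search(text).groups()

print(repr(posted_groups[0]))   # includes the 'ad-question ' prefix
print(repr(adapted_groups[0]))  # question text only
print(repr(adapted_groups[1]))  # body, without surrounding blank lines
```

Because \s* sits on both sides of the lazy group, the blank lines around the body end up outside group 2.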

Answer 2

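You most likely need the re.DOTALL and re.MULTILINE flags. You can also use them as inline flags in the pattern: '(?s)' and '(?m)'.

DOTALL makes '.' also match '\n', which it normally does not (re.DOTALL is Python; other dialects have similar flags, e.g. JS, Java).

You can capture your groups with r'\?\?\?(.*?)\?(.*?)--- admonition' and those 2 flags.

Python example (JS has DOTALL):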
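
As a quick illustration of what the two flags change, here is a small sketch on a made-up two-line string (the sample string and variable names are mine):

```python
import re

sample = "first line\nsecond line"

# Default: '.' does not match '\n', so '.*' stops at the end of line 1.
default_dot = re.match(r".*", sample).group()

# re.DOTALL (or the inline flag '(?s)'): '.' also matches '\n'.
dotall_dot = re.match(r".*", sample, re.DOTALL).group()
inline_dot = re.match(r"(?s).*", sample).group()

# Default: '^' only matches at the very start of the string.
default_caret = re.findall(r"^\w+", sample)

# re.MULTILINE (or '(?m)'): '^' matches at the start of every line.
multiline_caret = re.findall(r"^\w+", sample, re.MULTILINE)

print(repr(default_dot))    # 'first line'
print(repr(dotall_dot))     # 'first line\nsecond line'
print(default_caret)        # ['first']
print(multiline_caret)      # ['first', 'second']
```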

import re

text = """??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition
??? ad-question 2  types de bornages ?

Il y en a deux :

- Le bornage judiciaire.

test 2

--- admonition"""


pattern = r'\?\?\?(.*?)\?(.*?)--- admonition'

for f in re.finditer(pattern, text, re.MULTILINE | re.DOTALL):
    print(f)
    print(f.groups())  # tuple of groups (A, B, ..) of grouped matches

Output:

<re.Match object; span=(0, 144), match='??? ad-question Quels sont les deux types de born>
(' ad-question Quels sont les deux types de bornages ', 
 '\n\nIl y en a deux :\n\n- Le bornage amiable.\n\n- Le bornage judiciaire.\n\ntest\n\n')

<re.Match object; span=(145, 251), match='??? ad-question 2  types de bornages ?\n\nIl y en>
(' ad-question 2  types de bornages ', 
 '\n\nIl y en a deux :\n\n- Le bornage judiciaire.\n\ntest 2\n\n')

Explanation of the pattern '\?\?\?(.*?)\?(.*?)--- admonition':

\?\?\?                 - 3 literal question marks (QM)
(.*?)\?                - non greedy capture (including \n) up to 1st QM
(.*?)--- admonition    - non greedy capture up to --- admonition

Answer 3

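If you want 2 capture groups that do not match the newlines between the groups, but there must be at least one complete empty line between the groups: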

^\?{3} ad-question (.+)\n{2,}((?:(?!---).*\n)*?)\n+---

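The pattern matches:

^ Start of the string (or of each line when the multiline flag is set)
\?{3} ad-question Match ??? ad-question
(.+) Capture group 1, match the rest of the line
\n{2,} Match 2 or more newlines, so that there is at least one empty line in between
( Capture group 2
  (?:(?!---).*\n)*? Repeat, as few times as possible, matching whole lines that do not start with ---
) Close group 2
\n+--- Match 1 or more newlines and ---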
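
A minimal Python sketch of the pattern above, assuming re.MULTILINE so that ^ matches at the start of every block on the page. The second block in the text is made up for the test. One thing to be aware of: every line matched inside group 2 ends with its own \n, so group 2 keeps the newline that ends its last line; rstrip("\n") removes it.

```python
import re

text = """??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition
??? ad-question Second question ?

Second answer

--- admonition"""

pattern = re.compile(
    r"^\?{3} ad-question (.+)\n{2,}((?:(?!---).*\n)*?)\n+---",
    re.MULTILINE,
)

# One (question, body) tuple per block; the raw body ends with '\n'.
raw = [m.groups() for m in pattern.finditer(text)]

# Drop the trailing newline that belongs to the last body line.
cleaned = [(question, body.rstrip("\n")) for question, body in raw]

for question, body in cleaned:
    print(repr(question))
    print(repr(body))
```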

If there should be at least one newline:

^\?{3} ad-question (.+)\n+((?:(?!---).*\n)*?)\n*---
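
To see where the two variants differ, here is a small sketch on a hypothetical compact input that has no empty line after the question or before ---. The stricter pattern requires the empty lines and finds nothing; the looser one still matches.

```python
import re

# Hypothetical compact input: single newlines only, no empty lines.
compact = "??? ad-question Question courte ?\nLigne 1\nLigne 2\n--- admonition"

# First variant: at least one empty line after the question.
strict = re.compile(
    r"^\?{3} ad-question (.+)\n{2,}((?:(?!---).*\n)*?)\n+---", re.MULTILINE
)
# Second variant: a single newline is enough.
loose = re.compile(
    r"^\?{3} ad-question (.+)\n+((?:(?!---).*\n)*?)\n*---", re.MULTILINE
)

strict_match = strict.search(compact)
loose_match = loose.search(compact)

print(strict_match)          # None
print(loose_match.groups())  # ('Question courte ?', 'Ligne 1\nLigne 2\n')
```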

Answer 4

I guess there are many ways to do this; my two cents:

^\?{3}\h+ad-question\h+(.+)\n+((?:.*\n?)+?)\n+^---\h+admonition$

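^\?{3}\h+ad-question\h+ - Start-of-line anchor followed by three literal question marks, 1+ (greedy) horizontal whitespace characters, the literal 'ad-question', and another 1+ horizontal whitespace characters;
(.+) - Your first capture group: 1+ (greedy) characters other than newline;
\n+ - 1+ (greedy) newline characters;
((?:.*\n?)+?) - A second capture group with a nested non-capturing group matched 1+ (lazy) times, each repetition matching 0+ characters up to an optional newline;
\n+ - 1+ (greedy) newline characters;
^---\h+admonition$ - From start-of-line anchor to end-of-line anchor, match '---', 1+ horizontal whitespace characters, and 'admonition'.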
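
The \h escape is PCRE syntax and is not supported by Python's re module, so the pattern cannot be pasted into Python as-is. Below is a sketch of a Python port under the assumption that [ \t] (space or tab) is a good enough stand-in for \h; PCRE's \h covers more horizontal whitespace characters than that. The second block in the text is made up for the test, and the strip() call is my addition to drop any newline left at the end of group 2.

```python
import re

text = """??? ad-question Quels sont les deux types de bornages ?

Il y en a deux :

- Le bornage amiable.

- Le bornage judiciaire.

test

--- admonition
??? ad-question Second question ?

Second answer

--- admonition"""

# Port of the PCRE pattern: every \h is replaced by [ \t] (an approximation).
pattern = re.compile(
    r"^\?{3}[ \t]+ad-question[ \t]+(.+)\n+((?:.*\n?)+?)\n+^---[ \t]+admonition$",
    re.MULTILINE,
)

# strip() removes newlines left at the edges of the body.
results = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]

for question, body in results:
    print(repr(question))
    print(repr(body))
```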
