Regex: get the string between two brackets (Python)

Question

I am new to regular expressions and am having some trouble understanding them.

Here are some input strings:

Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)

From each string I want to get:

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%

I use the regex \((.*?)\)$, but in the case of

Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)

it returns

UK) - Carling Original (Lager - Pale. ABV 3,7%

I can't figure out what I should add to my regex to get only

Lager - Pale. ABV 3,7%

Answer 1

To support at most one level of nesting, you can use the regex \(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$ (see the regex demo).

import re
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
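# Match the last parenthesized group on the line, allowing one nested (...) level
rx = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')
for line in text.splitlines():
    m = rx.search(line)
    if m:
        print(m.group(1))  # text inside the outer parentheses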

See the Python demo. Details:

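\( - a ( character
([^()]*(?:\([^()]*\)[^()]*)*) - Group 1: zero or more characters other than ( and ), followed by zero or more repetitions of: a ( character, zero or more characters other than ( and ), a ) character, and then zero or more characters other than ( and )
\) - a ) character
\s*$ - zero or more whitespace characters and the end of the string.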
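
To see how the pieces listed above fit together, here is a small sketch that spells the same pattern out with re.VERBOSE and compares it with the lazy pattern from the question. The names LAZY, ONE_LEVEL and the two-level sample string deep are made up for illustration; they are not part of the original answer.

```python
import re

# The pattern from the question: lazy, but it still starts at the first "(".
LAZY = re.compile(r'\((.*?)\)$')

# The one-level pattern from the answer, written with re.VERBOSE for readability.
ONE_LEVEL = re.compile(r'''
    \(                      # opening parenthesis
    (                       # Group 1: the text we want
        [^()]*              #   any characters except ( and )
        (?:                 #   zero or more repetitions of:
            \( [^()]* \)    #     one nested (...) pair without further nesting
            [^()]*          #     followed by more non-parenthesis characters
        )*
    )
    \)                      # closing parenthesis
    \s*$                    # optional trailing whitespace, then end of string
''', re.VERBOSE)

line = 'Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)'

# The lazy pattern begins at "(UK" and has to stretch to the final ")".
print(LAZY.search(line).group(1))
# The one-level pattern cannot end right after "UK)", so the search moves on
# to the next "(" and matches the last group.
print(ONE_LEVEL.search(line).group(1))

# Hypothetical input with two nesting levels: the one-level pattern finds nothing.
deep = 'Name (a (b (c) d) e)'
print(ONE_LEVEL.search(deep))
```

The last call prints None, which is the limitation that motivates a recursive pattern for deeper nesting.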

To support any number of nesting levels, you cannot use re, because it does not support recursion. You can pip install regex and use

import regex
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
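# (?1) recurses into Group 1, so any nesting depth is matched; Group 2 is the inner text
rx = regex.compile(r'(\(((?:[^()]++|(?1))*)\))\s*$')
for line in text.splitlines():
    m = rx.search(line)
    if m:
        print(m.group(2))  # text inside the outermost parentheses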

See the Python demo. Details:

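(\(((?:[^()]++|(?1))*)\)) - Group 1: a ( character, then Group 2, which captures zero or more repetitions of either one or more characters other than ( and ) or the whole Group 1 pattern (recursion), then a ) character
\s*$ - zero or more whitespace characters and the end of the string.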
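
As a quick check of the statement that re cannot recurse, the sketch below tries to compile a recursive pattern with the standard library only. The helper name re_supports_recursion is made up for this example, and the pattern uses a plain + instead of ++ so that the only construct re might reject is (?1).

```python
import re

def re_supports_recursion() -> bool:
    """Return True if the stdlib re module can compile a (?1) subroutine call."""
    try:
        re.compile(r'(\((?:[^()]+|(?1))*\))')
    except re.error:
        # re does not recognize the (?1) construct and rejects the pattern
        return False
    return True

print(re_supports_recursion())
```

On current CPython versions this should report that recursion is unavailable, which is why the third-party regex module is used here.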

Output (identical for both scripts):

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
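
If installing a third-party module is not an option, arbitrary nesting can also be handled without a regex by scanning backwards from the final ) and counting parentheses. This is an alternative sketch, not part of the answer above; the function name last_group is hypothetical, and it assumes the wanted group is the last thing on the line apart from trailing whitespace.

```python
from typing import Optional

def last_group(line: str) -> Optional[str]:
    """Return the text inside the balanced (...) group that ends the line."""
    s = line.rstrip()
    if not s.endswith(')'):
        return None
    depth = 0
    # Walk right to left; depth returns to 0 at the "(" matching the final ")".
    for i in range(len(s) - 1, -1, -1):
        if s[i] == ')':
            depth += 1
        elif s[i] == '(':
            depth -= 1
            if depth == 0:
                return s[i + 1:-1]
    return None  # unbalanced parentheses

samples = [
    'Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)',
    'Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)',
    'Name (a (b (c) d) e)',  # made-up line with two nesting levels
]
for sample in samples:
    print(last_group(sample))
```

The trade-off is that this only looks at the end of the line, whereas the regex versions can be adapted to search elsewhere in the text.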
