Regex: get the string between two brackets (Python)

I'm new to regular expressions and have some trouble understanding them.

Here are some input strings:

Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)

From each string I want to get:

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%

I use the regex \((.*?)\)$, but for

Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)

it returns

UK) - Carling Original (Lager - Pale. ABV 3,7%
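Why this happens: the engine scans left to right and returns the leftmost match, so the lazy .*? only keeps the match short after it has already started at the first ( on the line (the one before UK). A minimal reproduction of the problem:

```python
import re

line = "Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)"

# re.search returns the leftmost match: \( first succeeds at "(UK", and the
# lazy .*? then has to grow until \)$ can match the final ")" of the line.
m = re.search(r'\((.*?)\)$', line)
print(m.group(1))  # UK) - Carling Original (Lager - Pale. ABV 3,7%
```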

I can't figure out what I should add to my regex to get only

Lager - Pale. ABV 3,7%
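A minimal sketch (not from the original post) of the obvious first fix: forbid parentheses inside the capture with the negated class [^()]*, which ties the match to the last group on the line. It fixes the Molson Coors line, but it finds nothing in the Coven line, whose last group contains a nested (India Pale Lager); handling that case is what the nesting-aware patterns in the answer are for.

```python
import re

# [^()]* cannot cross a "(" or ")", so the match must start at the "(" that
# opens the last, non-nested group on the line.
flat = re.compile(r'\(([^()]*)\)\s*$')

molson = "Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)"
coven = "Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)"

print(flat.search(molson).group(1))  # Lager - Pale. ABV 3,7%
print(flat.search(coven))            # None: the last group has a nested (...)
```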

To support at most one level of nesting, you can use the \(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$ regex (see the regex demo).

import re
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
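# Group 1 = contents of the last (...) group, allowing one level of nested parentheses
rx = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')
for line in text.splitlines(True):  # keep line endings; \s*$ tolerates trailing whitespace
    m = rx.search(line)
    if m:
        print(m.group(1))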

See the Python demo. Details:

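  • \( - a ( char
  • ([^()]*(?:\([^()]*\)[^()]*)*) - Group 1: zero or more chars other than ( and ), then zero or more sequences of a ( char, zero or more chars other than ( and ), and a ) char, each sequence followed by zero or more chars other than ( and )
  • \) - a ) char
  • \s*$ - zero or more whitespace chars and then the end of string.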
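The same one-level pattern can be written with re.VERBOSE so that each item of the breakdown above sits next to the part of the pattern it describes. This is only a reformatting of the answer's pattern, not a different regex. The second input is a made-up line with two levels of nesting; this pattern cannot match it, which is the limitation the recursive approach addresses.

```python
import re

# Identical to the answer's pattern; re.VERBOSE ignores the layout whitespace
# and the # comments (neither occurs inside the [^()] classes).
one_level = re.compile(r'''
    \(                  # a "(" char
    (                   # start of Group 1
        [^()]*          # zero or more chars other than ( and )
        (?:             # zero or more sequences of
            \([^()]*\)  #   a "(...)" pair with no parentheses inside
            [^()]*      #   followed by zero or more chars other than ( and )
        )*
    )                   # end of Group 1
    \)                  # a ")" char
    \s*$                # zero or more whitespace chars, then end of string
''', re.VERBOSE)

nested_once = "Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)"
nested_twice = "Some Beer (Style (Sub (Detail)). ABV 5%)"  # hypothetical input

print(one_level.search(nested_once).group(1))  # Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
print(one_level.search(nested_twice))          # None
```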

To support an arbitrary number of nesting levels, you cannot use re, because it does not support recursion. You can pip install regex and use

import regex
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
# Group 1 = the whole (...) group, recursed via (?1); Group 2 = its contents
rx = regex.compile(r'(\(((?:[^()]++|(?1))*)\))\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(2))

See the Python demo. Details:

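  • (\(((?:[^()]++|(?1))*)\)) - Group 1: a ( char, then Group 2, which captures zero or more occurrences of either one or more chars other than ( and ) (matched possessively) or a recursion of the whole Group 1 pattern, and then a ) char
  • \s*$ - zero or more whitespace chars and then the end of string.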

Output:

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
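If installing the third-party regex module is not an option, a plain-Python scan without any regex gives the same result for these inputs. This is an alternative technique, not part of the original answer: it walks backwards from the final ) with a depth counter and stops at the ( that balances it, so the nesting depth is unlimited. The function name last_paren_group is hypothetical.

```python
from typing import Optional


def last_paren_group(line: str) -> Optional[str]:
    """Return the contents of the balanced (...) group that ends the line.

    Hypothetical helper. Returns None if the line does not end with ")"
    or if that ")" has no matching "(".
    """
    # Ignore trailing whitespace, like the \s*$ part of the regexes above.
    s = line.rstrip()
    if not s.endswith(")"):
        return None
    depth = 0
    # Walk from the last character towards the start, tracking nesting depth.
    for i in range(len(s) - 1, -1, -1):
        if s[i] == ")":
            depth += 1
        elif s[i] == "(":
            depth -= 1
            if depth == 0:
                # This "(" balances the final ")": return what lies between them.
                return s[i + 1:-1]
    return None  # more ")" than "(" before the end of the line


for line in [
    "Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)",
    "Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)",
    "Some Beer (Style (Sub (Detail)). ABV 5%)",  # hypothetical, two nesting levels
]:
    print(last_paren_group(line))
```

Unlike the regex versions, this does no backtracking at all; the trade-off is that it only answers this one question (the last balanced group) rather than being a reusable pattern.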