Regex: get the string between two brackets (Python)
I'm new to regular expressions and am having some trouble understanding them.
Here are some input strings:
Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)
From each string I want to get:
Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
I am using the regex \((.*?)\)$, but in the case of
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
it returns
UK) - Carling Original (Lager - Pale. ABV 3,7%
I can't figure out what I should add to my regex so that I get only
Lager - Pale. ABV 3,7%
To support at most one level of nesting, you can use the regex \(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$
See the regex demo.
import re
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
rx = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(1))
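See the Python demo. Details:
\( - a ( character
([^()]*(?:\([^()]*\)[^()]*)*) - Group 1: zero or more characters other than ( and ), then zero or more repetitions of this sequence: a ( character, zero or more characters other than ( and ), a ) character, and zero or more characters other than ( and )
\) - a ) character
\s*$ - zero or more whitespace characters and the end of the string.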
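The breakdown above can be written out as a commented pattern. This is a minimal sketch using re.VERBOSE; the names ONE_LEVEL and last_group are illustrative and not part of the original answer. It also shows the limit of this approach: a second level of nesting produces no match.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
ONE_LEVEL = re.compile(r"""
    \(                  # opening bracket of the trailing group
    (                   # group 1: the text we want
        [^()]*          # any run of non-bracket characters
        (?:
            \([^()]*\)  # one nested pair with no further nesting inside
            [^()]*      # more non-bracket characters
        )*
    )
    \)                  # closing bracket
    \s*$                # optional trailing whitespace, then end of string
""", re.VERBOSE)


def last_group(line):
    """Return the content of the trailing bracket group, or None."""
    m = ONE_LEVEL.search(line)
    return m.group(1) if m else None


line = "Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)"
print(last_group(line))             # Lager - Pale. ABV 3,7%
print(last_group("a (b (c (d)))"))  # None: two nested levels are not supported
```

The original \((.*?)\)$ fails on the same line because the engine starts at the leftmost ( and the lazy .*? is allowed to run across the intermediate brackets; excluding ( and ) with [^()] is what prevents that.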
To support any number of nesting levels, you cannot use re, because it does not support recursion. You can pip install regex and use
import regex
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
rx = regex.compile(r'(\(((?:[^()]++|(?1))*)\))\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(2))
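See the Python demo. Details:
(\(((?:[^()]++|(?1))*)\)) - Group 1: a ( character, then Group 2, which captures zero or more repetitions of either one or more characters other than ( and ) or the whole Group 1 pattern (recursed with (?1)), then a ) character
\s*$ - zero or more whitespace characters and the end of the string.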
Output:
Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
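If installing the regex package is not an option, arbitrary nesting can also be handled without a regular expression. The following is only a sketch of that alternative, not part of the answer above: it walks the string from the end and counts bracket depth. The function name last_balanced_group is made up for illustration.

```python
def last_balanced_group(line):
    """Return the content of the trailing balanced (...) group, or None.

    Assumes the group we want is the one that ends the line
    (ignoring trailing whitespace), as in the examples above.
    """
    s = line.rstrip()
    if not s.endswith(")"):
        return None
    depth = 0
    # Scan right to left: each ")" opens a level, each "(" closes one.
    for i in range(len(s) - 1, -1, -1):
        ch = s[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                # s[i] is the "(" that matches the final ")".
                return s[i + 1:-1]
    return None  # unbalanced: no matching "(" was found


print(last_balanced_group("Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)"))
print(last_balanced_group("a (b (c (d)))"))
```

For the ten sample lines this returns the same strings as the output listed above, and it has no fixed limit on nesting depth.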