Regex: merging several patterns into one
I have a dataset with the following content:
(event) (tag) [group (artist)] title (form) [addition1] [addition2]
(event) [group (artist)] title (form) [addition1]
[event] [group (artist)] title (form) (addition1)
(tag) [group (artist)] title
[group (artist)] title
title
【tag】 [group (artist)] title 【form】
[group (artist)] title
[group] title
[artist] title
(artist) title
I want to get the title from each line.
There are three patterns that match the title:
1.
([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*)\s*[\(\[【])
which matches lines such as *] title (*
2.
([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*))
which matches lines such as *] title
3.
(?P<title>[^\(\)\[\]\【\】\s]*)
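which matches lines that consist of the title only.
I don't know how to combine the three rules into a single regular expression.
So I wrote some Python code to do it: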
- try pattern 1; if it matches, break and take the title
- if pattern 1 does not match, try pattern 2
- loop over steps 1 and 2
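A minimal sketch of that fallback procedure, not the original code. The names `TITLE`, `PATTERNS` and `extract_title` are made up for this illustration. Two things differ from the patterns as written: the title group uses `+` instead of `*`, because with `*` pattern 1 can match an empty title between two brackets such as `) (`, and pattern 3 is anchored to the whole line, which is an assumption based on its description.

```python
import re

# Title characters: anything except brackets and whitespace (as in the patterns above).
# "+" instead of "*" so that an empty string is never accepted as a title.
TITLE = r"(?P<title>[^()\[\]【】\s]+)"

# The three rules, tried in this order.
PATTERNS = [
    re.compile(r"[)\]】]\s*" + TITLE + r"\s*[(\[【]"),  # 1: "] title ("
    re.compile(r"[)\]】]\s*" + TITLE),                  # 2: "] title"
    re.compile(r"^\s*" + TITLE + r"\s*$"),              # 3: "title" alone (assumed anchoring)
]


def extract_title(line):
    """Return the title found by the first pattern that matches, or None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return match.group("title")
    return None


if __name__ == "__main__":
    for line in ["(tag) [group (artist)] title", "title", "[group] title"]:
        print(extract_title(line))
```

Trying the patterns from most specific to least specific matters here: pattern 2 is a prefix of pattern 1, so the order only affects which rule fires, but pattern 3 must come last or it would never be needed.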
I am trying to merge these three rules into one.
Something like
(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)
Example
>>> import re
>>> strings = ["(event) (tag) [group (artist)] title (form) [addition1] [addition2]", "(event) [group (artist)] title (form) [addition1]", "[event] [group (artist)] title (form) (addition1)", "(tag) [group (artist)] title", "[group (artist)] title", "title", "【tag 】 [group (artist)] title 【form】", "[group (artist)] title", "[group] title", "[artist] title", "(artist) title"]
>>> for string in strings:
...     re.findall(r'(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)', string)
...
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
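To make the combined pattern easier to read, it can also be written in verbose mode with one comment per part. The sketch below is intended to be equivalent to the pattern above, under a few assumptions: the names `TITLE_RE` and `get_title` are invented for this illustration, spaces are written as `[ ]` because `re.VERBOSE` ignores bare whitespace, and `[` is escaped inside the last character class because newer Python versions emit a "possible nested set" FutureWarning for `[[`. Note that `\w+` only covers a single-word title.

```python
import re

TITLE_RE = re.compile(
    r"""
    (?: ^ | [\])] [ ]+ )     # start of line, or a closing ] or ) followed by spaces
    (?P<title> \w+ )         # the title: one run of word characters
    (?: [ ]+ [\[【(] | $ )   # spaces and an opening bracket, or end of line
    """,
    re.VERBOSE,
)


def get_title(line):
    """Return the first title match in the line, or None if there is none."""
    match = TITLE_RE.search(line)
    return match.group("title") if match else None


if __name__ == "__main__":
    for line in ["[group (artist)] title (form)", "title", "(artist) title"]:
        print(get_title(line))
```

The two non-capturing groups carry the three original rules: the left one covers "preceded by a closing bracket" and "nothing before the title", the right one covers "followed by an opening bracket" and "nothing after the title".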