Combined regex pattern to match the beginning and end of a string and remove a separator character
I have the following strings:
"LP, bar, company LLP, foo, LLP"
"LLP, bar, company LLP, foo, LP"
"LLP,bar, company LLP, foo,LP" # note the absence of a space after/before comma to be removed
I'm looking for a regex that takes these inputs and returns the following:
"LP bar, company LLP, foo LLP"
"LLP bar, company LLP, foo LP"
"LLP bar, company LLP, foo LP"
Here is what I tried:
import re

def fix_broken_entity_names(name):
    """
    LLP, NAME -> LLP NAME
    NAME, LP -> NAME LP
    """
    pattern_end = r'^(LL?P),'
    pattern_beg_1 = r', (LL?P)$'
    pattern_beg_2 = r',(LL?P)$'
    combined = r'|'.join((pattern_beg_1, pattern_beg_2, pattern_end))
    return re.sub(combined, r' ', name)
When I run it:
>>> fix_broken_entity_names("LP, bar, company LLP, foo,LP")
Out[1]: ' bar, company LLP, foo '
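To see where the `LP` goes, one can list what each alternative of the combined pattern from the question actually matched. This sketch only reuses the question's three patterns and the standard-library `re.finditer`:

```python
import re

# The combined pattern from the question, unchanged
pattern_end = r'^(LL?P),'
pattern_beg_1 = r', (LL?P)$'
pattern_beg_2 = r',(LL?P)$'
combined = r'|'.join((pattern_beg_1, pattern_beg_2, pattern_end))

name = "LP, bar, company LLP, foo,LP"

# Collect every span that re.sub would replace, plus what each group captured.
# The groups are ordered (pattern_beg_1, pattern_beg_2, pattern_end).
matches = [(m.group(0), m.groups()) for m in re.finditer(combined, name)]
for whole, groups in matches:
    print(repr(whole), groups)
# 'LP,' (None, None, 'LP')
# ',LP' (None, 'LP', None)
```

Each whole match contains both the comma and the `LP`/`LLP` token, and `re.sub(combined, r' ', name)` replaces the whole match with a single space, so the captured token is thrown away together with the comma. The replacement has to refer back to the groups to keep it.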
Any hints or solutions would be appreciated :)
Make use of capturing groups and reformat the string as you like:
Regex:
([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)
Replacement:
, ,
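A minimal Python sketch of this capture-group approach follows. The answer only shows the separators `, ,`; the template `\1 \2, \3, \4 \5` used here is an assumption inferred from the desired output, and the names `FIVE_FIELDS` and `REPLACEMENT` are illustrative:

```python
import re

# The answer's regex: five comma-separated fields, each a run of
# non-comma, non-newline characters; " *" eats spaces around each comma.
FIVE_FIELDS = re.compile(
    r"([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)"
)

# ASSUMED template: field 1 and 2 joined by a space, fields 2-4 keep
# their ", " separators, field 4 and 5 joined by a space.
REPLACEMENT = r"\1 \2, \3, \4 \5"

texts = [
    "LP, bar, company LLP, foo, LLP",
    "LLP, bar, company LLP, foo, LP",
    "LLP,bar, company LLP, foo,LP",
]

results = [FIVE_FIELDS.sub(REPLACEMENT, text) for text in texts]
for text, result in zip(texts, results):
    print("'{}' -> '{}'".format(text, result))
```

This only fits strings with exactly five comma-separated fields: with fewer, the pattern does not match and the string comes back unchanged; with more, only the first five fields are rewritten.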
You can use
import re

texts = ["LP, bar, company LLP, foo, LLP","LLP, bar, company LLP, foo, LP","LLP,bar, company LLP, foo,LP"]
for text in texts:
    # Keep the captured LP/LLP (Group 1 or 2) wrapped in spaces, then normalize whitespace
    result = ' '.join(re.sub(r"^(LL?P)\s*,|,\s*(LL?P)$", r" \1\2 ", text).split())
    print("'{}' -> '{}'".format(text, result))
Output:
'LP, bar, company LLP, foo, LLP' -> 'LP bar, company LLP, foo LLP'
'LLP, bar, company LLP, foo, LP' -> 'LLP bar, company LLP, foo LP'
'LLP,bar, company LLP, foo,LP' -> 'LLP bar, company LLP, foo LP'
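See a Python demo. The regex is ^(LL?P)\s*,|,\s*(LL?P)$:
^(LL?P)\s*, - start of string, LLP or LP (Group 1), zero or more whitespace chars, a comma
| - or
,\s*(LL?P)$ - a comma, zero or more whitespace chars, LP or LLP (Group 2) and then the end of string.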
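The same alternation can be written with `re.VERBOSE` so each piece carries its explanation from the breakdown inline. The constant name `LEGAL_SUFFIX_AT_EDGE` is hypothetical; the pattern itself is the answer's:

```python
import re

# Whitespace and "#" comments are ignored in VERBOSE mode, so this compiles
# to the same pattern as r"^(LL?P)\s*,|,\s*(LL?P)$".
LEGAL_SUFFIX_AT_EDGE = re.compile(
    r"""
    ^(LL?P)\s*,     # start of string, LLP or LP (Group 1), optional whitespace, comma
    |               # or
    ,\s*(LL?P)$     # comma, optional whitespace, LP or LLP (Group 2), end of string
    """,
    re.VERBOSE,
)

text = "LLP,bar, company LLP, foo,LP"
# Each match fills exactly one of the two groups; the other is None.
found = [(m.group(0), m.groups()) for m in LEGAL_SUFFIX_AT_EDGE.finditer(text)]
for whole, groups in found:
    print(repr(whole), groups)
# 'LLP,' ('LLP', None)
# ',LP' (None, 'LP')
```

The `LLP` inside `company LLP,` is not touched, because the first alternative is anchored with `^` and the second with `$`.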
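Note that the replacement is the concatenation of the Group 1 and Group 2 values wrapped in single spaces, and the post-processing step removes all leading/trailing whitespace and shrinks runs of whitespace inside the string to a single space.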
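The note above describes the replacement as Group 1 and Group 2 wrapped in spaces; as a template that is `r" \1\2 "`, which relies on `re.sub` substituting an empty string for the group that did not participate. Python does that from 3.5 on; earlier versions raise an "unmatched group" error instead. A sketch that avoids depending on this passes a function as the replacement and picks whichever group matched. The helper name `keep_suffix` is illustrative, and `fix_broken_entity_names` reuses the name from the question:

```python
import re

PATTERN = re.compile(r"^(LL?P)\s*,|,\s*(LL?P)$")


def keep_suffix(m):
    # Exactly one group participates in each match; the other is None,
    # so "or" selects the LP/LLP token that was actually captured.
    token = m.group(1) or m.group(2)
    return " {} ".format(token)


def fix_broken_entity_names(name):
    # Swap "LLP," / ",LP" at the edges for " LLP " / " LP ", then trim the
    # ends and collapse whitespace runs, as in the answer's post-processing.
    return " ".join(PATTERN.sub(keep_suffix, name).split())


for text in ["LP, bar, company LLP, foo, LLP",
             "LLP, bar, company LLP, foo, LP",
             "LLP,bar, company LLP, foo,LP"]:
    print("'{}' -> '{}'".format(text, fix_broken_entity_names(text)))
```

As with the answer's version, the `split()`/`join()` step also collapses any double spaces inside the name, not just the ones introduced by the substitution.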