Combined regex pattern to match beginning and end of string and remove a separator character

Question

I have the following strings:

"LP, bar, company LLP, foo, LLP"
"LLP, bar, company LLP, foo, LP"
"LLP,bar, company LLP, foo,LP"  # note the absence of a space after/before comma to be removed

I am looking for a regex that takes these inputs and returns the following:

"LP bar, company LLP, foo LLP"
"LLP bar, company LLP, foo LP"
"LLP bar, company LLP, foo LP"

What I have so far is this:

import re

def fix_broken_entity_names(name):
    """
    LLP, NAME -> LLP NAME
    NAME, LP -> NAME LP
    """
    pattern_end = r'^(LL?P),'
    pattern_beg_1 = r', (LL?P)$'
    pattern_beg_2 = r',(LL?P)$'
    combined = r'|'.join((pattern_beg_1, pattern_beg_2, pattern_end))
    return re.sub(combined, r' \1', name)

When I run it:

>>> fix_broken_entity_names("LP, bar, company LLP, foo,LP")
Out[1]: '  bar, company LLP, foo '

Any hints or solutions would be greatly appreciated :)

Answer 1

Use capture groups and reformat the fields as you wish:

Regex:

([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)

Replacement:

\1 \2, \3, \4 \5

https://regex101.com/r/jcEzzy/1/
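
For reference, the same idea can be tried in Python with re.sub. This is only a sketch: the replacement string `\1 \2, \3, \4 \5` and the helper name `reformat_five_fields` are assumptions made for illustration, and the pattern only fits lines with exactly five comma-separated fields.

```python
import re

# Five comma-separated fields; optional spaces around each comma are skipped.
FIELDS = re.compile(
    r"([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)"
)


def reformat_five_fields(name):
    # Hypothetical helper: join fields 1-2 and 4-5 with a space,
    # and keep the two inner commas.
    return FIELDS.sub(r"\1 \2, \3, \4 \5", name)


print(reformat_five_fields("LLP,bar, company LLP, foo,LP"))
# LLP bar, company LLP, foo LP
```

Because the number of fields is hard-coded, a name with more or fewer commas will not be rewritten as intended by this approach.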

Answer 2

You can use

import re
texts = ["LP, bar, company LLP, foo, LLP","LLP, bar, company LLP, foo, LP","LLP,bar, company LLP, foo,LP"]
for text in texts:
    result = ' '.join(re.sub(r"^(LL?P)\s*,|,\s*(LL?P)$", r" \1\2 ", text).split())
    print("'{}' -> '{}'".format(text, result))

Output:

'LP, bar, company LLP, foo, LLP' -> 'LP bar, company LLP, foo LLP'
'LLP, bar, company LLP, foo, LP' -> 'LLP bar, company LLP, foo LP'
'LLP,bar, company LLP, foo,LP' -> 'LLP bar, company LLP, foo LP'

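See the Python demo. The regex is ^(LL?P)\s*,|,\s*(LL?P)$:

^(LL?P)\s*, - start of string, LLP or LP (Group 1), zero or more whitespace characters, a comma
| - or
,\s*(LL?P)$ - a comma, zero or more whitespace characters, LP or LLP (Group 2), then the end of the string.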
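
To see how the two alternatives behave, the matches can be inspected one by one. The snippet below is a small illustration (the variable names are made up for this example): in each match only one of the two groups participates, and the other is None.

```python
import re

pattern = re.compile(r"^(LL?P)\s*,|,\s*(LL?P)$")

for m in pattern.finditer("LLP,bar, company LLP, foo,LP"):
    # Print the matched text and the contents of both groups.
    print(repr(m.group(0)), m.groups())
# 'LLP,' ('LLP', None)
# ',LP' (None, 'LP')
```

Since Python 3.5, re.sub replaces a backreference to a group that did not participate with an empty string, which is why both groups can safely appear together in one replacement string.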

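Note that the replacement is the concatenation of the Group 1 and Group 2 values enclosed in single spaces, and the post-processing step strips all leading/trailing whitespace and shrinks each run of whitespace inside the string to a single space.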
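
If collapsing every run of whitespace in the name is not desired, one possible variation is to pass a function as the replacement so that no post-processing is needed. This is a sketch rather than part of the answer above: the name `fix_entity_name_edges` is hypothetical, and the pattern is widened with extra `\s*` so that the spaces around the removed comma are consumed by the match.

```python
import re

# Same idea as above, but the whitespace around the comma is part of the match.
EDGE = re.compile(r"^(LL?P)\s*,\s*|\s*,\s*(LL?P)$")


def fix_entity_name_edges(name):
    def repl(m):
        if m.group(1) is not None:
            # Leading "LLP," or "LP,": keep the entity type, add one space.
            return m.group(1) + " "
        # Trailing ",LLP" or ",LP": one space, then the entity type.
        return " " + m.group(2)

    return EDGE.sub(repl, name)


for text in ["LP, bar, company LLP, foo, LLP", "LLP,bar, company LLP, foo,LP"]:
    print(fix_entity_name_edges(text))
# LP bar, company LLP, foo LLP
# LLP bar, company LLP, foo LP
```

Unlike the split/join version, this leaves any other whitespace in the middle of the string untouched.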

python

regex

regex-group

python-re