Python/Regex: Split string if a line does not contain a certain special character

Question

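I'm trying to split a multiline string on a character, but only if the line does not contain :. Unfortunately, I can't see an easy way to use re.split() with a negative lookbehind for the character :, since the : may have occurred on an earlier line of the string.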

For example, I would like to split the following string on ).

String:

Hello1 (
First : (),
Second )

Hello2 (
First 
)

Output:

['Hello1 (\nFirst : (),\nSecond', 'Hello2 (\nFirst \n']

Answer 1

This is possible with Python, albeit not "out of the box" with the native re module.

First alternative

The newer regex module supports variable-length lookbehinds, so you could use

(?<=^[^:]+)\)
# positive lookbehind making sure there is no : between a line start and the )

In Python:

import regex as re

data = """
Hello1 (
First : (),
Second )

Hello2 (
First 
)"""

pattern = re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)

parts = pattern.split(data)
print(parts)

This yields

['\nHello1 (\nFirst : (),\nSecond ', '\n\nHello2 (\nFirst \n', '']
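For context on why this alternative relies on the third-party regex module: as far as I know, the standard-library re only accepts fixed-width lookbehinds, so compiling the same pattern there should fail. A minimal check, where the variable name stdlib_accepts_pattern is only for illustration:

```python
import re

# Try to compile the variable-length lookbehind with the standard-library module.
try:
    re.compile(r'(?<=^[^:]+)\)', re.MULTILINE)
    stdlib_accepts_pattern = True
except re.error as exc:
    # Expected path: [^:]+ has no fixed width, which re does not allow in a lookbehind.
    stdlib_accepts_pattern = False
    print("re rejected the pattern:", exc)
```

Also note the trailing '' in the result above: it is there because the input ends with a ) that acts as a separator.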

Second alternative

Alternatively, you could match the lines in question and let them fail with (*SKIP)(*FAIL):

^[^:\n]*:.*(*SKIP)(*FAIL)|\)
# match lines with at least one : in it
# let them fail
# or match )

Again in Python:

# re is still the regex module here; the native re has no (*SKIP)(*FAIL)
pattern2 = re.compile(r'^[^:\n]*:.*(*SKIP)(*FAIL)|\)', re.MULTILINE)
parts2 = pattern2.split(data)
print(parts2)

See a demo for the latter on regex101.com.

Third alternative

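Ok, now the answer is getting longer than originally planned. You can even do it with the native re module with the help of a function. Here, you need to substitute the ) in question with a placeholder first and then split on that placeholder: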

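import re  # the standard-library module is enough for this variant

def replacer(match):
    # Group 1 is set only when a ) outside a ":" line was matched.
    if match.group(1) is not None:
        return "SUPERMAN"
    else:
        # A whole ":" line was matched; put it back unchanged.
        return match.group(0)

# Lines containing ":" are consumed first, so only the remaining ) reach group 1.
pattern3 = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)
data = pattern3.sub(replacer, data)
parts3 = data.split("SUPERMAN")
print(parts3)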
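One caveat with the placeholder approach is that it breaks if the input itself happens to contain the placeholder text. A possible variation, sketched here with the same pattern and the native re module, collects the split positions with finditer() and slices the string directly. The names SPLIT_RE, split_outside_colon_lines and sample are made up for this illustration and are not part of the original answer:

```python
import re

# Same idea as pattern3: a line containing ":" is consumed whole (group 1 stays
# unset); any other ")" is captured in group 1 and marks a split point.
SPLIT_RE = re.compile(r'^[^:\n]*:.*|(\))', re.MULTILINE)


def split_outside_colon_lines(text):
    """Split text on ")" except where the ")" sits on a line containing ":"."""
    parts = []
    start = 0
    for match in SPLIT_RE.finditer(text):
        if match.group(1) is None:
            continue  # a ":" line was matched; leave it untouched
        parts.append(text[start:match.start()])
        start = match.end()
    parts.append(text[start:])  # remainder after the last split point
    return parts


sample = "Hello1 (\nFirst : (),\nSecond )\n\nHello2 (\nFirst \n)"
print(split_outside_colon_lines(sample))
```

Because no placeholder is written into the string, nothing in the input can collide with it. If the leading newlines and the empty trailing item are unwanted, the parts can be stripped and filtered afterwards.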
