Capturing multiple groups in a regex does not return any result

Question

I have a Python function:

def regex(series, regex):
    series = series.str.extract(regex)
    series1 = series.dropna()
    return (series1)

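The function is meant to keep only the rows that match the following rule:

Anything containing 'no' followed by one of the phrases, or one of the phrases followed by 'not', should not match. Below is the regex used in the Python function: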

result = regex(df['col'],r'(^(?!.*\bno\b.*\b(text|sample text )\b)(?!.*\b(text|sample text)\b .*not).+$)')

When I apply the regex in the function, I get no results (just an empty DataFrame),

but testing the same regex at this link works fine: https://regex101.com/r/Epq0Ns/21
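One likely cause worth checking is the capture groups themselves. `Series.str.extract` returns one column per capturing group, and this pattern has three. Groups 2 and 3 sit inside negative lookaheads, which only succeed when their contents fail to match, so on every successful match those groups capture nothing and come back as NaN. `dropna()` (default `how="any"`) then drops every row. The sketch below reproduces this with a hypothetical two-row DataFrame (the real `df['col']` was not shown) and compares it with a variant whose inner groups are non-capturing `(?:...)`.

```python
import pandas as pd

# The pattern from the question, unchanged: three capturing groups.
pattern = r'(^(?!.*\bno\b.*\b(text|sample text )\b)(?!.*\b(text|sample text)\b .*not).+$)'

# Same pattern with the inner groups made non-capturing: one group left.
fixed_pattern = r'(^(?!.*\bno\b.*\b(?:text|sample text )\b)(?!.*\b(?:text|sample text)\b .*not).+$)'

# Hypothetical input standing in for df['col'].
df = pd.DataFrame({"col": [
    "first sentence with no and sample text",          # 'no' ... 'text': rejected
    "fourth alone is what is neeeded with just text",  # no negation: kept
]})

extracted = df["col"].str.extract(pattern)
print(extracted)           # columns 0, 1, 2; columns 1 and 2 are entirely NaN
print(extracted.dropna())  # empty: every row has at least one NaN

# With a single group and expand=False, extract returns a Series.
fixed = df["col"].str.extract(fixed_pattern, expand=False).dropna()
print(fixed.tolist())
```

Keeping the original pattern and dropping NaNs only from the first column (`extracted[0].dropna()`) should give the same rows; making the inner groups non-capturing just avoids producing the unused columns.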

Answer 1

Try using the same flags you used on regex101. Change the line in your function to:

series = series.str.extract(regex, re.M | re.S)

or

series = series.str.extract(regex, flags=re.M|re.S)

If you share the code that defines your input, I'll test it.
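A small sketch of when these flags actually change the result, assuming hypothetical cell values. pandas applies the pattern to each cell separately, so `re.M` and `re.S` only make a difference when a single cell contains line breaks. A multi-line test string on regex101 is a different situation from one short string per cell.

```python
import re
import pandas as pd

# Hypothetical cells: the second one spans two lines.
s = pd.Series(["one line", "two\nlines"])

# Without flags, ^ and $ anchor to the whole cell and . stops at "\n".
no_flags = s.str.extract(r"(^.+$)", expand=False)

# re.M lets ^ and $ match at line breaks; re.S lets . match "\n" as well.
with_flags = s.str.extract(r"(^.+$)", flags=re.M | re.S, expand=False)

print(no_flags.tolist())    # second cell is NaN
print(with_flags.tolist())  # second cell matched in full
```

If none of the real cells contain newlines, adding the flags should not change which rows are returned.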

Answer 2

Code

For simplicity, you can build simple regex patterns from plain lists using list comprehensions.

Usage


import re

negations = ["no", "not"]
words = ["text", "sample text", "text book", "notebook"]
sentences = [
    "first sentence with no and sample text",
    "second with a text but also a not",
    "third has a no, a text and a not",
    "fourth alone is what is neeeded with just text",
    "keep putting line here no"
] 

# Build each alternation once: escape every term and wrap it in word boundaries.
negationsRegex = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in negations) + r")\b")
wordsRegex = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")

for sentence in sentences:
    # Skip sentences that contain both a negation and one of the words.
    if not (negationsRegex.search(sentence) and wordsRegex.search(sentence)):
        print(sentence)

The code above outputs:

fourth alone is what is neeeded with just text
keep putting line here no

Explanation

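The code joins the regex-escaped terms of each list into an alternation and compiles it, wrapping the alternation in word boundaries. Given the lists `negations` and `words`, the resulting regexes look like this: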

\b(?:no|not)\b
\b(?:text|sample\ text|text\ book|notebook)\b

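The if statement then checks whether both generated patterns (the negations regex and the words regex) match the sentence. If they do not both match (one or neither matches), the sentence is printed.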
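The same two-pattern check can be applied to a pandas column instead of a Python list, which is closer to the question's `df['col']`. This is a minimal sketch assuming a hypothetical Series holding the sentences above; `alternation` is a hypothetical helper name. Like the loop, it rejects a sentence whenever both a negation and a word appear, regardless of their order.

```python
import re
import pandas as pd

negations = ["no", "not"]
words = ["text", "sample text", "text book", "notebook"]

def alternation(terms):
    # Hypothetical helper: escaped terms joined into a non-capturing
    # alternation and wrapped in word boundaries.
    return r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"

negations_pattern = alternation(negations)
words_pattern = alternation(words)

# Hypothetical column standing in for df['col'] from the question.
s = pd.Series([
    "first sentence with no and sample text",
    "second with a text but also a not",
    "third has a no, a text and a not",
    "fourth alone is what is neeeded with just text",
    "keep putting line here no",
])

# True where a sentence contains a negation AND one of the words.
both = s.str.contains(negations_pattern) & s.str.contains(words_pattern)

kept = s[~both]
print(kept.tolist())
```

Because both patterns use only non-capturing groups, `str.contains` returns a plain boolean mask, and the masks can be combined with `&` and inverted with `~`.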

Tags: python, regex, text-mining, pandas