Regex: find all words that contain certain letters but not others

Question

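Can anyone help me with this?

I need to find all the words in a list that contain the letters [t OR d] AND [k OR c] but none of [s, z, n, m].

I figured out the first part, but I don't know how to include the stop list: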

\w*[t|d]\w*[k|c]\w*

(in Python notation)

Thanks in advance.

Answer 1

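You can do this in two steps: first find the words with t|d and k|c, then filter out the ones containing unwanted letters.

Since you say you have figured out the first part, here is the second: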

matches = [i for i in matches if not re.search(r'[sznm]', i)]    
print(matches)
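
For completeness, here is a minimal sketch of both steps together. The word list is hypothetical (the question does not show its data), and step 1 assumes the two required letters may appear in either order, which is looser than the asker's own pattern.

```python
import re

# Hypothetical sample list; the question does not show its data.
words = ["track", "stack", "dock", "dog", "knight", "tic"]

# Step 1: keep words with at least one of t/d and at least one of k/c (any order).
matches = [w for w in words if re.search(r"[td]", w) and re.search(r"[kc]", w)]

# Step 2: drop words containing any forbidden letter.
matches = [w for w in matches if not re.search(r"[sznm]", w)]
print(matches)  # ['track', 'dock', 'tic']
```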

Answer 2

If you need t or d to appear before k or c, use: [^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*

[^sznm\s\d] matches any character except s, z, n, m, whitespace (\s), or a digit (\d).
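
A small sketch of how this pattern might be applied, assuming the words are already split into a list (the sample words are made up). Using fullmatch on each word matters: with findall on raw text, the pattern can match the tail of a word that starts with a forbidden letter.

```python
import re

pattern = re.compile(r"[^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*")

# Hypothetical sample words.
words = ["track", "stack", "kit", "dock"]

# fullmatch requires the whole word to fit the pattern,
# so a forbidden letter anywhere in the word rejects it.
ordered = [w for w in words if pattern.fullmatch(w)]
print(ordered)  # ['track', 'dock']

# Caveat: an unanchored search skips the leading "s" and matches the rest.
print(pattern.findall("stack"))  # ['tack']
```

Note that "kit" is rejected here because its k comes before its t; this pattern is order-sensitive.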

Answer 3

s = "foobar foo".split()

# One letter from each set must appear; no forbidden letter may appear.
allowed = ({"k", "c"}, {"t", "d"})
forbid = {"s", "z", "n", "m"}

for word in s:
    if all(any(k in st for k in word) for st in allowed) and all(k not in forbid for k in word):
        print(word)

Or use a list comprehension with set.intersection:

words = [word for word in s if all(st.intersection(word) for st in allowed) and not forbid.intersection(word)]

Answer 4

Use this code:

import re
re.findall('[abcdefghijklopqrtuvwxy]*[td][abcdefghijklopqrtuvwxy]*[kc][abcdefghijklopqrtuvwxy]*', text)
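
One thing to watch for when running findall over free text: without anchors the pattern can match part of a word. The sketch below adds \b word boundaries as one possible fix; the sample text and the variable names are assumptions for illustration only.

```python
import re

# The same allowed-letter class as above (lowercase letters minus m, n, s, z).
letters = "abcdefghijklopqrtuvwxy"

# \b on both sides forces the match to cover a whole word.
pattern = rf"\b[{letters}]*[td][{letters}]*[kc][{letters}]*\b"

text = "track stack dock kit"  # hypothetical sample text
found = re.findall(pattern, text)
print(found)  # ['track', 'dock']
```

Without the \b anchors, "stack" would contribute the partial match "tack".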

Answer 5

Based on Padraic's answer.

Edit: we all missed this condition:

[t OR d] AND [k OR c]

So I have corrected it accordingly:

s = "detected dot knight track"

allowed = ({"t","d"},{"k","c"})
forbidden = {"s","z","n", "m"}

for word in s.split():
    letter_set = set(word)
    if all(letter_set & a for a in allowed) and letter_set - forbidden == letter_set:
        print(word)

The result is:

detected
track

Answer 6

I really like @padraic-cunningham's answer, which doesn't use re, but here is a pattern that will work:

pattern = r'(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*'

Positive (?=...) and negative (?!...) lookahead assertions are well documented on python.org.
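
A quick sketch of the lookahead pattern in use. The sample text is made up, and the leading \b is an addition not in the pattern above: without it, findall can start a match in the middle of a word, just past a forbidden letter.

```python
import re

pattern = r'(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*'

# Assumption: prefixing \b so matches can only start at the beginning of a word.
bounded = r'\b' + pattern

text = "stack track kit knight"  # hypothetical sample text

unanchored = re.findall(pattern, "stack track")
print(unanchored)  # ['tack', 'track']

whole_words = re.findall(bounded, text)
print(whole_words)  # ['track', 'kit']
```

Because the two positive lookaheads are independent, the required letters may appear in either order, which is why "kit" matches here.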

Answer 7

You need to use lookarounds.

^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$

For example:

>>> import re
>>> l = ['fooktz', 'foocdm', 'foobar', 'kbard']
>>> [i for i in l if re.match(r'^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$', i)]
['kbard']

