Python re match only letters from word

Question

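I'm new to Python and need some help. I've searched here, on Google, and in the docs, but nothing has worked. Here is what I'm trying to do.

I have a word (for example) "string", and then I have a list of words:

strings, string, str, ing, in, ins, rs, stress

I want to match these: string, str, ing, in, ins, rs.

I don't want to match: stress, strings (because it has two s's, while the word string has only one).

Only letters from the word string should match.

Sorry for my bad English, and for any part I haven't explained well enough.

Also, some of the letters are Unicode.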

Answer 1

I don't think you can do this with a regular expression, but I think you can do it with collections:

>>> from collections import Counter
>>> target = "string"
>>> words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
>>> [word for word in words if not Counter(word) - Counter(target)]
['string', 'str', 'ing', 'in', 'ins', 'rs']
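
To see why this works: subtracting one Counter from another keeps only the positive counts, so the difference is empty exactly when the word needs no letter more often than the target supplies it. A small sketch of that, where leftover is a helper name made up for illustration:

```python
from collections import Counter

def leftover(word, target):
    """Letters of word that target cannot supply, with their excess counts."""
    return Counter(word) - Counter(target)

print(leftover("ins", "string"))      # empty Counter: every letter is covered
print(leftover("strings", "string"))  # one surplus 's'
print(leftover("stress", "string"))   # two surplus 's' and one 'e'
```

An empty Counter is falsy, which is what the `not` in the list comprehension relies on.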

Answer 2

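A regular expression is probably not the best solution here. Here is an algorithm:

Make a dictionary for your target word, where each letter is a key and the value is the number of times that letter occurs in the word. For example, for string the key:value pair for s would be {'s': 1}.
For each word you want to test, check that every one of its letters is in the dictionary and that its letter count does not exceed the count in the target word.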
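
The two steps above could be sketched roughly like this with a plain dictionary. The function names letter_counts and fits_within are made up for illustration; they are not from any library:

```python
def letter_counts(word):
    """Step 1: map each letter of word to the number of times it occurs."""
    counts = {}
    for letter in word:
        counts[letter] = counts.get(letter, 0) + 1
    return counts

def fits_within(candidate, target_counts):
    """Step 2: True if candidate uses no letter more often than the target."""
    for letter, needed in letter_counts(candidate).items():
        # A letter missing from the target counts as available zero times.
        if needed > target_counts.get(letter, 0):
            return False
    return True

target_counts = letter_counts("string")
words = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
matches = [w for w in words if fits_within(w, target_counts)]
print(matches)  # ['string', 'str', 'ing', 'in', 'ins', 'rs']
```

Because the counts come from the target word itself, this also copes with a target that contains the same letter more than once.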

Answer 3

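I don't think you need Python's re at all. If I understand you correctly, you just want the words in which no letter is repeated.

The problem can be solved with the following two lines of Python.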

str_list = [u'strings', u'string', u'str', u'ing', u'in', u'ins', u'rs', u'stress']
new_list = [i for i in str_list if len(set(i)) == len(i) ]
print new_list

The program's output is:

[u'string', u'str', u'ing', u'in', u'ins', u'rs']

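For Unicode strings you have to use the unicode string class or a code page. You can't use the UTF-8 representation, because there a single letter can take up several bytes. The set function builds a set of unique items from an iterable, and a string is an iterable too, so duplicate letters are dropped. If anything was dropped, the length can no longer equal the length of the original string.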
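
One thing to keep in mind: the set comparison only tests that a word has no repeated letter. It never looks at the target word, so an unrelated word such as "xyz" (a made-up example) would pass too. A minimal sketch that adds a subset test, assuming the target itself has no repeated letters; the helper names are invented here, and the last lines illustrate the UTF-8 point with two accented letters:

```python
target = u"string"
str_list = [u"strings", u"string", u"str", u"ing", u"in", u"ins", u"rs",
            u"stress", u"xyz"]

def no_repeats(word):
    """True if no item occurs twice (the len(set(...)) check)."""
    return len(set(word)) == len(word)

def only_target_letters(word):
    """True if every letter of word also occurs in target."""
    return set(word) <= set(target)

repeat_free = [w for w in str_list if no_repeats(w)]
new_list = [w for w in repeat_free if only_target_letters(w)]
print(repeat_free)  # still contains 'xyz'
print(new_list)     # 'xyz' is filtered out

# Two distinct letters, but their UTF-8 bytes are C3 A9 C3 A8: byte C3 repeats.
accented = u"\u00e9\u00e8"
print(no_repeats(accented))                  # True
print(no_repeats(accented.encode("utf-8")))  # False
```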

Answer 4

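In the spirit of the question, here is a regex answer.

Here's the regex to play with.

It is ^(?=[string]{1,6}$)(?!.*(.).*\1).*$

The first lookahead checks that the input consists of 1 to 6 characters taken from string. The second part, a negative lookahead with a backreference, makes sure no character is repeated. Of course, this approach breaks down if the original string itself contains the same character more than once, and it is not particularly efficient for long strings.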

Code that works for a generic input word:

import re
mylist = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
word = "string"
r = re.compile(r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$" % (word, len(word)))
print filter(r.match, mylist)

This prints:

['string', 'str', 'ing', 'in', 'ins', 'rs']

You can play with the code here.
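
One caveat with the generic version: the target word is pasted straight into a character class, so a character such as - or ] in the word would change the meaning of the pattern. A sketch that escapes the word first with re.escape, under the same assumption that the target has no repeated characters; build_pattern is a name made up here:

```python
import re

def build_pattern(word):
    """Regex for strings made of distinct characters taken from word."""
    # re.escape keeps characters like '-' or ']' literal inside the [...] class.
    return re.compile(
        r"^(?=[%s]{1,%d}$)(?!.*(.).*\1).*$" % (re.escape(word), len(word))
    )

mylist = ["strings", "string", "str", "ing", "in", "ins", "rs", "stress"]
pattern = build_pattern("string")
matches = [w for w in mylist if pattern.match(w)]
print(matches)  # ['string', 'str', 'ing', 'in', 'ins', 'rs']

# Without escaping, [a-c] would be a range and would also accept "b".
print(build_pattern("a-c").match("b"))  # None
```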


python

regex

python-2.x