Find strings occurring multiple times in an array and output their indices
I have an array containing email addresses that change constantly. For example:
mailAddressList = ['chip@plastroltech.com','spammer@example.test','webdude@plastroltech.com','spammer@example.test','spammer@example.test','support@plastroltech.com']
How can I find a string that occurs multiple times in the array and output its indices?
Try this:
query = 'spammer@example.test'
indexes = [i for i, x in enumerate(mailAddressList) if x == query]
Output:
[1, 3, 4]
Just group the indices by email address and print only the entries whose index list is longer than 1:
from collections import defaultdict

mailAddressList = ['chip@plastroltech.com',
                   'spammer@example.test',
                   'webdude@plastroltech.com',
                   'spammer@example.test',
                   'spammer@example.test',
                   'support@plastroltech.com'
                   ]

# Map each address to the list of positions where it occurs
index = defaultdict(list)
for i, email in enumerate(mailAddressList):
    index[email].append(i)

# Keep only the addresses that appear more than once
print([(email, positions) for email, positions in index.items()
       if len(positions) > 1])
# [('spammer@example.test', [1, 3, 4])]
In [7]: import collections
In [8]: q=collections.Counter(mailAddressList).most_common()
In [9]: indexes = [i for i, x in enumerate(mailAddressList) if x == q[0][0]]
In [10]: indexes
Out[10]: [1, 3, 4]
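The session above only inspects the single most common address (q[0][0]). If several different addresses can repeat, a minimal sketch along the same lines might filter the Counter by count instead. The function name duplicate_indices is hypothetical and not part of the answer above.

```python
from collections import Counter

def duplicate_indices(items):
    """Map every value that occurs more than once to the list of its indices."""
    counts = Counter(items)
    return {
        item: [i for i, x in enumerate(items) if x == item]
        for item, n in counts.items()
        if n > 1
    }

mailAddressList = ['chip@plastroltech.com', 'spammer@example.test',
                   'webdude@plastroltech.com', 'spammer@example.test',
                   'spammer@example.test', 'support@plastroltech.com']

print(duplicate_indices(mailAddressList))
# {'spammer@example.test': [1, 3, 4]}
```

This rescans the list once per repeated address, which is fine for short lists.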
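Note: the previously submitted solutions are more Pythonic than mine, but in my opinion the lines I wrote are easier to understand. I simply create a dictionary, then add the mail addresses as keys and their indices as values.
First, declare an empty dictionary.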
>>> dct = {}
Then iterate over the mail addresses (m) and their indices (i) in mailAddressList and add them to the dictionary.
>>> for i, m in enumerate(mailAddressList):
...     if m not in dct:
...         dct[m] = [i]
...     else:
...         dct[m].append(i)
...
Now dct looks like this.
>>> dct
{'support@plastroltech.com': [5], 'webdude@plastroltech.com': [2],
'chip@plastroltech.com': [0], 'spammer@example.test': [1, 3, 4]}
There are many ways to grab [1, 3, 4]. Here is one of them (also not that Pythonic :)):
>>> [i for i in dct.values() if len(i)>1][0]
[1, 3, 4]
Or this:
>>> [i for i in dct.items() if len(i[1])>1][0] #you can add [1] to get [1,3,4]
('spammer@example.test', [1, 3, 4])
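The if/else in the loop above can also be collapsed with dict.setdefault, and the repeated addresses can be pulled out by filtering the dictionary rather than indexing with [0]. This is only a sketch of that variation. The name repeated is an assumption introduced here for illustration.

```python
mailAddressList = ['chip@plastroltech.com', 'spammer@example.test',
                   'webdude@plastroltech.com', 'spammer@example.test',
                   'spammer@example.test', 'support@plastroltech.com']

dct = {}
for i, m in enumerate(mailAddressList):
    # setdefault returns the list already stored for m,
    # or stores and returns a new empty list on first sight
    dct.setdefault(m, []).append(i)

# Keep only the addresses that were seen more than once
repeated = {m: positions for m, positions in dct.items() if len(positions) > 1}
print(repeated)
# {'spammer@example.test': [1, 3, 4]}
```

Unlike the [0] indexing above, this does not raise IndexError when no address repeats. It simply yields an empty dictionary.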
Here is a dict comprehension solution:
result = {address: [i for i, a in enumerate(mailAddressList) if a == address] for address in mailAddressList}
# Gives you: {'webdude@plastroltech.com': [2], 'support@plastroltech.com': [5], 'spammer@example.test': [1, 3, 4], 'chip@plastroltech.com': [0]}
Of course, it is not sorted, because it is a hash table. If you want it ordered, you can use the OrderedDict collection. For example:
from collections import OrderedDict
final = OrderedDict(sorted(result.items(), key=lambda t: t[0]))
# Gives you: OrderedDict([('chip@plastroltech.com', [0]), ('spammer@example.test', [1, 3, 4]), ('support@plastroltech.com', [5]), ('webdude@plastroltech.com', [2])])
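On Python 3.7 and later, a plain dict keeps its insertion order, so a sketch like the following should give the same key-sorted result without OrderedDict. The result literal below is copied from the output shown above, and the Python version is an assumption about your environment.

```python
result = {'webdude@plastroltech.com': [2],
          'support@plastroltech.com': [5],
          'spammer@example.test': [1, 3, 4],
          'chip@plastroltech.com': [0]}

# sorted() orders the (address, indices) pairs by address;
# dict() then preserves that order on Python 3.7+
final = dict(sorted(result.items()))
print(list(final))
# ['chip@plastroltech.com', 'spammer@example.test', 'support@plastroltech.com', 'webdude@plastroltech.com']
```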
This discussion is not closely related, but it may also be useful to you.
mailAddressList = ["chip@plastroltech.com","spammer@example.test","webdude@plastroltech.com","spammer@example.test","spammer@example.test","support@plastroltech.com"]
print([index for index, address in enumerate(mailAddressList) if mailAddressList.count(address) > 1])
Prints [1, 3, 4], the indices of the addresses that occur more than once in the list.
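Note that list.count rescans the whole list for every element, so the one-liner above does quadratic work. For longer lists, a sketch that counts once with collections.Counter may be preferable. The function name indices_of_repeats is hypothetical.

```python
from collections import Counter

def indices_of_repeats(items):
    """Return the indices of all elements whose value occurs more than once."""
    counts = Counter(items)  # one pass to count every value
    return [i for i, item in enumerate(items) if counts[item] > 1]

mailAddressList = ["chip@plastroltech.com", "spammer@example.test",
                   "webdude@plastroltech.com", "spammer@example.test",
                   "spammer@example.test", "support@plastroltech.com"]

print(indices_of_repeats(mailAddressList))
# [1, 3, 4]
```

As with the original one-liner, the result is a flat list: if two different addresses both repeat, their indices are mixed together, so use one of the grouping approaches if you need to tell them apart.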