Checking if there is more than one SUBstring occurring in a list, without explicitly defining the substring

Question

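I have a list of logs that I am analyzing, and I have put them into a Python list. I want to make sure that the following substring 'pattern' does not occur more than twice: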

'Processing id xxxxxx'

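where xxxxxx is a specific ID. Basically, I don't want the logs to show the same ID being processed more than twice. Several different IDs can be processed, but if the same ID is processed over and over, I want to know. I don't know in advance what the IDs are; I only know that the same ID should not be reprocessed.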

I know how to check whether a fixed substring occurs more than once, but I don't know how to do this for an ID I don't know ahead of time.

# response is the list of logs that I am analyzing.
# substring is the 'Processing id xxxxxx' string.

process_str = [s for s in response if substring in s]
if len(process_str) > 2:
    # raise a flag here
    print(f"'{substring}' appears more than twice")
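The piece missing from the attempt above is a way to match "Processing id" followed by *any* ID. A regular expression with a capture group does this: the fixed text is matched literally and the unknown ID is captured. A minimal sketch, assuming the ID is the run of non-space characters after "Processing id " (the sample log lines and the helper name extract_ids are hypothetical):

```python
import re

# Assumption: the ID is the run of non-space characters after "Processing id ".
# Tighten the pattern (e.g. r"Processing id (\d{6})") if IDs have a known format.
ID_PATTERN = re.compile(r"Processing id (\S+)")

def extract_ids(lines):
    """Return the captured ID from every line that mentions 'Processing id'."""
    ids = []
    for line in lines:
        m = ID_PATTERN.search(line)
        if m:  # skip non-matching lines instead of failing on None
            ids.append(m.group(1))
    return ids

# Hypothetical sample log lines for illustration.
logs = [
    "2024-01-01 Processing id 123456",
    "2024-01-01 Done",
    "2024-01-02 Processing id 654321",
]
print(extract_ids(logs))  # ['123456', '654321']
```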

Answer 1

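Iterate over the logs, extract the processing ID from each line, and store the IDs in a dictionary whose values are the number of times each ID appears.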

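import re

ids = {}
for s in response:
    m = re.search(r'(Processing id )(\d{6})', s)
    if m is None:  # this log line doesn't mention a processing ID
        continue
    pid = m.group(2)  # named pid to avoid shadowing the built-in id()

    # count how many times each ID has been processed
    if pid not in ids:
        ids[pid] = 1
    else:
        ids[pid] += 1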
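The same counting can be written with collections.Counter, followed by the check the question actually asks for: flag any ID processed more than twice. A sketch that keeps the answer's assumption of six-digit IDs; the helper name repeated_ids and the sample lines are hypothetical:

```python
import re
from collections import Counter

# Assumption (as in the answer above): IDs are six digits after "Processing id ".
ID_PATTERN = re.compile(r"Processing id (\d{6})")
MAX_ALLOWED = 2  # the question's limit: an ID may be processed at most twice

def repeated_ids(lines, max_allowed=MAX_ALLOWED):
    """Return {id: count} for every ID seen more than max_allowed times."""
    counts = Counter(
        m.group(1)
        for m in (ID_PATTERN.search(line) for line in lines)
        if m  # ignore lines without a processing ID
    )
    return {pid: n for pid, n in counts.items() if n > max_allowed}

# Hypothetical log lines for illustration.
response = [
    "Processing id 111111",
    "Processing id 222222",
    "Processing id 111111",
    "Processing id 111111",
]
flagged = repeated_ids(response)
if flagged:
    print("Repeated IDs:", flagged)  # Repeated IDs: {'111111': 3}
```

Because the function returns every offending ID with its count, the caller can decide whether to raise, log, or simply print.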


Tags: python, string, search, list