Find multiple substrings in a string, preserving order of appearance

I have an array containing id: value pairs.

For example:

[3] 'hello'
[24] 'tell me a joke'
[34] 'im bored'
[42] 'what time is it'
[56] 'how are you'
[69] 'what are you doing'

I also have user input that may contain several of the array's values, for example:

'hello and good evening. how are you. im bored and need some entertainment. please tell me a joke.'

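I am able to find all the matches, but not in the right order.

The expected result lists the matches in the same order in which they appear in the input string.

For example: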

"hello", "how are you", "im bored", "tell me a joke"

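Iterate over the dictionary and use the in operator to check whether each phrase occurs in the input. If it does, use the index method to find its position (index returns the position of the first occurrence), so that you can sort the results by that position. You can then drop the position from the final result, leaving only tuples of id and value.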

>>> user_input = 'hello and good evening. how are you. im bored and need some entertainment. please tell me a joke.'
>>> data = {
...     3: 'hello',
...     24: 'tell me a joke',
...     34: 'im bored',
...     42: 'what time is it',
...     56: 'how are you',
...     69: 'what are you doing',
... }
>>>
>>> [(k, v) for _, k, v in sorted(
...     (user_input.index(v), k, v)
...     for k, v in data.items()
...     if v in user_input
... )]
[(3, 'hello'), (56, 'how are you'), (34, 'im bored'), (24, 'tell me a joke')]
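
The same idea can be wrapped in a reusable function. The following is only a sketch: the name find_in_order is made up for illustration, and it uses str.find (which returns -1 when the phrase is absent) so each phrase is searched once rather than once for in and once for index.

```python
def find_in_order(text, phrases):
    """Return (id, phrase) pairs ordered by first appearance in text.

    find_in_order is a hypothetical helper name, not part of any library.
    """
    hits = []
    for key, phrase in phrases.items():
        pos = text.find(phrase)  # -1 when the phrase does not occur
        if pos != -1:
            hits.append((pos, key, phrase))
    hits.sort()  # sorts by position first, then by id on ties
    return [(key, phrase) for _, key, phrase in hits]


user_input = ('hello and good evening. how are you. im bored and need '
              'some entertainment. please tell me a joke.')
data = {
    3: 'hello',
    24: 'tell me a joke',
    34: 'im bored',
    42: 'what time is it',
    56: 'how are you',
    69: 'what are you doing',
}

print(find_in_order(user_input, data))
# [(3, 'hello'), (56, 'how are you'), (34, 'im bored'), (24, 'tell me a joke')]
```

Like the one-liner above, this reports each phrase at most once, at its first position in the input.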

Alternatively, since you already have the data that tells you the string for each ID, you can compute just the list of IDs like this:

>>> [k for _, k in sorted(
...     (user_input.index(v), k)
...     for k, v in data.items()
...     if v in user_input
... )]
[3, 56, 34, 24]

You can then of course take that list and do something like this:

>>> " ".join(data[i] for i in [3, 56, 34, 24])
'hello how are you im bored tell me a joke'

and so on.
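
One limitation of the index approach is that a phrase the user repeats is reported only once, at its first position. If every occurrence matters, a possible variation (a sketch only, with the hypothetical name find_all_in_order, and assuming each phrase maps to exactly one id) is to build a single regular expression from the phrases and let re.finditer walk the input from left to right:

```python
import re


def find_all_in_order(text, phrases):
    """Return (id, phrase) for every non-overlapping match, left to right.

    find_all_in_order is a hypothetical helper name; it assumes each
    phrase belongs to exactly one id.
    """
    by_phrase = {phrase: key for key, phrase in phrases.items()}
    if not by_phrase:
        return []  # an empty pattern would match everywhere
    # Try longer phrases first so that, at a given position, the longest
    # candidate wins over a shorter phrase that is its prefix.
    alternatives = sorted(by_phrase, key=len, reverse=True)
    pattern = "|".join(re.escape(p) for p in alternatives)
    return [(by_phrase[m.group(0)], m.group(0))
            for m in re.finditer(pattern, text)]


data = {3: 'hello', 24: 'tell me a joke', 34: 'im bored'}
print(find_all_in_order('hello. tell me a joke. hello again', data))
# [(3, 'hello'), (24, 'tell me a joke'), (3, 'hello')]
```

Note that re.finditer returns non-overlapping matches, so a phrase that begins inside another matched phrase is skipped here, whereas the index-based version would still report it.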