Is it possible to drop sentences from a text with NLTK in Python?

For example, I have a text consisting of several sentences:

"First sentence is not relevant. Second contains information about KPI I want to keep. Third is useless. Fourth mentions topic relevant for me".

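In addition, I have a self-built dictionary with the entries {KPI, topic}. Is it possible to write code that keeps only those sentences in which at least one word from the dictionary is mentioned? In the example above, only the 2nd and 4th sentences would remain.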

Thanks

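P.S. I already have code that tokenizes the text into sentences, but keeping only the "relevant" ones does not seem to be a common task, as far as I can see.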

One solution is to use a list comprehension (see the example below). There may be better, more Pythonic solutions, though.

sentences = ['Lorem ipsum dolor keyword sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.',
        'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.',
        'Duis aute irure other_keyword dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.',
        'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.']

vocabulary = {'keyword': 'Topic 1',
             'other_keyword': 'Topic 2'}

[sentence for sentence in sentences if any(word in sentence for word in vocabulary)]


>>> ['Lorem ipsum dolor keyword sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.',
 'Duis aute irure other_keyword dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.']
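
Note that `word in sentence` is a substring test, so 'keyword' would also match inside 'other_keyword', and the match is case-sensitive. If that matters, a whole-word, case-insensitive variant might look like the sketch below, applied to the text from the question. The regex-based sentence splitter is only a naive stand-in (an assumption for illustration) for whatever sentence tokenizer you already have, and the names `text`, `keywords`, `pattern`, and `relevant` are made up here.

```python
import re

text = ("First sentence is not relevant. "
        "Second contains information about KPI I want to keep. "
        "Third is useless. "
        "Fourth mentions topic relevant for me.")

# The custom dictionary entries from the question.
keywords = {"KPI", "topic"}

# Naive splitter: break on whitespace that follows ., ! or ?.
# Assumption: replace this with your existing sentence tokenizer.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# One pattern that matches any keyword as a whole word, ignoring case.
# re.escape guards against keywords containing regex metacharacters.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(keywords)) + r")\b",
    re.IGNORECASE,
)

# Keep only the sentences that mention at least one keyword.
relevant = [s for s in sentences if pattern.search(s)]
print(relevant)
```

With this input, only the second and fourth sentences are kept. Because `\b` treats the underscore as a word character, a keyword such as 'keyword' would no longer match inside 'other_keyword'.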