nltk: how to give ngrams multiple separated sentences

Question

I have a list of English sentences (each sentence is itself a list of tokens), and I want to get the n-grams. For example:

sentences = [['this', 'is', 'sentence', 'one'], ['hello','again']]

To run nltk.util.ngrams, I need to flatten the list into:

sentences = ['this','is','sentence','one','hello','again']

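But then I get a spurious bigram, ('one','hello'), that runs from the end of one sentence into the start of the next. What is the best way to handle this?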
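A minimal sketch of the usual fix, assuming plain n-grams with no padding are enough: compute the n-grams inside each sentence and only then concatenate the results, so no n-gram can straddle a sentence boundary. The helper `sentence_ngrams` is hypothetical and builds n-grams with `zip` rather than NLTK so it runs without NLTK installed; with NLTK available, `nltk.util.ngrams(sentence, n)` could be called in its place for each sentence.

```python
from typing import List, Tuple

def sentence_ngrams(sentences: List[List[str]], n: int) -> List[Tuple[str, ...]]:
    """Hypothetical helper: n-grams computed per sentence, then concatenated."""
    result = []
    for sentence in sentences:
        # zip over n shifted views of the same sentence; it stops at the
        # sentence's end, so nothing crosses into the next sentence
        result.extend(zip(*(sentence[i:] for i in range(n))))
    return result

sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]
bigrams = sentence_ngrams(sentences, 2)
print(bigrams)
# [('this', 'is'), ('is', 'sentence'), ('sentence', 'one'), ('hello', 'again')]
```

A sentence shorter than n simply contributes no n-grams, since `zip` stops at the shortest input.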

Thanks!

Answer 1

Try this:

from itertools import chain

sentences = list(chain(*sentences))

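chain returns a chain object whose .__next__() method returns elements from the first iterable until it is exhausted, then elements from the next iterable, and so on until all of the iterables are exhausted.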
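The behaviour described above can be checked directly. `chain.from_iterable` (a standard `itertools` classmethod) takes the outer list itself instead of unpacking it with `*`, and because `chain` is lazy it can also stitch together per-sentence bigram iterators. Building the bigrams with `zip(s, s[1:])` is an illustrative choice made here, not part of the original answer.

```python
from itertools import chain

sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

# chain(*sentences) and chain.from_iterable(sentences) yield the same items
flat = list(chain(*sentences))
flat_lazy = list(chain.from_iterable(sentences))

# Flattening first lets a bigram cross the sentence boundary...
flat_bigrams = list(zip(flat, flat[1:]))

# ...whereas chaining per-sentence bigram iterators keeps sentences apart
per_sentence = list(chain.from_iterable(zip(s, s[1:]) for s in sentences))

print(flat == flat_lazy)                 # True
print(('one', 'hello') in flat_bigrams)  # True
print(('one', 'hello') in per_sentence)  # False
```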

Or you can do this:

sentences = [i for s in sentences for i in s]
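The nested comprehension reads left to right like the equivalent nested loops: the outer `for s in sentences` runs first and the inner `for i in s` second. The sketch below spells out that equivalence and then applies the same shape to bigrams; the name `pairs` and the use of `zip` for bigrams are illustrative assumptions, not NLTK.

```python
sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

# Comprehension form from the answer
flat = [i for s in sentences for i in s]

# The same thing as explicit nested loops, in the same order
flat_loops = []
for s in sentences:
    for i in s:
        flat_loops.append(i)

# Same shape, but iterating over each sentence's bigrams instead of its
# tokens, so no bigram spans two sentences
pairs = [p for s in sentences for p in zip(s, s[1:])]

print(flat == flat_loops)  # True
print(pairs)
# [('this', 'is'), ('is', 'sentence'), ('sentence', 'one'), ('hello', 'again')]
```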

Answer 2

You can also use a list comprehension:

f = []
[f.extend(_l) for _l in sentences]

f = ['this', 'is', 'sentence', 'one', 'hello', 'again']
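One caveat worth seeing concretely: `list.extend` returns `None`, so the comprehension above is evaluated only for its side effect and also builds a throwaway list of `None` values, one per sentence. A minimal comparison, assuming the same input, shows that a plain `for` loop produces the same `f` without that extra list.

```python
sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

f = []
discarded = [f.extend(_l) for _l in sentences]  # extend() returns None

g = []
for _l in sentences:
    g.extend(_l)  # same effect on g, no throwaway list

print(discarded)  # [None, None]
print(f == g)     # True
print(f)          # ['this', 'is', 'sentence', 'one', 'hello', 'again']
```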
