What am I missing when getting nouns from a sentence and the reversed sentence using nltk?

Question

I have an is_noun definition for nltk part-of-speech tags:

is_noun = lambda pos: pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS'

Then I use it in a function:

import nltk

def test(text):
    tokenized = nltk.word_tokenize(text)
    nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
    print('Nouns:', nouns)
    return nouns
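As a possible simplification (a sketch, not part of the original question), the noun check can be written as a prefix test and kept separate from the nltk calls, so it can be tried on any list of (word, tag) pairs. It relies on nltk.pos_tag using the Penn Treebank tagset by default, where the noun tags are exactly NN, NNS, NNP and NNPS. The helper name nouns_from_tagged is hypothetical.

```python
from typing import Iterable, List, Tuple


def is_noun(tag: str) -> bool:
    # Penn Treebank noun tags are NN, NNS, NNP and NNPS;
    # no other Penn Treebank tag starts with "NN".
    return tag.startswith("NN")


def nouns_from_tagged(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the words whose part-of-speech tag is a noun tag, in order."""
    return [word for word, tag in tagged if is_noun(tag)]


# With nltk installed and its tokenizer/tagger data downloaded, this would be
# used as: nouns_from_tagged(nltk.pos_tag(nltk.word_tokenize(text)))
```

The filter only looks at the tags, so whatever the tagger decides about a word is exactly what ends up in the noun list.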

Then I call the function:

test('When will this long and tedious journey ever end? Like all good')

and get:

Nouns: ['journey']

Then I call the same function with the words in reverse order:

test('good all Like end? ever journey tedious and long this will When')

Result:

Nouns: ['end']

I expected to get the same nouns both times, but that isn't the case. What am I doing wrong?

Answer 1

Summary: GIGO (garbage in => garbage out).

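As the comments suggest, word order matters. English is full of words that can act as more than one part of speech, depending on where they appear in the phrase. Consider: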

You can cage a swallow.
You cannot swallow a cage.
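To illustrate why position matters, here is a deliberately tiny toy tagger. It is hypothetical and is not how nltk works internally: it tags a few closed-class words from a small lexicon and decides between noun and verb for every other word using only the previous tag. NLTK's default tagger is a trained statistical model rather than a rule table, but it also takes the context around each word into account, so the same word can receive different tags in different positions.

```python
from typing import List, Optional, Tuple

# Hypothetical mini-lexicon for this toy example only.
LEXICON = {"you": "PRP", "can": "MD", "cannot": "MD", "a": "DT", "the": "DT"}


def toy_tag(words: List[str]) -> List[Tuple[str, str]]:
    """Tag words using the lexicon plus one rule based on the previous tag.

    Words not in the lexicon are tagged VB (verb) right after a modal (MD)
    and NN (noun) everywhere else, so a word's tag depends on its position.
    """
    tagged: List[Tuple[str, str]] = []
    prev: Optional[str] = None
    for word in words:
        tag = LEXICON.get(word.lower())
        if tag is None:
            tag = "VB" if prev == "MD" else "NN"
        tagged.append((word, tag))
        prev = tag
    return tagged
```

With this toy, "cage" comes out as VB in "You can cage a swallow" and as NN in "You cannot swallow a cage". Feeding it the scrambled sequence "swallow a cage cannot You" tags "swallow" as NN, because it no longer follows a modal: the output is only as sensible as the word order it was given.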

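The second text you supplied isn't a grammatical sentence at all. The best an English parser can make of it is that "end" may be the direct object of the verb "like", and is therefore a noun in this context. Likewise, "journey" looks like the main verb of the second word sequence (the words after the question mark).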
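If what you actually need is the nouns of the reversed word sequence, one option is to tag the text in its natural order and only then reverse the (word, tag) pairs, so the tagger never sees the ungrammatical version. This is a minimal sketch, assuming the tagger is passed in as a function with the same shape as nltk.pos_tag (a list of tokens in, a list of (word, tag) pairs out); nouns_in_reverse is a hypothetical helper name.

```python
from typing import Callable, List, Sequence, Tuple

# A tagger maps a list of tokens to a list of (word, tag) pairs, e.g. nltk.pos_tag.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def nouns_in_reverse(tokens: Sequence[str], tagger: Tagger) -> List[str]:
    """Tag tokens in their original order, then collect the nouns back to front."""
    tagged = tagger(list(tokens))  # the tagger sees the grammatical word order
    # Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with "NN".
    return [word for word, tag in reversed(tagged) if tag.startswith("NN")]
```

With nltk this would be called as nouns_in_reverse(nltk.word_tokenize(text), nltk.pos_tag). Because the tagging happens before the reversal, the result is the same noun list that test(text) returns, just in reverse order.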
