Loop Python NLTK classifier through a list of tweets

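I trained a NaiveBayes classifier using the twitter_samples corpus. I was able to test the classifier on a single tweet to make sure it works. However, I am now trying to loop the classifier through a list of ~4000 tweets, and I get an AttributeError in my code: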

test_sample = []
for (words, sentiment) in test_tweets:
     words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
     sentiment = classifier.classify(extract_features(words.split()))
     test_sample.append(words_filtered, sentiment)

AttributeError: 'list' object has no attribute 'split'

test_tweets is a list of tweets with the following structure:

('blah tweety blah', 'tbd')

I am doing sentiment analysis on the tweets, and the classifier produces a pos/neg result for each tweet, giving output like this:

('blah tweety blah', 'pos')

Can anyone tell me what is wrong with my loop?

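The AttributeError means you are calling split() on a list, so test_tweets does not have the format you think it does. Somewhere in it there is a list where you expect a string.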
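To make the cause concrete, here is a minimal sketch that reproduces the error. The sample data is made up for illustration: it assumes one entry of test_tweets holds a list of tokens instead of a string, which is only a guess at what the real data contains.

```python
# Hypothetical data: the second entry holds a token list instead of a string.
test_tweets = [
    ("blah tweety blah", "tbd"),
    (["already", "tokenized", "tweet"], "tbd"),
]

errors = []
for words, sentiment in test_tweets:
    try:
        words.split()  # works for str, fails for list
    except AttributeError as exc:
        errors.append(str(exc))

print(errors)
```

Strings have a split() method and lists do not, so a single list among the tweets is enough to stop the original loop.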

As a troubleshooting step, you can temporarily modify the loop to find the entries where words is a list rather than a string:

test_sample = []
for (words, sentiment) in test_tweets:
    # Report every entry whose text is a list instead of a string
    if isinstance(words, list):
        print('This is a list, not a string: ', end='')
        print(words)
    # words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    # sentiment = classifier.classify(extract_features(words.split()))
    # test_sample.append((words_filtered, sentiment))
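With around 4000 tweets, printing every offending entry may be noisy. As an alternative sketch, you could first count how many entries of each type there are. The sample data below is hypothetical and stands in for the real test_tweets.

```python
from collections import Counter

# Hypothetical stand-in for the real test_tweets list.
test_tweets = [
    ("blah tweety blah", "tbd"),
    (["blah", "tweety", "blah"], "tbd"),
    ("another tweet here", "tbd"),
]

# Tally the type of the first element of every (words, sentiment) pair.
counts = Counter(type(words).__name__ for words, _ in test_tweets)
print(counts)
```

If the tally shows only a handful of lists, skipping them may be fine; if most entries are lists, the data was probably tokenized earlier and the split() call is not needed at all.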

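Then, once you have determined which entries are lists, you have a couple of options. You can use the same if statement to either skip those entries or clean them up.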

test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        # Already a list of tokens, so just skip the split method
        words_filtered = [t.lower() for t in words if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words))
        # To skip lists instead, replace the two lines above with `continue`,
        # which goes straight to the next iteration of the loop
    else:
        words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words.split()))
    # append() takes a single argument, so pass the pair as one tuple
    test_sample.append((words_filtered, sentiment))
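The two branches can also be folded into one small helper that normalizes each entry before classification. This is only a sketch: to_tokens is a hypothetical helper name, and the extract_features function and StubClassifier below are stand-ins so the example runs on its own. In the real script you would use your own extract_features and the trained NLTK classifier.

```python
def to_tokens(words):
    """Return a list of tokens whether `words` is a string or already a list."""
    if isinstance(words, str):
        return words.split()
    return list(words)


# Hypothetical stand-ins, NOT the asker's real feature extractor or classifier.
def extract_features(tokens):
    return {f"contains({t.lower()})": True for t in tokens}


class StubClassifier:
    def classify(self, features):
        # Toy rule for illustration only.
        return "pos" if "contains(good)" in features else "neg"


classifier = StubClassifier()

# Hypothetical data mixing strings and token lists.
test_tweets = [
    ("what a good day", "tbd"),
    (["blah", "tweety", "blah"], "tbd"),
]

test_sample = []
for words, _ in test_tweets:
    tokens = to_tokens(words)
    words_filtered = [t.lower() for t in tokens if len(t) >= 3]
    sentiment = classifier.classify(extract_features(tokens))
    test_sample.append((words_filtered, sentiment))

print(test_sample)
```

Normalizing up front keeps the loop body identical for both input shapes, so there is only one place to change if the filtering or feature extraction changes later.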