Loop Python NLTK classifier through a list of tweets
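I trained a NaiveBayes classifier using the twitter_samples corpus, and I was able to test it on a single tweet to confirm it works. However, when I try to loop the classifier over a list of ~4000 tweets, my code raises an AttributeError: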
test_sample = []
for (words, sentiment) in test_tweets:
    words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    sentiment = classifier.classify(extract_features(words.split()))
    test_sample.append(words_filtered, sentiment)
AttributeError: 'list' object has no attribute 'split'
test_tweets is a list of tweets with the following structure:
('blah tweety blah', 'tbd')
I'm running sentiment analysis on the tweets, and the classifier should produce a pos or neg label for each tweet, giving output like this:
('blah tweety blah', 'pos')
Can anyone tell me what's wrong with my loop?
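That AttributeError means you are calling split() on a list, so test_tweets is not in the format you think it is. Somewhere in it, a list appears where you expect a string.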
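A minimal reproduction of the error, independent of NLTK: split() works on a string but raises exactly this AttributeError on a list. The helper name `can_split` and the sample values are hypothetical. One plausible way to end up with lists is loading tweets with `twitter_samples.tokenized()` instead of `twitter_samples.strings()`, but that is an assumption about how test_tweets was built.

```python
def can_split(value):
    """Return True if value.split() works, False if it raises AttributeError."""
    try:
        value.split()
    except AttributeError:
        return False
    return True

raw_tweet = "blah tweety blah"                # a plain string, as the loop expects
tokenized_tweet = ["blah", "tweety", "blah"]  # an already-tokenized list

print(can_split(raw_tweet))        # True
print(can_split(tokenized_tweet))  # False

try:
    tokenized_tweet.split()
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'split'
```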
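As a troubleshooting step, you can temporarily modify the loop to print the entries whose words value is a list rather than a string: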
test_sample = []
for (words, sentiment) in test_tweets:
    if type(words) is list:
        print('This is a list, not a string ', end='')
        print(words)
    # words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
    # sentiment = classifier.classify(extract_features(words.split()))
    # test_sample.append(words_filtered, sentiment)
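With ~4000 tweets, printing every offending entry can be noisy. A compact alternative sketch collects the position and type of each non-string entry instead. The function name `find_non_string_entries` and the sample data are hypothetical; it assumes each item is a (words, label) pair as described in the question.

```python
def find_non_string_entries(tweets):
    """Return (index, type name) for every (words, label) pair whose words is not a str."""
    return [
        (i, type(words).__name__)
        for i, (words, _label) in enumerate(tweets)
        if not isinstance(words, str)
    ]

# Hypothetical sample data mixing raw strings and pre-tokenized lists.
test_tweets = [
    ("blah tweety blah", "tbd"),
    (["blah", "tweety", "blah"], "tbd"),
    ("another tweet here", "tbd"),
]

print(find_non_string_entries(test_tweets))  # [(1, 'list')]
```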
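Once you've identified which entries are lists, you have a few options: you can use the same if statement either to skip those entries or to handle them directly.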
test_sample = []
for (words, sentiment) in test_tweets:
    if isinstance(words, list):
        words_filtered = [t.lower() for t in words if len(t) >= 3]  # already tokenized, so skip split()
        sentiment = classifier.classify(extract_features(words))
        # To skip list entries instead, replace the two lines above with: continue
    else:
        words_filtered = [t.lower() for t in words.split() if len(t) >= 3]
        sentiment = classifier.classify(extract_features(words.split()))
    # list.append() takes a single argument, so append a tuple
    test_sample.append((words_filtered, sentiment))
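A self-contained sketch of the same idea, normalizing every entry to a token list up front and producing the (tweet text, label) pairs the question asks for. `to_tokens`, `StubClassifier`, and this `extract_features` are hypothetical stand-ins: the real classifier would be the trained NLTK NaiveBayesClassifier, whose classify() takes a feature dict, and the question's own extract_features is not shown.

```python
def to_tokens(words):
    """Accept either a raw tweet string or an already-tokenized list of words."""
    if isinstance(words, str):
        return words.split()
    return list(words)

def extract_features(tokens):
    # Hypothetical bag-of-words features; the question's real function is not shown.
    return {f"contains({t.lower()})": True for t in tokens}

class StubClassifier:
    # Hypothetical stand-in for a trained classifier; only classify() is used here.
    def classify(self, features):
        return "pos" if "contains(happy)" in features else "neg"

classifier = StubClassifier()

test_tweets = [
    ("so happy today", "tbd"),
    (["what", "a", "bad", "day"], "tbd"),
]

test_sample = []
for words, _placeholder in test_tweets:
    tokens = to_tokens(words)
    text = words if isinstance(words, str) else " ".join(words)
    sentiment = classifier.classify(extract_features(tokens))
    # One tuple per tweet, matching the ('blah tweety blah', 'pos') shape.
    test_sample.append((text, sentiment))

print(test_sample)  # [('so happy today', 'pos'), ('what a bad day', 'neg')]
```

Normalizing once in `to_tokens` removes the need for duplicated branches inside the loop, so the classification call appears only once.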