Clean list from stopwords
I have this variable:
sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
and I need to remove the stopwords from it.
I tried
output = [w for w in sent if not w in stop_words]
but it did not work.
What is wrong?
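The parentheses get in the way of the iteration: each w is a whole tuple rather than a single word, so it never matches a stopword. If you can remove them:
sent = ['include', 'details', 'about', 'your performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
output = [w for w in sent if w not in stop_words]
If you cannot, you can do this instead:
sent = [('include', 'details', 'about', 'your performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
# loop over each tuple, then over each word inside it
output = [w for s in sent for w in s if w not in stop_words]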
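To make the point about the tuples concrete, here is a minimal sketch. The small stop_words set is a hypothetical stand-in for a real stopword list, and itertools.chain.from_iterable is just one alternative way to flatten the tuples, not something the question requires.

```python
from itertools import chain

# Hypothetical stand-in for a real stopword list (assumption, for illustration only)
stop_words = {'about', 'your', 'the', 'which', "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Iterating over sent yields whole tuples, not words, and a tuple
# is never equal to a stopword string, so nothing is filtered out.
unchanged = [w for w in sent if w not in stop_words]

# Flatten the tuples first, then filter word by word.
flat = [w for w in chain.from_iterable(sent) if w not in stop_words]

print(unchanged == sent)
print(flat)
```

The first comprehension returns the input untouched, which is the "did not work" behavior from the question.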
# requires the NLTK stopwords corpus: nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = {w.lower() for w in stopwords.words('english')}
sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]
If you want a single flat list of words with the stopwords removed:
>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']
If you want to keep the sentences intact:
>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]
Most of the time, though, you will be working with a flat list of words (no parentheses):
sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']
>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']
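Note that the comparisons above are exact string matches, so a capitalized word such as 'The' or a token with attached punctuation would slip through. A rough sketch of one way to handle that, assuming you only want to normalize for the comparison and keep the original tokens; the stop_words set and the is_stop_word helper are hypothetical names used for illustration:

```python
import string

# Hypothetical stand-in for a real stopword list (assumption, for illustration only)
stop_words = {'about', 'your', 'the', 'which', "you've"}

sent = [('Include', 'details', 'about', 'your', 'performance'),
        ('show', 'The', 'results,', 'which', "you've", 'got')]

def is_stop_word(token):
    """Compare a lowercased copy with leading/trailing punctuation stripped."""
    return token.strip(string.punctuation).lower() in stop_words

# The original tokens are kept as they are; only the comparison is normalized.
filtered = [[w for w in sentence if not is_stop_word(w)] for sentence in sent]
print(filtered)
```

Whether you want this kind of normalization depends on how your text was tokenized in the first place.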
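Is a quotation mark missing in your actual code? Make sure every string is closed and, if you use the same type of quote throughout, escape apostrophes with a backslash. I would also make each word its own string, like this:
sent = [('include', 'details', 'about', 'your', 'performance'), ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]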
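To illustrate why separating the words matters: 'your performance' is one string, so it never equals the stopword 'your'. If you cannot fix the data by hand, a sketch of one workaround is to split each item on whitespace before filtering. The stop_words set here is a hypothetical stand-in for a real stopword list.

```python
# Hypothetical stand-in for a real stopword list (assumption, for illustration only)
stop_words = {'about', 'your', 'the', 'which', "you've"}

# 'your performance' is a single string here, as in the question
sent = [('include', 'details', 'about', 'your performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Without splitting, 'your performance' survives because it matches no stopword.
unsplit = [w for s in sent for w in s if w not in stop_words]

# Split every item on whitespace so each word is compared on its own.
tokens = [word for s in sent for item in s for word in item.split()]
filtered = [w for w in tokens if w not in stop_words]

print(unsplit)
print(filtered)
```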