Shortening a sentence using a Python library like NLTK

I am using NLTK to remove stopwords from a sentence.

For example: "I would love to fly again via American Airlines"

Result: "Love to fly American Airlines"

I tried the following code:

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# Tokenizing the text
txt = "I love to fly with American Airlines"
stopWords = set(stopwords.words("english"))
words = word_tokenize(txt)

# Creating a frequency table to keep the  
# score of each word 

freqTable = dict() 
for word in words: 
    word = word.lower() 
    if word in stopWords: 
        continue
    if word in freqTable: 
        freqTable[word] += 1
    else: 
        freqTable[word] = 1

# Creating a dictionary to keep the score 
# of each sentence 
sentences = sent_tokenize(txt) 
sentenceValue = dict() 

for sentence in sentences: 
    for word, freq in freqTable.items(): 
        if word in sentence.lower(): 
            if sentence in sentenceValue: 
                sentenceValue[sentence] += freq 
            else: 
                sentenceValue[sentence] = freq 



sumValues = 0
for sentence in sentenceValue: 
    sumValues += sentenceValue[sentence] 

# Average value of a sentence from the original text 

average = int(sumValues / len(sentenceValue)) 

# Storing sentences into our summary. 
summary = '' 
for sentence in sentences: 
    if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)): 
        summary += " " + sentence 

print("Summary: " + summary)

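The result is an empty string, which I assume is because the sentence is too short for NLTK to work with. I am just looking into whether there is a simpler way; otherwise I plan to train a model for this.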

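NLTK, which you are already using, is a Python library that can shorten a sentence easily and efficiently by removing stopwords. However, there may be a problem with your approach (either the logic or the code).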
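
One likely cause of the empty string is the selection threshold rather than the sentence length. With a single sentence, that sentence's score is also the average, so the `> 1.2 * average` test can never pass. The sketch below reproduces only that selection step with hard-coded scores; `select_sentences` and the sample scores are illustrative names and values, not part of the original code.

```python
def select_sentences(sentence_scores, factor=1.2):
    """Keep sentences whose score exceeds factor * (truncated) average score."""
    # Same averaging as in the question: integer-truncated mean of all scores
    average = int(sum(sentence_scores.values()) / len(sentence_scores))
    return [s for s, score in sentence_scores.items() if score > factor * average]

# One sentence: its score (4) equals the average, and 4 > 4.8 is false
single = {"I love to fly with American Airlines": 4}
print(select_sentences(single))   # []

# Several sentences: only scores well above the average survive
several = {"A": 10, "B": 2, "C": 3}
print(select_sentences(several))  # ['A']
```

This kind of frequency-based scoring picks whole sentences out of a longer text; it does not shorten an individual sentence, which is why plain stopword filtering is a better fit here.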

The code below removes the stopwords:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires the NLTK stopwords corpus and tokenizer data (installed via nltk.download)
example_sent = "I love to fly with American Airlines"

stop_words = set(stopwords.words('english'))

word_tokens = word_tokenize(example_sent)

# Compare in lowercase so capitalized stopwords such as "I" are removed as well
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]

print(word_tokens)
print(filtered_sentence)
print(" ".join(filtered_sentence))
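
Note that the result asked for in the question keeps "to", which NLTK's English stopword list treats as a stopword, so the output above will not match it exactly. If a specific set of words should be dropped, a hand-picked stopword set may be simpler. The following is a minimal sketch under that assumption: `CUSTOM_STOPWORDS` and `shorten` are hypothetical names, the word list is chosen only to fit the example sentence, and a simple regular expression stands in for `word_tokenize`.

```python
import re

# Hypothetical, hand-picked stopword set for illustration only
CUSTOM_STOPWORDS = frozenset({"i", "would", "again", "via", "with", "a", "an", "the"})

def shorten(sentence, stopwords=CUSTOM_STOPWORDS):
    """Drop every word whose lowercase form is in the stopword set."""
    # Simple regex tokenizer: runs of letters and apostrophes
    tokens = re.findall(r"[A-Za-z']+", sentence)
    kept = [t for t in tokens if t.lower() not in stopwords]
    return " ".join(kept)

print(shorten("I would love to fly again via American Airlines"))
# love to fly American Airlines
```

The same idea works with NLTK's list as a starting point, for example by subtracting the words you want to keep from `set(stopwords.words('english'))` and adding any extra ones.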