How to count the number of sentences in a single string using NLTK

Question

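I found code that counts sentences with NLTK by taking a file path and extension as input (see below), but nothing on how to apply it to a single string stored in a variable. Can this be done?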

import nltk

# dirpath is the directory holding the .txt files (not defined in this snippet)
folder = nltk.data.find(dirpath)
corpusReader = nltk.corpus.PlaintextCorpusReader(folder, r'.*\.txt')

print("The number of sentences =", len(corpusReader.sents()))

Answer 1

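Try the sent_tokenize function. It returns a list of sentences, so calling len() on the result gives the sentence count.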

from nltk.tokenize import sent_tokenize

data = "All work and no play makes jack dull boy. All work and no play makes jack a dull boy."
print(sent_tokenize(data))

Output

['All work and no play makes jack dull boy.', 'All work and no play makes jack a dull boy.']
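
Since the question asks for a count rather than the sentences themselves, here is a minimal sketch that wraps the same call in a helper. The name count_sentences is hypothetical, not part of NLTK. sent_tokenize needs the pretrained Punkt data (the 'punkt' resource, or 'punkt_tab' in newer NLTK releases, installed through nltk.download()). The fallback to an untrained PunktSentenceTokenizer when that data is missing is an assumption added for illustration, not something the answer above does.

```python
from nltk.tokenize import sent_tokenize
from nltk.tokenize.punkt import PunktSentenceTokenizer


def count_sentences(text):
    """Return the number of sentences NLTK finds in `text`."""
    try:
        # Preferred path: the pretrained Punkt model used by sent_tokenize.
        sentences = sent_tokenize(text)
    except LookupError:
        # Assumption: the Punkt data is not installed. An untrained
        # tokenizer knows no abbreviations, so it may over-split text
        # containing tokens such as "Dr." or "e.g.".
        sentences = PunktSentenceTokenizer().tokenize(text)
    return len(sentences)


data = "All work and no play makes jack dull boy. All work and no play makes jack a dull boy."
count = count_sentences(data)
print("The number of sentences =", count)
```

For the sample string this should report 2 sentences. For text with abbreviations or unusual punctuation, the pretrained model is the safer choice, so installing the Punkt data first is worthwhile.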

python

nlp

nltk