How to tokenize a sentence using NLP
I am new to NLP. I am trying to tokenize sentences using NLP on Python 3.7, so I used the following code:
import nltk
text4 = "This is the first sentence.A gallon of milk in the U.S. cost .99.Is this the third sentence?Yes,it is!"
x=nltk.sent_tokenize(text4)
x[0]
I expected x[0] to return the first sentence, but I got:
Out[4]: 'This is the first sentence.A gallon of milk in the U.S. cost .99.Is this the third sentence?Yes,it is!'
Am I doing something wrong?
Your sentences need valid whitespace and punctuation for the tokenizer to behave properly:
import nltk
text4 = "This is a sentence. This is another sentence."
nltk.sent_tokenize(text4)
# ['This is a sentence.', 'This is another sentence.']
# Versus what you had before
nltk.sent_tokenize("This is a sentence.This is another sentence.")
# ['This is a sentence.This is another sentence.']
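If the text cannot be corrected by hand, the missing spaces can be restored programmatically before tokenizing. The following is a minimal sketch, not part of NLTK: `add_missing_spaces` is a hypothetical helper, and it assumes that a sentence boundary looks like a lowercase letter or digit, then `.`, `?` or `!`, then an uppercase letter. That heuristic leaves abbreviations such as `U.S.` alone, but it will miss other cases.

```python
import re

# Hypothetical helper (not an NLTK function): insert a space after '.', '?'
# or '!' when the mark follows a lowercase letter or digit and is directly
# followed by an uppercase letter.
_MISSING_SPACE = re.compile(r'(?<=[a-z0-9])([.?!])(?=[A-Z])')

def add_missing_spaces(text):
    return _MISSING_SPACE.sub(r'\1 ', text)

text4 = "This is the first sentence.A gallon of milk in the U.S. cost .99.Is this the third sentence?Yes,it is!"
fixed = add_missing_spaces(text4)
print(fixed)
```

The repaired string can then be passed to `nltk.sent_tokenize` in the same way as the correctly spaced example above.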
NLTK's sent_tokenize does not handle badly formatted text well. If you provide proper spacing, it works.
import nltk
nltk.download('punkt')
text4 = "This is the first sentence. A gallon of milk in the U.S. cost .99. Is this the third sentence? Yes, it is"
x=nltk.sent_tokenize(text4)
x[0]
Or you can use this:
import re
text4 = "This is the first sentence. A gallon of milk in the U.S. cost 2.99. Is this the third sentence? Yes it is"
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text4)
sentences
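Note that the pattern above splits on every `.`, `?` or `!`, including the periods inside `U.S.` and `2.99`, and it discards the punctuation. A stricter variant is sketched below, assuming that a sentence ends with one of those marks followed by whitespace and that the next sentence starts with an uppercase ASCII letter. `split_sentences` is a name chosen for illustration, and the heuristic will still fail on text such as "Dr. Smith".

```python
import re

# Split only on whitespace that follows '.', '?' or '!' and precedes an
# uppercase letter; the lookarounds keep the punctuation in the result.
_BOUNDARY = re.compile(r'(?<=[.?!])\s+(?=[A-Z])')

def split_sentences(text):
    return _BOUNDARY.split(text)

text4 = "This is the first sentence. A gallon of milk in the U.S. cost 2.99. Is this the third sentence? Yes it is"
sentences = split_sentences(text4)
for sentence in sentences:
    print(sentence)
```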