Spans of the spaCy sentence tokenizer
I am using spaCy to segment a document into sentences. After segmentation, I need to be able to reconstruct the original document. How can I get the span of each sentence?
s = 'this is sentence1.\nthis is sentence2.'
nlp = spacy.load('en_core_web_sm')
doc = nlp(s)
for sent in doc.sents:
    print(sent.text.span)
I want to get the span of each sentence found. The expected output is:
[0, 19]
[19, 37]
Is there a way to get the span of each sent?
Since sent is a spacy.tokens.span.Span object, you can access its start_char and end_char attributes:
print( [sent.start_char, sent.end_char] )
Tested in Python:
import spacy

nlp = spacy.load("en_core_web_sm")
s = 'this is sentence1.\nthis is sentence2.'
doc = nlp(s)
for sent in doc.sents:
    print([sent.start_char, sent.end_char])
Output:
[0, 19]
[19, 37]
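Those start_char and end_char values are character offsets into the original string, so slicing with them recovers every sentence exactly, including the newline that spaCy keeps inside the first span. A minimal sketch of the reconstruction, using the spans printed above (with a loaded pipeline you would collect them as [(sent.start_char, sent.end_char) for sent in doc.sents]):

```python
# Rebuild the original document from sentence character spans.
s = 'this is sentence1.\nthis is sentence2.'
# Offsets taken from the output above (sent.start_char / sent.end_char).
spans = [(0, 19), (19, 37)]

# Slicing the original string with each span recovers the sentences,
# trailing whitespace included, so concatenation is lossless.
pieces = [s[start:end] for start, end in spans]
print(pieces)                   # ['this is sentence1.\n', 'this is sentence2.']
reconstructed = ''.join(pieces)
print(reconstructed == s)       # True
```

Because the spans are contiguous (the end of one sentence is the start of the next), joining the slices gives back the document byte-for-byte, which is exactly what the question needs.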