Problem with "Span.as_doc()" method in spaCy

I'm using spaCy to extract indirect (dative) and direct objects. The noun chunks already carry dependency labels on their roots, such as "dative" and "dobj", and what I want to do is take the Span and save it as a Doc for further analysis.

I have the following code:

import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp(open("/-textfile").read())

So far so good; next, I get the Span objects:

datives = []

for dat in doc.noun_chunks:
    if dat.root.dep_ == "dative" and dat.root.head.pos_ == "VERB":
        datives.append(dat.sent)

Now I have all the sentences containing noun chunks whose root is a dative and whose head is a VERB.

However, I want to get the token data from datives[]:

dativesent = datives.as_doc()

But the problem is that datives[] is a list, so I can't convert it to a Doc.

How can I save the sentences with dative noun chunks as a Doc?

You can iterate over a sentence (a Span) just like a Doc to access its tokens:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("She gave the dog a bone. He read a book. They gave her a book.")

dative_sents = []
for nc in doc.noun_chunks:
    if nc.root.dep_ == "dative" and nc.root.head.pos_ == "VERB":
        dative_sents.append(nc.sent)

for dative_sent in dative_sents:
    print("Sentence with dative:", dative_sent.text)
    for token in dative_sent:
        print(token.text, token.pos_, token.dep_)
    print()

Output:

Sentence with dative: She gave the dog a bone.
She PRON nsubj
gave VERB ROOT
the DET det
dog NOUN dative
a DET det
bone NOUN dobj
. PUNCT punct

Sentence with dative: They gave her a book.
They PRON nsubj
gave VERB ROOT
her PRON dative
a DET det
book NOUN dobj
. PUNCT punct
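If you really do need standalone Doc objects rather than Spans, each sentence Span can be copied into its own Doc with Span.as_doc(). Here is a minimal sketch using a blank pipeline; the hand-picked token slice stands in for a parsed sentence boundary, since no parser is loaded:

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline is enough to demonstrate Span.as_doc().
nlp = spacy.blank("en")
doc = nlp("She gave the dog a bone. He read a book.")

span = doc[0:7]           # "She gave the dog a bone." as a Span
sent_doc = span.as_doc()  # copy the Span into a standalone Doc

print(type(sent_doc))     # <class 'spacy.tokens.doc.Doc'>
print(sent_doc.text)
```

With the parsed sentences collected above, the same call per element, e.g. [s.as_doc() for s in dative_sents], gives you a list of Doc objects instead of Spans.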