How to remove ORG names and GPE from noun chunk in spacy

Question

I have the following code:

import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()

doc = nlpsm(text)

finalwor = []
    fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
    fil_a = [i for i in doc.ents if i.label_.lower() in ['GPE']]
    fil_b = [i for i in doc.ents if i.label_.lower() in ['ORG']]
    for chunk in doc.noun_chunks:
        if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
            finalwor=list(doc.noun_chunks)
            print("finalwor after noun_chunk", finalwor)
        else: 
            chunk in fil_a and chunk in fil_b
            entword=list(str(chunk.text).replace(str(chunk.text),""))
            finalwor.extend(entword)

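I'm not sure what I'm doing wrong. If the text is 'IT manager at Google'

My current output is 'IT manager, Google'

The ideal output I want is 'IT manager'.

Basically, I want company names and GPE names to be replaced with an empty string, or simply removed.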

Answer 1

I think that with finalwor=list(doc.noun_chunks) you are assigning every noun chunk in the doc to finalwor, not only the ones that satisfy your condition.
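A second thing worth checking in the question's code is the label comparison: the label is lower-cased but then compared against upper-case strings ('GPE', 'ORG'), so those two filters can never match anything. A minimal sketch of the mismatch, using plain strings in place of real entity labels (the `labels` list is only an illustration):

```python
# spaCy entity labels are upper-case strings such as "PERSON", "GPE", "ORG".
labels = ["PERSON", "GPE", "ORG"]

# Lower-casing the label and comparing against upper-case names never matches.
broken = [label for label in labels if label.lower() in ["GPE", "ORG"]]

# Comparing like with like (upper-case on both sides) matches as intended.
fixed = [label for label in labels if label in {"GPE", "ORG"}]

print(broken)  # []
print(fixed)   # ['GPE', 'ORG']
```

Either lower-case both sides or neither; mixing the two silently yields empty filter lists.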

You are probably looking for something like this:

import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()

doc = nlpsm('Maria, IT manager at Google and gardener')

finalwor = []
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['gpe']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['org']]

for chunk in doc.noun_chunks:
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor.append(chunk)

print("finalwor after noun_chunk", finalwor)

finalwor after noun_chunk [IT manager, gardener]
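The `chunk not in fil` test relies on a noun chunk comparing equal to an entity span, which may depend on the spaCy version and fails when a chunk only partly covers an entity. A possible alternative, sketched here under assumptions, is to compare token offsets instead and drop any chunk that overlaps an excluded entity. `FakeSpan`, `EXCLUDED_LABELS` and `drop_entity_chunks` are hypothetical names made up for this example; `FakeSpan` stands in for spaCy spans so the sketch runs without a model, and the token offsets are written by hand rather than produced by a parser.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Span. Real spans expose the same
# attributes used below: start, end (token offsets) and label_.
FakeSpan = namedtuple("FakeSpan", ["text", "start", "end", "label_"])

EXCLUDED_LABELS = {"PERSON", "ORG", "GPE"}


def drop_entity_chunks(chunks, ents, excluded=EXCLUDED_LABELS):
    """Return the chunks that do not overlap any entity with an excluded label."""
    blocked = [e for e in ents if e.label_ in excluded]
    kept = []
    for chunk in chunks:
        # Two half-open token ranges overlap when each starts before the other ends.
        overlaps = any(chunk.start < e.end and e.start < chunk.end for e in blocked)
        if not overlaps:
            kept.append(chunk)
    return kept


# Hand-written offsets for "Maria, IT manager at Google and gardener":
# Maria(0) ,(1) IT(2) manager(3) at(4) Google(5) and(6) gardener(7)
chunks = [
    FakeSpan("Maria", 0, 1, "NP"),
    FakeSpan("IT manager", 2, 4, "NP"),
    FakeSpan("Google", 5, 6, "NP"),
    FakeSpan("gardener", 7, 8, "NP"),
]
ents = [FakeSpan("Maria", 0, 1, "PERSON"), FakeSpan("Google", 5, 6, "ORG")]

result = [c.text for c in drop_entity_chunks(chunks, ents)]
print(result)  # ['IT manager', 'gardener']
```

With a loaded pipeline, the same function should accept `doc.noun_chunks` and `doc.ents` directly, since it only reads `start`, `end` and `label_`.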

string

nlp

replace

python-3.x

spacy