How to remove ORG names and GPE from noun chunks in spaCy
I have the following code:
import spacy
from spacy.tokens import Span
import en_core_web_lg
nlpsm = en_core_web_lg.load()
doc = nlpsm(text)
finalwor = []
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['GPE']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['ORG']]
for chunk in doc.noun_chunks:
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor=list(doc.noun_chunks)
        print("finalwor after noun_chunk", finalwor)
    else:
        chunk in fil_a and chunk in fil_b
        entword=list(str(chunk.text).replace(str(chunk.text),""))
        finalwor.extend(entword)
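I'm not sure what I'm doing wrong. If the text is 'IT manager at Google',
my current output is 'IT manager, Google'.
The ideal output I want is 'IT manager'.
Basically, I want company names and GPE names to be replaced with an empty string, or simply removed.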
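I think that with finalwor=list(doc.noun_chunks) you are assigning every noun chunk that appears in doc to finalwor, not only the ones that satisfy your condition. Note also that i.label_.lower() can never match the uppercase strings 'GPE' and 'ORG', so fil_a and fil_b are always empty.
You are probably looking for something like this: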
import spacy
from spacy.tokens import Span
import en_core_web_lg

nlpsm = en_core_web_lg.load()
doc = nlpsm('Maria, IT manager at Google and gardener')
finalwor = []
# Collect the entities to exclude; labels are lowercased, so compare against lowercase names
fil = [i for i in doc.ents if i.label_.lower() in ["person"]]
fil_a = [i for i in doc.ents if i.label_.lower() in ['gpe']]
fil_b = [i for i in doc.ents if i.label_.lower() in ['org']]
for chunk in doc.noun_chunks:
    # Keep only the chunks that are not one of the excluded entities
    if chunk not in fil and chunk not in fil_a and chunk not in fil_b:
        finalwor.append(chunk)
print("finalwor after noun_chunk", finalwor)
finalwor after noun_chunk [IT manager, gardener]
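One caveat with the approach above: `chunk not in fil` relies on a noun chunk comparing equal to an entity span. That only catches exact matches, and in some spaCy versions the comparison also seems to take the span label into account, in which case a noun chunk would never equal an entity. A more robust variant is to compare token offsets instead. The sketch below is only an illustration under stated assumptions: `drop_entity_chunks` and `FakeSpan` are hypothetical names, and `FakeSpan` is a stand-in that mimics the `text`, `start`, `end` and `label_` attributes of a spaCy `Span` so the example runs without downloading a model. The token offsets are written by hand for the sentence used above.

```python
from collections import namedtuple

# Hypothetical stand-in for spaCy's Span; real Span objects expose the
# same attribute names (text, start, end, label_).
FakeSpan = namedtuple("FakeSpan", "text start end label_")


def drop_entity_chunks(noun_chunks, ents, labels=("PERSON", "GPE", "ORG")):
    """Return the noun chunks that do not overlap any entity whose label is in `labels`."""
    # Token ranges (start inclusive, end exclusive) of the entities to exclude
    blocked = [(e.start, e.end) for e in ents if e.label_ in labels]
    kept = []
    for chunk in noun_chunks:
        # Two half-open ranges overlap when each starts before the other ends
        overlaps = any(chunk.start < end and start < chunk.end
                       for start, end in blocked)
        if not overlaps:
            kept.append(chunk)
    return kept


# Hand-written offsets for: Maria , IT manager at Google and gardener
chunks = [
    FakeSpan("Maria", 0, 1, "NP"),
    FakeSpan("IT manager", 2, 4, "NP"),
    FakeSpan("Google", 5, 6, "NP"),
    FakeSpan("gardener", 7, 8, "NP"),
]
ents = [
    FakeSpan("Maria", 0, 1, "PERSON"),
    FakeSpan("Google", 5, 6, "ORG"),
]

kept = drop_entity_chunks(chunks, ents)
print([c.text for c in kept])
```

With a real pipeline, the same function should accept `doc.noun_chunks` and `doc.ents` directly, for example `drop_entity_chunks(doc.noun_chunks, doc.ents)`. Note that this drops a whole chunk as soon as it overlaps an excluded entity; if you would rather keep the rest of the chunk and strip only the entity tokens, the filter would need to work at token level instead.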