Get the start and end position of found named entities
I'm new to both ML and spaCy. I'm trying to display the named entities found in an input text.
Here is my method:
import spacy
from collections import defaultdict
from pprint import pprint

def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    entity_scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

    # Create a dict to store the output.
    ners = defaultdict(list)
    ners['text'] = str(sentence)
    for key in entity_scores:
        start, end, label = key
        score = entity_scores[key]
        if (score > threshold):
            ners['extractions'].append({
                "label": str(label),
                "text": str(doc[start:end]),
                "confidence": round(score, 2)
            })

    pprint(ners)
The above method works fine and prints something like this:
'extractions': [{'confidence': 1.0,
'label': 'PERSON',
'text': 'Oliver'}],
'text': 'Hi my name is Oliver'})
So far so good. Now I'm trying to get the actual position of the found named entities, in this case "Oliver".
Looking at the documentation, ent.start_char and ent.end_char are available, but when I use them:
"start_position": doc.start_char,
"end_position": doc.end_char
I get the following error:
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'start_char'
Can someone point me in the right direction?
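For context: start_char and end_char are attributes of Span and Token objects, not of the Doc itself, which is what the AttributeError is complaining about. The (start, end) values returned by the beam parses are token indices, and slicing the doc with them yields a Span that does expose the character offsets. A minimal sketch of that distinction, reusing the doc from above; the token slice for "Oliver" is only illustrative:

    span = doc[4:5]           # a Span covering the token "Oliver" (illustrative indices)
    print(span.text)          # Oliver
    print(span.start_char)    # 14 -- character offset where the span starts
    print(span.end_char)      # 20 -- character offset where the span ends (exclusive)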
So I actually found the answer right after posting the question (typical).
It turned out I didn't need to save the information into entity_scores at all; I could simply iterate over the entities that were actually found.
I ended up adding for ent in doc.ents:, which gives access to all the standard spaCy attributes. See below:
ners = defaultdict(list)
ners['text'] = str(sentence)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for ent in doc.ents:
            if (score > threshold):
                ners['extractions'].append({
                    "label": str(ent.label_),
                    "text": str(ent.text),
                    "confidence": round(score, 2),
                    "start_position": ent.start_char,
                    "end_position": ent.end_char
                })
My whole method ended up looking like this:
def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    ners = defaultdict(list)
    ners['text'] = str(sentence)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for ent in doc.ents:
                if (score > threshold):
                    ners['extractions'].append({
                        "label": str(ent.label_),
                        "text": str(ent.text),
                        "confidence": round(score, 2),
                        "start_position": ent.start_char,
                        "end_position": ent.end_char
                    })
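As a side note, since entity_scores from the first snippet is keyed by token indices plus label, and Span.start / Span.end are token indices as well, the confidence lookup and the character offsets could also be combined per entity instead of looping once per beam. A rough sketch under that assumption (it presumes the label strings from get_beam_parses match ent.label_, which the 'PERSON' output above suggests):

    for ent in doc.ents:
        # entity_scores keys are (token_start, token_end, label) from the beam parses
        score = entity_scores.get((ent.start, ent.end, ent.label_), 0.0)
        if score > threshold:
            ners['extractions'].append({
                "label": ent.label_,
                "text": ent.text,
                "confidence": round(score, 2),
                "start_position": ent.start_char,
                "end_position": ent.end_char
            })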
If anyone comes here just wanting a simple answer to the question, I believe this does it:
nlp = spacy.load('en_core_web_sm')
sentence = "Hi my name is Oliver!"
doc = nlp(sentence)

for ent in doc.ents:
    print(f"Entity {ent} found with start at {ent.start_char} and end at {ent.end_char}")
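Assuming en_core_web_sm actually tags "Oliver" (it usually does, as a PERSON entity), that loop should print something along the lines of:

    Entity Oliver found with start at 14 and end at 20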