Regex pattern for spaCy EntityRuler does not work
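I am trying to identify entities with a regular expression and label them using the EntityRuler. The regex pattern returns a match with the Matcher, but returns nothing with the EntityRuler. The same expression also works as a plain regular expression.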
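from spacy.lang.en import English
from spacy.matcher import Matcher

# nlp was not defined in the original snippet; a blank English pipeline is assumed
nlp = English()
text = "Name: first last \n Phone: +1223456790 \n e-mail: first.last@testmail.com."
doc = nlp(text)
matcher = Matcher(nlp.vocab)
# Raw string so the regex backslashes are not treated as string escapes
pattern = [{"LOWER": {"REGEX": r"\w+\.\w+\@\w+\.com"}}]
matcher.add("email", None, pattern)  # spaCy v2 signature: (key, on_match, *patterns)
matches = matcher(doc)
print([doc[start:end] for match_id, start, end in matches])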
Output:
[first.last@testmail.com]
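The plain-regex check mentioned above is not shown in the question. A minimal sketch of what it could look like with the standard `re` module follows; the name `EMAIL_RE` is made up for illustration, and the search runs over the raw string rather than over spaCy tokens.

```python
import re

# Same expression as in the Matcher pattern, written as a raw string
EMAIL_RE = re.compile(r"\w+\.\w+\@\w+\.com")

text = "Name: first last \n Phone: +1223456790 \n e-mail: first.last@testmail.com."

# findall scans the whole string, with no tokenization involved
found = EMAIL_RE.findall(text)
print(found)
```

This only confirms that the expression itself is valid. It says nothing about how the EntityRuler consumes the pattern.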
`
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp, overwrite_ents=True)  # spaCy v2 constructor
patterns = [{"label": "Email", "pattern":
             {"LOWER": {"REGEX": r"\w+\.\w+\@\w+\.com"}}
             }]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler, name="customer")  # spaCy v2: add_pipe takes the component object
text = "Name: first last \n Phone: +1223456790 \n e-mail: first.last@testmail.com."
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
`
Output:
None
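EntityRuler patterns need to be provided as a list of token dicts, just like Matcher patterns. In the question, the value of "pattern" is a single dict rather than a list containing that dict: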
patterns = [{"label": "Email", "pattern":
             [{"LOWER": {"REGEX": r"\w+\.\w+\@\w+\.com"}}]  # list of token dicts
             }]
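For completeness, here is a sketch of the corrected pipeline from start to finish. It assumes spaCy v3, where the ruler is created through `nlp.add_pipe("entity_ruler", ...)`. The code in the question uses the older v2 calls (`EntityRuler(nlp, ...)` followed by `nlp.add_pipe(ruler)`), so adjust those two lines if you are still on v2. The `entities` variable is only there for illustration.

```python
from spacy.lang.en import English

nlp = English()

# Assumption: spaCy v3 API. The string name creates the EntityRuler component
# and returns it, so patterns can be added directly.
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})

patterns = [
    {
        "label": "Email",
        # The token pattern is a LIST of token dicts, one dict per token
        "pattern": [{"LOWER": {"REGEX": r"\w+\.\w+\@\w+\.com"}}],
    }
]
ruler.add_patterns(patterns)

text = "Name: first last \n Phone: +1223456790 \n e-mail: first.last@testmail.com."
doc = nlp(text)

# Collect (text, label) pairs for every entity the ruler produced
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

The only change that matters for the original problem is the extra pair of square brackets around the token dict. Everything else mirrors the setup from the question.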