All matches in a line: spaCy Matcher
I am looking for a way to print all matches in a line using the spaCy Matcher.
Here is an example, in which I am trying to extract experience:
doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
pattern = [{'POS': 'NUM'}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "years?|months?"}}]
matcher = Matcher(nlp.vocab)
matcher.add("Skills", None, pattern)
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple; this prints only the first one
print(doc[matches[0][1]:matches[0][2]])
Here I get the output 1+ years.
But I am looking for a solution that gives the output
['1+ years','2 years']
You should specify the first item as 'LIKE_NUM': True:
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
I also shortened years?|months? to (?:year|month)s?. You could even consider ^(?:year|month)s?$ to match the full token string, but that is not necessary here.
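To make the difference between the unanchored and anchored regex concrete, here is a small sketch using Python's re module on its own. It assumes that spaCy's REGEX operator applies the pattern as a search over the token text (that is my understanding, not something stated in this thread), and the token list is made up for illustration.

```python
import re

# Unanchored: matches anywhere inside the token text
loose = re.compile(r"(?:year|month)s?")
# Anchored: the whole token text must be year/years/month/months
strict = re.compile(r"^(?:year|month)s?$")

# Hypothetical lowercased token texts, not taken from the question
tokens = ["years", "month", "yearly", "bimonthly"]

loose_hits = [t for t in tokens if loose.search(t)]
strict_hits = [t for t in tokens if strict.search(t)]

print(loose_hits)
print(strict_hits)
```

With the unanchored pattern, tokens such as "yearly" also pass because they contain "year"; the anchored pattern keeps only the exact unit words. For the sentence in the question this makes no difference, which is why the anchors are optional there.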
Code (note that it loops over all matches instead of printing only matches[0]):
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{'LIKE_NUM': True}, {'ORTH': '+', "OP": "?"}, {"LOWER": {"REGEX": "(?:year|month)s?"}}]
matcher.add("Skills", None, pattern)  # spaCy v2 signature; spaCy v3 expects matcher.add("Skills", [pattern])
doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")
matches = matcher(doc)
# Iterate over every (match_id, start, end) tuple, not just the first
for _, start, end in matches:
    print(doc[start:end].text)
Output:
1+ years
2 years
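If you want the matches collected into a list, as in the question, rather than printed one per line, a list comprehension over the matches should do it. This is a minimal sketch with two assumptions of mine: it uses spacy.blank("en") instead of en_core_web_sm (LIKE_NUM, ORTH and LOWER are lexical attributes, so no trained model should be needed), and it uses the spaCy v3 form of matcher.add.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough, since the pattern
# relies only on lexical attributes (LIKE_NUM, ORTH, LOWER)
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

pattern = [
    {"LIKE_NUM": True},
    {"ORTH": "+", "OP": "?"},
    {"LOWER": {"REGEX": "(?:year|month)s?"}},
]
# spaCy v3 signature: a list of patterns as the second argument
matcher.add("Skills", [pattern])

doc = nlp("1+ years of experience in XX, 2 years of experiance in YY")

# Collect the text of every matched span into a list
results = [doc[start:end].text for _, start, end in matcher(doc)]
print(results)  # ['1+ years', '2 years']
```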