NLTK: How can I extract information based on sentence maps?

I know you can use noun extraction to pull the nouns out of a sentence, but how can I use sentence overlays/maps to extract whole phrases?

For example:

Sentence Overlay:

"First, @action; Second, Foobar"

Input:

"First, Dance and Code; Second, Foobar"

I want to return:

action = "Dance and Code"

You can rewrite your string templates slightly to turn them into regular expressions, then check which one (or ones) match.

>>> import re
>>> template = "First, (?P<action>.*); Second, Foobar"
>>> mo = re.search(template, "First, Dance and Code; Second, Foobar")
>>> if mo:
...     print(mo.group("action"))
...
Dance and Code
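With several templates, "see which one (or ones) match" can be a simple loop. The sketch below assumes a hypothetical list `TEMPLATES` of regexes with named groups and a hypothetical helper `match_all`; it uses `re.fullmatch` rather than `re.search` so a template only counts when it covers the whole sentence.

```python
import re

# Hypothetical templates, already written as regexes with named groups
# for the slots to extract.
TEMPLATES = [
    r"First, (?P<action>.*); Second, Foobar",
    r"Then (?P<person>\w+) went to (?P<place>.*)",
]


def match_all(sentence):
    """Return the named groups of every template that matches the whole sentence."""
    results = []
    for pattern in TEMPLATES:
        mo = re.fullmatch(pattern, sentence)
        if mo:
            results.append(mo.groupdict())
    return results


print(match_all("First, Dance and Code; Second, Foobar"))
# [{'action': 'Dance and Code'}]
```

Returning every match (instead of stopping at the first) lets you detect sentences that fit more than one template.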

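You can even convert your existing string templates into regular expressions like this automatically (after escaping regex metacharacters such as .?*()).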

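>>> template = "First, @action; (Second, Foobar...)"
>>> # On Python 3.7+, re.escape() does not escape '@', so the placeholder survives escaping
>>> re_template = re.sub(r"@(\w+)", r"(?P<\g<1>>.*)", re.escape(template))
>>> print(re_template)
First,\ (?P<action>.*);\ \(Second,\ Foobar\.\.\.\)
>>> re.fullmatch(re_template, "First, Dance; (Second, Foobar...)").group("action")
'Dance'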
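A slightly more robust variant splits the template on the placeholders first and escapes only the literal pieces, so the result does not depend on how a given Python version's re.escape() treats '@'. This is a minimal sketch assuming placeholders are '@' followed by word characters and each name appears only once (duplicate group names raise re.error); `template_to_regex` and `extract` are hypothetical helper names.

```python
import re


def template_to_regex(template):
    """Compile an '@name' sentence template into a regex with named groups."""
    # With one capturing group, re.split alternates: literal, name, literal, ...
    parts = re.split(r"@(\w+)", template)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))  # literal text, matched exactly
        else:
            pieces.append(f"(?P<{part}>.+?)")  # placeholder: non-greedy, at least 1 char
    return re.compile("".join(pieces))


def extract(template, sentence):
    """Return a dict of placeholder values, or None if the sentence doesn't fit."""
    mo = template_to_regex(template).fullmatch(sentence)
    return mo.groupdict() if mo else None


print(extract("First, @action; Second, Foobar",
              "First, Dance and Code; Second, Foobar"))
# {'action': 'Dance and Code'}
```

The non-greedy `.+?` only matters when a template has several adjacent placeholders: for "@a and @b" against "x and y and z" it yields a="x", b="y and z", whereas a greedy `.*` would give a="x and y", b="z".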