Patterns do not behave as expected

Question

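The real patterns are not in English, so I created this simplified example to reproduce the problem. There are three levels of annotation (the real application needs them), and the third-level pattern does not work as expected. The phrase to recognize is: a b c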

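What I expect:

Level 1: "a" is annotated as "A" and "b" is annotated as "B"
Level 2: if annotations A and B are both present, they are annotated together as "AB"
Level 3: if there is at least one AB annotation followed by the word "c", they are all annotated together as "C"

The patterns look like this: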

# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
# 3.
{  pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }

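#1 and #2 work, and "a b" gets annotated:

Matched token: NamedEntitiesToken{word='a' name='AB' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken{word='b' name='AB' beginPosition=2 endPosition=3}

But pattern #3 does not work, even though we can see two tokens annotated "AB", which is exactly what pattern #3 expects. What is more, if I change #1 to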

{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

pattern #3 works properly:

Matched token: NamedEntitiesToken{word='a' name='C' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken{word='b' name='C' beginPosition=2 endPosition=3}
Matched token: NamedEntitiesToken{word='c' name='C' beginPosition=4 endPosition=5}

I cannot find any difference between the matched tokens when I use

# In this case #3 pattern works
{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

or when I use

# In this case #3 pattern doesn't work
# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }

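In both cases I get the same annotations on "a" and "b", yet pattern #3 matches in the first case and not in the second. What am I doing wrong?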

Answer 1

This works for me:

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

ENV.defaults["stage"] = 1

{ ruleType: "tokens", pattern: (/a/), action: Annotate($0, ner, "A") }
{ ruleType: "tokens", pattern: (/b/), action: Annotate($0, ner, "B") }

ENV.defaults["stage"] = 2

{ ruleType: "tokens", pattern: ([{ner: "A"}] [{ner: "B"}]), action: Annotate($0, ner, "AB") }

ENV.defaults["stage"] = 3

{ ruleType: "tokens", pattern: ([{ner: "AB"}]+ /c/), action: Annotate($0, ner, "ABC") }

Here is the documentation page for TokensRegex:

https://stanfordnlp.github.io/CoreNLP/tokensregex.html
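
Applied to the rules from the question, the same staging might look like the sketch below. It is untested and rests on two assumptions: that `name` is already bound to the annotation key the way it is in the question's setup (the binding is not shown there), and that the question's `[name:...]` pattern syntax keeps working once each rule is given an explicit `ruleType` and stage.

```
# Sketch only: the question's rules with one stage per annotation level.
# Assumes `name` is already bound to the annotation key, as in the question.

ENV.defaults["stage"] = 1

{ ruleType: "tokens", pattern: (/a/), action: (Annotate($0, name, "A")) }
{ ruleType: "tokens", pattern: (/b/), action: (Annotate($0, name, "B")) }

ENV.defaults["stage"] = 2

{ ruleType: "tokens", pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }

ENV.defaults["stage"] = 3

{ ruleType: "tokens", pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }
```

The intent is that each level is matched only after the previous level's annotations have been written, so the level 3 pattern should see the "AB" values produced at level 2.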

tokenize

stanford-nlp