How to find invalid Link Grammar tokens?

Question

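I want to use the Link Grammar Python3 bindings for a simple grammar checker. While the linkage API is relatively well documented, there seems to be no way to access all the tokens that prevent linkages.

This is what I have so far: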

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from linkgrammar import Sentence, ParseOptions, Dictionary, __version__
print('Link Grammar Version:', __version__)

for sentence in ['This is a valid sample sentence.', 'I Can Has Cheezburger?']:
    sent = Sentence(sentence, Dictionary(), ParseOptions())
    linkages = sent.parse()
    if len(linkages) > 0:
        print('Valid:', sentence)
    else:
        print('Invalid:', sentence)

(I am testing with link-grammar-5.4.3.)

When I analyze the invalid example sentence with the Link Parser command-line tool, I get the following output:

linkparser> I Can Has Cheezburger?
No complete linkages found.
Found 1 linkage (1 had no P.P. violations) at null count 1
    Unique linkage, cost vector = (UNUSED=1 DIS= 0.10 LEN=7)

    +------------------Xp------------------+
    +------------->Wa--------------+       |
    |            +---G--+-----G----+       |
    |            |      |          |       |
LEFT-WALL [I] Can[!] Has[!] Cheezburger[!] ?

How can I use Python3 to get all the potentially invalid tokens, i.e. those marked with [!] or [?]?

Answer 1

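Look at how it is done in bindings/python-examples/sentence-check.py. It is best to look at the latest version in the repository, because this demo program has a bug in 5.4.3.

Specifically, the word list is extracted as follows: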

words = list(linkage.words())

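Unlinked words are wrapped in []. Words with an appended [] are guessed words. For example, [!] means that the word was classified by a regular expression (defined in the file 4.0.regex) and that this classification was then looked up in the dictionary. If the parse option display_morphology is set to True, the name of the classifying regex appears after the !.

Here is the full legend of the word output format: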

 [word]            Null-linked word
 word[!]           word classified by a regex
 word[!REGEX_NAME] word classified by REGEX_NAME (turn on by morphology=1)
 word[~]           word generated by a spell guess (unknown original word)
 word[&]           word run-on separated by a spell guess
 word[?]           word is unknown (looked up in the dict as UNKNOWN-WORD)
 word.POS          word found in the dictionary as word.POS
 word.#CORRECTION  word is probably a typo - got linked as CORRECTION

For dictionaries that support morphology (turn on by morphology=1):
 word=             A prefix morpheme
 =word             A suffix morpheme
 word.=            A stem

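It may be useful to match the output words to the words of the original sentence, especially when spell correction or morphology is turned on. The demo program sentence-check.py mentioned above does this when you invoke it with -p; see the code under if arg.position:.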
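
For the simple case, the matching can be approximated with a plain left-to-right text search. This is not how sentence-check.py does it; it is a naive sketch that assumes the output words have already been stripped of their markers and still appear verbatim in the sentence, which will not hold once spell correction or morphology rewrites them. The helper name locate_words is hypothetical.

```python
def locate_words(sentence, words):
    """Return (word, start, end) character spans, searching left to right.

    A word that cannot be found from the current position onward gets
    the span (-1, -1) and does not move the search position.
    """
    spans = []
    cursor = 0
    for word in words:
        start = sentence.find(word, cursor)
        if start == -1:
            spans.append((word, -1, -1))
            continue
        end = start + len(word)
        spans.append((word, start, end))
        cursor = end  # continue after this word so repeated words map in order
    return spans


sentence = 'I Can Has Cheezburger?'
bare_words = ['I', 'Can', 'Has', 'Cheezburger', '?']
for word, start, end in locate_words(sentence, bare_words):
    print(word, start, end)
```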

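In the case of your demo sentence I Can Has Cheezburger?, only the word I has no linkage. The other words were classified as capitalized-words and got linked as proper nouns (G link type).

You can find more information about the link types in summarize-links.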
