proto token candidates ordering

Question

How does perl6 decide which proto token to match against first?

The code below works as expected: it matches the string 1234, and Grammar::Tracer shows that the first token matched is s:sym<d>, which makes sense since it is the longest token.

However, if I change a literal to a token, for example changing token three from '3' to <digit>, it fails to match, and Grammar::Tracer shows that s:sym<b> is matched first.

Moving s:sym<d> to the top matches the string in both cases, but what is the explanation for that behavior?

#!/usr/bin/env perl6
no precompilation;
use Grammar::Tracer;

grammar G {

  token TOP { <s> }

  proto token s { * }

  token s:sym<a> { <one> }
  token s:sym<b> { <one> <two> }
  token s:sym<c> { <one> <two> <three> }
  token s:sym<d> { <one> <two> <three> <four> }

  token one   { '1' }
  token two   { '2' }
  token three { '3' }
  token four  { '4' }
}

my $g = G.new;

say $g.parse: '1234';

# Output: Match
# token three { '3' }

TOP
|  s
|  |  s:sym<d>
|  |  |  one

# Output No Match
# token three { <digit> }

TOP
|  s
|  |  s:sym<b>
|  |  |  one

Answer 1

How does perl6 decide which proto token to match against first?

It uses "Longest alternation" logic. In your (nicely presented!) case, the relevant deciding factors are as follows.

First, select the branch which has the longest declarative prefix.

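So the first thing to notice is that it is not the "longest token" but the longest declarative prefix: the start of a pattern that contains nothing but contiguous "declarative" "atoms".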
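
To make "longest declarative prefix" concrete, here is a minimal sketch that uses plain alternations instead of the grammar from the question (the string and patterns are made up for illustration). With |, the branch with the longest declarative prefix wins regardless of the order in which the branches are written; with ||, the branches are tried in the order written.

```raku
# Every branch is a literal, so each branch is entirely declarative.
# | picks the branch with the longest declarative prefix, not the first one.
say ~('food' ~~ / f | fo | foo | food /);      # food

# || is sequential alternation: the first branch that matches wins.
say ~('food' ~~ / f || fo || foo || food /);   # f
```

A proto token dispatches to its candidates along the same lines as the | case.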

A 3 is a declarative atom.

A <foo> may or may not be; it depends on what it includes.

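I haven't found clear official documentation identifying which built-in patterns are declarative and which aren't, but it looks like all the ones declared with a backslash, e.g. \d, are declarative, whereas all the ones declared in the form <foo>, e.g. <digit>, are not. (Note in particular that the built-in <ws> pattern is not declarative. Given that spaces after atoms in rules are converted to <ws>, this means the first such space terminates the declarative prefix of that rule.)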

So a <digit> atom isn't part of a declarative prefix; instead, it terminates the prefix.
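
One way to probe this observation is to write token three with \d instead of <digit>. The sketch below is the question's grammar with only that change; the name GBackslash is hypothetical, and Grammar::Tracer is left out so that it runs without extra modules. If \d is declarative as described above, s:sym<d> should once more have the longest declarative prefix. The result may depend on your Rakudo version, so run it rather than taking it on trust.

```raku
grammar GBackslash {
    token TOP { <s> }

    proto token s { * }

    token s:sym<a> { <one> }
    token s:sym<b> { <one> <two> }
    token s:sym<c> { <one> <two> <three> }
    token s:sym<d> { <one> <two> <three> <four> }

    token one   { '1' }
    token two   { '2' }
    token three { \d }    # '3' or <digit> in the question's two variants
    token four  { '4' }
}

# Prints the match object if the whole string parses, Nil otherwise.
say GBackslash.parse('1234');
```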

Moving s:sym<d> to the top, matches the string in both cases, but what is the explanation for that behavior?

Because by changing <three> to call <digit>, you've changed your rules so that three of them tie for the longest declarative prefix (<one> <two>). So other tie-breaking rules are used.

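If all else fails in these tie-breaking rules to pick a winner, then the last "left-most" rule is picked, which, ignoring inheritance, means the rule that comes first lexically. In your grammar that is s:sym<b>, which is why the tracer shows it being tried first; it matches only 12, leaving 34 unconsumed, so the parse of the whole string fails.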
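
The reordering described in the question can be sketched as follows. The grammar name GReordered is hypothetical, and Grammar::Tracer is omitted so that the snippet has no external dependencies. With s:sym<d> declared first, it is the lexically first of the three candidates tied on the prefix <one> <two>, so it should be the one that gets tried first.

```raku
grammar GReordered {
    token TOP { <s> }

    proto token s { * }

    # s:sym<d> moved to the top: it now wins the lexical-order tie-break.
    token s:sym<d> { <one> <two> <three> <four> }
    token s:sym<a> { <one> }
    token s:sym<b> { <one> <two> }
    token s:sym<c> { <one> <two> <three> }

    token one   { '1' }
    token two   { '2' }
    token three { <digit> }   # non-declarative, ends the declarative prefix
    token four  { '4' }
}

say GReordered.parse('1234');
```

Relying on declaration order is fragile, so it may be preferable to keep the leading atoms of each candidate declarative where that is possible.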

regex

grammar

raku