Can * be used in sym tokens for more than one character?

Question

The example for sym shows * (WhateverCode) standing in for a single symbol:

grammar Foo {
    token TOP { <letter>+ }
    proto token letter {*}
    token letter:sym<P> { <sym> }
    token letter:sym<e> { <sym> }
    token letter:sym<r> { <sym> }
    token letter:sym<l> { <sym> }
    token letter:sym<*> {   .   }
}.parse("I ♥ Perl", actions => class {
    method TOP($/) { make $<letter>.grep(*.<sym>).join }
}).made.say; # OUTPUT: «Perl␤»

It will, however, fail if we use it to stand in for a symbol composed of several letters:

grammar Foo {
    token TOP { <action>+ % " " }
    proto token action {*}
    token action:sym<come> { <sym> }
    token action:sym<bebe> { <sym> }
    token action:sym<*> { . }
}.parse("come bebe ama").say; # Nil

Since sym, by itself, does work with symbols of more than one character, how can we define a default sym token that matches a set of characters?

Answer 1

Maybe something like this:

grammar Foo {
    token TOP { <action>+ % " " }
    proto token action {*}
    token action:sym<come> { <sym> }
    token action:sym<bebe> { <sym> }
    token action:sym<default> { \w+ }
}.parse("come bebe ama").say;

Output:

｢come bebe ama｣
 action => ｢come｣
  sym => ｢come｣
 action => ｢bebe｣
  sym => ｢bebe｣
 action => ｢ama｣

Answer 2

Can * be used in sym tokens for more than one character? ... The example for sym shows * (WhateverCode) standing in for a single symbol

It's not a WhateverCode or a Whatever.¹

The <...> in foo:sym<...> is a quote words constructor, so the ... is just a literal string.

That's why this works:

grammar g { proto token foo {*}; token foo:sym<*> { <sym> } }
say g.parse: '*', rule => 'foo'; # matches

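As far as P6 is concerned, the * in foo:sym<*> is just a random string. It could be abracadabra. I presume the writer chose * to represent the mental concept of "whatever" because it happens to match the P6 concept Whatever. Perhaps they were being too cute.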
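To check that the text inside :sym<...> really is just a literal string, here is a minimal sketch (the grammar name G and the word abracadabra are purely illustrative): replace the * with an arbitrary word, and <sym> then matches exactly that word and nothing else.

```raku
# The text inside :sym<...> is an ordinary literal; <sym> matches it verbatim.
grammar G {
    proto token foo {*}
    token foo:sym<abracadabra> { <sym> }
}
say so G.parse('abracadabra', rule => 'foo');   # True
say so G.parse('*', rule => 'foo');             # False: '*' has no special meaning here
```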

For the rest of this answer I will write JJ instead of * wherever the latter is just an arbitrary string as far as P6 is concerned.

The * in the proto is a Whatever. But that's completely unrelated to your question:

grammar g { proto token foo {*}; token foo:sym<JJ> { '*' } }
say g.parse: '*', rule => 'foo'; # matches

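In the body of a rule (tokens and regexes are both rules) whose name includes a :sym<...> part, you can write <sym> and it will match the string between the angles of the :sym<...>: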

grammar g { proto token foo {*}; token foo:sym<JJ> { <sym> } }
say g.parse: 'JJ', rule => 'foo'; # matches

But you can write whatever you like in the rule/token/regex body. A . matches a single character:

grammar g { proto token foo {*}; token foo:sym<JJ> { . } }
say g.parse: '*', rule => 'foo'; # matches

It will, however, fail if we use it to stand in for a symbol composed of several letters

No, it won't. It fails because you changed the grammar.

If you change the grammar back to the original coding (apart from the longer letter:sym<...>s), it works fine:

grammar Foo {
    token TOP { <letter>+ }
    proto token letter {*}
    token letter:sym<come> { <sym> }
    token letter:sym<bebe> { <sym> }
    token letter:sym<JJ>   { . }
}.parse(
    "come bebe ama",
    actions => class { method TOP($/) { make $<letter>.grep(*.<sym>).join } }
).made.say; # OUTPUT: «comebebe␤»

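Note that in the original, the letter:sym<JJ> token is waiting in the wings to match any single character, and that includes a single space, so it matches the spaces and they are dealt with.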

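But in your modification you added a required space between tokens in the TOP token. That had two effects:

It matched the space after "come" and after "bebe";

After the "a" of "ama" was matched by the single-character catch-all token, the lack of a space between the "a" and the "m" meant the overall match failed at that point.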

sym, by itself, does work with symbols with more than one character

Yes. All token foo:sym<bar> { ... } does is add:

A multiple dispatch alternative to foo;

A token sym, lexically scoped to the body of the foo token, that matches 'bar'.
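A minimal sketch of both effects (the grammar name G is illustrative): foo:sym<bar> is picked as an alternative of the proto foo, and because its body uses <sym>, the resulting match carries a sym capture containing 'bar'.

```raku
grammar G {
    token TOP { <foo> }
    proto token foo {*}
    token foo:sym<bar> { <sym> }   # <sym> here is shorthand for the literal 'bar'
}
my $m = G.parse('bar');
say ~$m<foo><sym>;   # bar
```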

how can we define a default sym token that matches a set of characters?

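You can write such a sym token but, to be clear, because you don't want it to match a fixed string, it can't use <sym> in the body. (Because a <sym> has to be a fixed string.) If you still want to capture under the sym key, you could write $<sym>= in the token body, as Håkon shows in a comment under their answer. But it could also be letter:whatever with $<sym>= in the body.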
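As a sketch of the $<sym>= approach (assuming the question's action grammar; the name action:sym<default> follows Answer 1 and is otherwise arbitrary), the fallback token never calls <sym> but aliases the matched word to the sym key itself, so every action ends up with a sym capture:

```raku
grammar Foo {
    token TOP { <action>+ % ' ' }
    proto token action {*}
    token action:sym<come>    { <sym> }
    token action:sym<bebe>    { <sym> }
    # No <sym> call here; the word is captured under the sym key explicitly.
    token action:sym<default> { {} $<sym>=[\w+] }
}
my $m = Foo.parse('come bebe ama');
say $m<action>.map({ ~$_<sym> }).join(',');   # come,bebe,ama
```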

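I'll write it as a letter:default token to emphasize that it being :sym<something> makes no difference. (As explained above, :sym<something> is just one alternative, along with others such as :baz<...> and :bar<...>, the only addition being that if it's :sym<something>, it also makes a <sym> subrule available in the body of the associated rule which, if used, matches the fixed string 'something'.)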

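The winning dispatch among all the rule foo:bar:baz:qux<...> alternatives is chosen according to LTM (longest token matching) logic among the rules starting with foo. So you need to write such a token so that it does not win as the longest token prefix, but only matches if nothing else matches.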

To go straight to the back of the pack in an LTM race, insert a {} at the start of the rule body²:

token letter:default { {} \w+ }

Now, from the back of the pack, if this rule gets a chance it will match with the \w+ pattern, which stops the token when it hits a non-word character.
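A sketch of what the {} changes, using two hypothetical grammars that differ only in that block. On "comedy", a bare \w+ has a 6-character declarative prefix and beats the 4-character literal come; with {} in front, the fallback's declarative prefix is empty, so sym<come> wins the LTM race instead. subparse is used because neither grammar is meant to consume the whole string.

```raku
grammar NoBlock {
    token TOP { <letter> }
    proto token letter {*}
    token letter:sym<come>    { <sym> }
    token letter:sym<default> { \w+ }      # declarative prefix can cover the whole word
}
grammar WithBlock {
    token TOP { <letter> }
    proto token letter {*}
    token letter:sym<come>    { <sym> }
    token letter:sym<default> { {} \w+ }   # {} ends the declarative prefix right away
}
say NoBlock.subparse('comedy').Str;     # comedy (the longer \w+ prefix wins)
say WithBlock.subparse('comedy').Str;   # come   (sym<come> wins the LTM race)
```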

The bit about making it match only if nothing else matches may mean listing it last. So:

grammar Foo {
    token TOP { <letter>+ % ' ' }
    proto token letter {*}
    token letter:sym<come> { <sym> }     # matches come
    token letter:sym<bebe> { <sym> }     # matches bebe
    token letter:boo       { {} \w**6 }  # match 6 char string except eg comedy
    token letter:default   { {} \w+ }    # matches any other word
}.parse(
    "come bebe amap",
    actions => class { method TOP($/) { make $<letter>.grep(*.<sym>).join } }
).made.say; # OUTPUT: «comebebe␤»

that just can't be the thing causing it ... "come bebe ama" shouldn't work in your grammar

The code had errors, which I've now fixed; apologies for that. If you run it, you'll find it works as advertised.

But your comment prompted me to expand my answer. Hopefully it now properly answers your question.

Footnotes

¹ Not that these have anything to do with what's actually going on, but... in P6 a * in "term position" (in English, where a noun belongs; in general programming lingo, where a value belongs) is a Whatever, not a WhateverCode. Even when * is written with an operator, eg. +* or * + *, rather than on its own, the *s are still just Whatevers, but the compiler automatically turns most such combinations of one or more *s with one or more operators into a sub-class of Code called a WhateverCode. (Exceptions are listed in a table in the Whatever documentation.)

² See footnote 2 in the linked answer.
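To illustrate footnote ¹ with a minimal sketch (the variable names are arbitrary): assignment is one of the exceptions that does not curry, so a lone * stays a Whatever, while * combined with an operator becomes a WhateverCode that can be called.

```raku
my $w = *;            # assignment does not curry: $w holds a Whatever
say $w.WHAT;          # (Whatever)
my $wc = * + 1;       # * with an operator is curried into a WhateverCode
say $wc.WHAT;         # (WhateverCode)
say $wc(41);          # 42
```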

Answer 3

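The :sym<...> content is for the reader of your program, not for the compiler; it serves to distinguish multiple tokens that otherwise share the same name.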

It just so happened that programmers started writing grammars like this:

token operator:sym<+> { '+' }
token operator:sym<-> { '-' }
token operator:sym</> { '/' }

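To avoid repeating the symbols (here +, - and /), a special rule <sym> was introduced that matches whatever is inside :sym<...> as a literal, so you can write the above tokens as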

token operator:sym<+> { <sym> }
token operator:sym<-> { <sym> }
token operator:sym</> { <sym> }

If you don't use <sym> inside the regex, you are free to write anything you want inside :sym<...>, so you can write

token operator:sym<fallback> { . }
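Putting Answer 3's tokens together gives a small sketch (the grammar name Ops and the test string are hypothetical). The literal operator tokens win ties against the single-character fallback, as in the documentation's letter example, and only the tokens that call <sym> record a sym capture.

```raku
grammar Ops {
    token TOP { <operator>+ }
    proto token operator {*}
    token operator:sym<+>        { <sym> }
    token operator:sym<->        { <sym> }
    token operator:sym</>        { <sym> }
    token operator:sym<fallback> { . }   # never calls <sym>, so its name is arbitrary
}
my $m = Ops.parse('+x/');
say $m<operator>.map({ $_<sym> ?? ~$_<sym> !! 'fallback(' ~ $_ ~ ')' }).join(' ');
# + fallback(x) /
```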
