"Inconsistent" match result when using code block in regex [Raku]

Question

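While checking and testing various aspects of regexes, I stumbled upon a strange "inconsistent" behavior. I was trying to use some code inside a regex, but the same behavior also occurs with an empty code block. What struck me most was the difference in the match results when I swapped the :g modifier for the :x modifier.

The following code snippets demonstrate the "inconsistent" behavior.

First, without a code block: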

use v6.d;

if "test1 test2 test3 test4" ~~ m:g/ (\w+) / {
    say ~$_ for $/.list;
}

Result:

test1
test2
test3
test4

Then with the :g modifier and a code block:

use v6.d;

if "test1 test2 test3 test4" ~~ m:g/ (\w+) {} / {
    say ~$_ for $/.list;
}

Result:

test4

Finally, with the :x modifier and a code block:

use v6.d;

if "test1 test2 test3 test4" ~~ m:x(4)/ (\w+) {} / {
    say ~$_ for $/.list;
}

Result:

test1
test2
test3
test4

I expected all three results to be the same, so I was surprised.

Is there an explanation for this behavior?

Answer 1

TL;DR Issue filed by @jakar and fixed by jnthn.

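(Rewritten after more testing and code exploration.)

This looks like a bug to me (and probably to you). $/ is somehow getting kiboshed when :g is combined with an embedded code block.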

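This answer covers:

Zeroing in on the problem
Looking at the compiler source code
Searching issue queues and/or filing a new issue

Zeroing in on the problem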

my &debug = {;} # start off doing no debugging
$_ = 'aa';

say       m      / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣
say $/ if m      / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣

say       m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)
say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)

say       m:g    / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)
say $/ if m:g    / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣ <--- Uhoh

Now make debug say something useful, and run the first pair (no regex adverbs):

&debug = { say $_, $/.WHICH } # Say location of object bound to `$/`

say       m      / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣
# 1Match|66118928
# 2Match|66118928
# ｢a｣
# 3Match|66118928

say $/ if m      / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣
# 1Match|66119072
# 2Match|66119072
# ｢a｣
# 3Match|66119072

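The result is the same in both cases, and straightforward: the match process creates one Match object and sticks with it.

Now the two variants with the :x(2) adverb: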

say       m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)
# 1Match|66119936
# 2Match|66119936
# 1Match|66120080
# 2Match|66120080
# 1Match|66120224
# (｢a｣ ｢a｣)
# 3List|67612624

say $/ if m:x(2) / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)
# 1Match|66120368
# 2Match|66120368
# 1Match|66120512
# 2Match|66120512
# 1Match|66120656
# (｢a｣ ｢a｣)
# 3List|67612672

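This time the match process creates one Match object and sticks with it for the first pass, then uses a second Match object for the second pass, and finally a third Match object for a third pass, which fails when it tries to match a third 'a' (so the corresponding debug 2 is never called). At the end of the m.../.../ call, it creates a List object and binds that to $/.

Next we run the first of the two :g cases: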

say       m:g    / {debug 1} 'a' {debug 2} /; debug 3; # (｢a｣ ｢a｣)
# 1Match|66119216
# 2Match|66119216
# 1Match|66119360
# 2Match|66119360
# 1Match|66119504
# (｢a｣ ｢a｣)
# 3Match|66119504

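As in the :x(2) case, a third pass is attempted and fails. But what ends up bound to $/ is not a List but a Match object, namely the one created in the third pass. (That surprised me.)

Finally, the "Uhoh" case: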

say $/ if m:g    / {debug 1} 'a' {debug 2} /; debug 3; # ｢a｣ <--- Uhoh
# 1Match|66119648
# 2Match|66119648
# 1Match|66119792
# 2Match|66119792
# ｢a｣
# 3Match|66119792

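Notably, the expected third pass does not appear to start at all.

Looking at the compiler source code

Exploring the relevant source code seemed worthwhile. I'll write it up here in case you or other readers are interested, and in case this is a bug and what I write is of interest to whoever fixes it.

Afaict, a code block in a regex causes an AST node to be generated here, which inserts a child node, ahead of the statements in the block, that performs a bind operation: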

                    :op('bind'),

                    QAST::Var.new( :name('$/'), :scope('lexical') ),

                    QAST::Op.new(
                        QAST::Var.new( :name('$¢'), :scope('lexical') ),
                        :name('MATCH'),
                        :op('callmethod')
                    )

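My reading of the above is that, before the code in the block runs, it inserts code that binds the lexical $/ symbol to the result of calling the .MATCH method on the object bound to the lexical $¢ symbol.

The documentation has a section on $¢; I'll quote one sentence: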
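To see the visible effect of that binding, here is a minimal sketch (not taken from the compiler source; the input string and the @seen array are made up for illustration). It assumes only the documented behavior that captures made so far can be read through $/ inside an embedded code block:

```raku
# Hypothetical illustration: record what $/ exposes inside each code block.
my @seen;

# Each { ... } block runs after the capture to its left has matched,
# so $0 and then $1 are already available through $/ at that point.
'ab' ~~ / (\w) { @seen.push(~$0) } (\w) { @seen.push(~$1) } /;

say @seen;
```

If $/ were not refreshed before each block ran, $0 and $1 would not reflect the in-progress match, which is presumably why the inserted bind exists.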

The main difference between $/ and $¢ is scope: the latter only has a value inside [a] regex

I wonder why $¢ exists and what other differences there are.

Moving on...

I see there is a Raku-level .MATCH. But it barely does anything. So I presume the code that's relevant is here.

I'll pause at this point. I may continue in a later edit.

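Searching issue queues and/or filing a new issue

If someone posts an answer in the next few days showing that what you've shown is not a bug, or that it has already been filed as a bug, then fair enough.

Otherwise, please consider searching the issue queues yourself and/or starting a new issue in whichever issue queue you think is most appropriate (defaulting to /rakudo/rakudo/issues).

As part of writing this answer, I have already searched four github.com issue queues that I thought might be relevant.

I searched for two keywords that I hoped would surface existing issues ("global" and "publish"). There were no relevant matches. Perhaps you could also look for other keywords that you think a filer might have used.

If you do file an issue, please consider including your test, my tests, or some other variant, and, if you know how, turning it into a standard roast test case.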
