SystemVerilog & RegEx: "\d" is not recognized as a character class

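I am trying to use regular expressions in SystemVerilog. The character class "\d" (a digit) does not seem to work, while other character classes such as "\w" and "\s" work fine. I tried both SVLIB and UVM, and the behavior is the same.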

To reproduce the problem, I wrote the code below. It simply tests "1" against the regular expressions "\d" and "\w", using both UVM and SVLIB.

module SandBox;

  import svlib_pkg::*;
  import uvm_pkg::*;

  initial begin

    Str myString;
    Regex regex;

    string testString;
    string reString;

    testString = "1";
    reString = "\d";

    myString = Str::create(testString);
    regex = Regex::create();
    regex.setRE(reString);

    $display("-------------------------------------");
    $display("test string: %s", testString);
    $display("regex: %s", reString);

    if (regex.test(myString)) begin
      $display("SVLIB Test passed!");
    end

    if (!uvm_re_match(reString, testString)) begin
      $display("UVM Test passed!");
    end

    testString = "1";
    reString = "\w";

    myString = Str::create(testString);
    regex = Regex::create();
    regex.setRE(reString);

    $display("-------------------------------------");
    $display("test string: %s", testString);
    $display("regex: %s", reString);

    if (regex.test(myString)) begin
      $display("SVLIB Test passed!");
    end

    if (!uvm_re_match(reString, testString)) begin
      $display("UVM Test passed!");
    end

  end

endmodule

I get the following output:

-------------------------------------
test string: 1
regex: \d
-------------------------------------
test string: 1
regex: \w
SVLIB Test passed!
UVM Test passed!

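What is the reason for this behavior? The underlying system? Something in the SV syntax?

According to the svlib User Guide and Programmer's Reference: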

svlib uses the "extended regular expression" dialect of the C library's POSIX-compliant regular expression subsystem, and you can find full details of how to write regular expressions in this dialect by consulting the man-page man 7 regex or any of the numerous online regular expression tutorials. The regex dialect of svlib is in almost all respects the same as that used by the Unix/Linux command egrep.

The POSIX regular expression documentation (man 7 regex) in turn defines the available character classes as follows:

Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that class. Standard character class names are:

alnum   digit   punct
alpha   graph   space
blank   lower   upper
cntrl   print   xdigit

Therefore, the digit character class must be specified as:

[[:digit:]]

using the canonical syntax.

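\d, \w and \s are so-called shorthand character classes. The POSIX standard does not define them, so their availability depends on the implementation of the regular expression engine you are using.

Some regular expression engines choose to implement all of them; others (for example sed and grep) implement only a subset, which does not include \d.

The svlib implementation claims egrep compatibility, and egrep does not support \d.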
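
Since the svlib guide quoted above says it relies on the C library's POSIX regex subsystem, one way to check this outside the simulator is to call that subsystem directly. The following is only a minimal sketch: `ere_matches` and `report_matches` are hypothetical helper names made up for illustration, while `regcomp`, `regexec` and `regfree` are the standard POSIX calls. It assumes a POSIX system with `<regex.h>` and does not claim to reproduce exactly how svlib or UVM invoke the library.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
   regular expression `pattern`, 0 if it does not match, and -1 if the
   pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;

    /* REG_EXTENDED selects the "extended" (egrep-like) dialect;
       REG_NOSUB is enough because only a yes/no answer is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: prints the result for the shorthand classes from
   the question next to the portable spellings of "a digit". */
void report_matches(const char *text)
{
    static const char *const patterns[] = {
        "\\d",          /* shorthand, not defined by POSIX */
        "\\w",          /* shorthand, not defined by POSIX */
        "[[:digit:]]",  /* POSIX character class inside a bracket expression */
        "[0-9]"         /* plain range */
    };

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        printf("%-12s against \"%s\": %d\n",
               patterns[i], text, ere_matches(patterns[i], text));
}
```

Calling `report_matches("1")` from a `main` function prints one line per pattern. The `[[:digit:]]` and `[0-9]` patterns should report a match on any conforming implementation, whereas the results for `\d` and `\w` depend on the C library in use, because POSIX leaves those escapes undefined.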