SystemVerilog & RegEx: "\d" is not recognized as a character class
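I'm trying to use regular expressions in SystemVerilog. The character class "\d" (a digit) doesn't seem to work, while other character classes such as "\w" and "\s" work fine. I tried both SVLIB and UVM, and the behavior is the same.
To reproduce the problem I wrote the code below. It simply tests "1" against the regular expressions "\d" and "\w", using both UVM and SVLIB.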
module SandBox;
  import svlib_pkg::*;
  import uvm_pkg::*;

  initial
  begin
    Str myString;
    Regex regex;
    string testString;
    string reString;

    // Case 1: test "1" against the shorthand class \d
    testString = "1";
    reString = "\d";
    myString = Str::create(testString);
    regex = Regex::create();
    regex.setRE(reString);
    $display("-------------------------------------");
    $display("test string: %s", testString);
    $display("regex: %s", reString);
    if (regex.test(myString)) begin
      $display("SVLIB Test passed!");
    end
    // uvm_re_match returns 0 when the string matches
    if (!uvm_re_match(reString, testString)) begin
      $display("UVM Test passed!");
    end

    // Case 2: the same test with the shorthand class \w
    testString = "1";
    reString = "\w";
    myString = Str::create(testString);
    regex = Regex::create();
    regex.setRE(reString);
    $display("-------------------------------------");
    $display("test string: %s", testString);
    $display("regex: %s", reString);
    if (regex.test(myString)) begin
      $display("SVLIB Test passed!");
    end
    if (!uvm_re_match(reString, testString)) begin
      $display("UVM Test passed!");
    end
  end
endmodule
I get the following output:
-------------------------------------
test string: 1
regex: \d
-------------------------------------
test string: 1
regex: \w
SVLIB Test passed!
UVM Test passed!
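What is the reason for this behavior? The underlying system? Something in the SV syntax?
According to the svlib User Guide and Programmer's Reference: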
svlib uses the "extended regular expression" dialect of the C library's POSIX-compliant regular expression subsystem, and you can find full details of how to write regular expressions in this dialect by consulting the man-page man 7 regex or any of the numerous online regular expression tutorials. The regex dialect of svlib is in almost all respects the same as that used by the Unix/Linux command egrep.
The POSIX regular expression standard (man 7 regex), in turn, defines the available character classes as follows:
Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that class. Standard character class names are:
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit
Therefore, using the canonical syntax, the digit character class must be written as:
[[:digit:]]
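\d, \w and \s are so-called shorthand character classes. The POSIX standard does not define them, so whether they are available depends on the regex engine implementation you are using. Some regex engines implement all of them; others (such as sed or grep) implement only a subset that excludes \d.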
The svlib implementation claims to be egrep-compatible, and egrep does not support \d.