Regex to match a word in text, but not inside quotes or comments
I am building an extension for VS Code and use the formatter API to uppercase all keywords.
Suppose I have the following code in the editor.
TYPE MyStruct : STRUCT
this.var1 : POINTER TO INT; (* Указатель 1 *)
var2 : POINTER TO INT; (* this is Указатель 2 *)
sStr: STRING(200) := "This
Test this line";
sStr: STRING(200) := "Test this line";
sStr: STRING(200) := 'Test this line';
END_STRUCT
END_TYPE
THIS.MyStruct := 100;
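I want to find all occurrences of the word this that are not inside comments (* ... *) or inside strings (single- or double-quoted).
My attempt, with the ig flags, was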
(?<=^([^"'])*)\bthis\b
However, it still matches this inside comments, and also when there are line breaks.
Here is my real code sample:
let keywords = [
'true', 'false', 'exit', 'continue', 'return', 'constant', 'retain',
'public', 'private', 'protected', 'abstract','persistent','internal',
'final','of','else','elsif','then','__try','__catch','__finally',
'__endtry','do','to','by','task','with','using','uses','from',
'until','or','or_else','and','and_then','not','xor','nor','ge',
'le','eq','ne','gt','lt','__new','__delete', 'extends','implements',
'this','super'
];
let regEx = new RegExp(`\\b(?:${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\\b`, "ig");
text = text.replace(regEx, (match) => {
return match.toUpperCase();
});
You need to match the contexts you want to discard, and then match and capture the pattern you need to modify:
/(?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)|\b(true|false|exit|continue|return|constant|retain|public|private|protected|abstract|persistent|internal|final|of|else|elsif|then|__try|__catch|__finally|__endtry|do|to|by|task|with|using|uses|from|until|or|or_else|and|and_then|not|xor|nor|ge|le|eq|ne|gt|lt|__new|__delete|extends|implements|this|super|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b/gi
See the regex demo.
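I changed the first (?: in your pattern to ( so that your expected matches are captured into Group 1, and added (?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)| at the start of the pattern:
(?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*" - a position that is not preceded by a backslash followed by any number of backslash pairs, then a double-quoted string literal that supports escape sequences
| - or
\(\*[\s\S]*?\*\) - (*, then any 0+ characters, as few as possible, then *).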
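The "match what to skip, capture what to change" idea described above can be sketched with a single keyword. This is only an illustration: the helper name upperCaseOutside and the sample string are made up for this example, and the lookbehind from the full pattern is left out to keep it short.

```javascript
// Hypothetical helper (not from the original answer): upper-case one keyword
// unless it sits inside a "..." string or a (* ... *) comment.
function upperCaseOutside(text, word) {
  const pattern = new RegExp(
    // 1st alternative: double-quoted string, 2nd: comment, 3rd: the captured keyword
    String.raw`"[^"\\]*(?:\\[\s\S][^"\\]*)*"|\(\*[\s\S]*?\*\)|\b(${word})\b`,
    "gi"
  );
  // Group 1 is only defined when the keyword alternative matched;
  // strings and comments are consumed and returned unchanged.
  return text.replace(pattern, (match, keyword) =>
    keyword !== undefined ? match.toUpperCase() : match
  );
}

const sample = 'this.a := 1; (* this stays *) s := "this too"; x := this;';
console.log(upperCaseOutside(sample, "this"));
// THIS.a := 1; (* this stays *) s := "this too"; x := THIS;
```

Because the regex engine consumes a whole string or comment in one match, the keyword alternative never gets a chance to start inside one.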
See the JavaScript demo:
const keywords = [
'true', 'false', 'exit', 'continue', 'return', 'constant', 'retain',
'public', 'private', 'protected', 'abstract','persistent','internal',
'final','of','else','elsif','then','__try','__catch','__finally',
'__endtry','do','to','by','task','with','using','uses','from',
'until','or','or_else','and','and_then','not','xor','nor','ge',
'le','eq','ne','gt','lt','__new','__delete', 'extends','implements',
'this','super'
];
const regEx = new RegExp(String.raw`(?<!\\(?:\\{2})*)"[^"\\]*(?:\\.[^\\"]*)*"|\(\*.*?\*\)|\b(${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b`, "igs");
let text = "TYPE MyStruct : STRUCT\n this.var1 : POINTER TO INT; (* Указатель 1 *)\n var2 : POINTER TO INT; (* this is Указатель 2 *)\n sStr: STRING(200) := \"This \n Test this line\"; \n sStr: STRING(200) := \"Test this line\"; \n sStr: STRING(200) := 'Test this line'; \n END_STRUCT\nEND_TYPE\n\nTHIS.MyStruct := 100;";
text = text.replace(regEx, (match,group) => {
return group != undefined ? match.toUpperCase() : match;
});
console.log(text);
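The pattern in the demo skips only double-quoted strings, so a keyword inside '...' would still be upper-cased. A possible extension is sketched below. It assumes single-quoted strings use the same backslash escaping as the double-quoted branch, which may not hold for your dialect. The names keywordList, skipContexts, and input are illustrative, and the keyword list is shortened.

```javascript
// Shortened keyword list, for illustration only.
const keywordList = ['this', 'to', 'of'];

// Contexts to match and leave untouched (assumption: backslash escapes in both string kinds).
const skipContexts = [
  String.raw`"[^"\\]*(?:\\[\s\S][^"\\]*)*"`, // double-quoted string
  String.raw`'[^'\\]*(?:\\[\s\S][^'\\]*)*'`, // single-quoted string
  String.raw`\(\*[\s\S]*?\*\)`,              // (* ... *) comment
];

// Skipped contexts come first; the keyword alternative is the only capturing group.
const pattern = new RegExp(
  `${skipContexts.join('|')}|\\b(${keywordList.join('|')})\\b`,
  'gi'
);

const input = "a := 'Test this line'; b := \"keep this\"; (* this *) this.c := 1;";
const output = input.replace(pattern, (match, keyword) =>
  keyword !== undefined ? match.toUpperCase() : match
);
console.log(output);
// a := 'Test this line'; b := "keep this"; (* this *) THIS.c := 1;
```

Keeping each skipped context as its own array entry makes it easier to add further ones later, for example another comment style.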