Capture a substring between delimiters while excluding certain characters using regex
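How can a regex pattern capture the substring between two delimiters, while excluding certain characters (if present) right after the first delimiter and right before the last delimiter?
The input string looks like this: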
var input = @"Not relevant {
#AddInfoStart Comment:String:=""This is a comment"";
AdditionalInfo:String:=""This is some additional info"" ;
# } also not relevant";
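The capture should contain the substring between "{" and "}". It should exclude any spaces, newlines, and the string "#AddInfoStart" that follow the opening delimiter "{" (if present), and it should also exclude any spaces, newlines, ";" and "#" characters that precede the closing delimiter "}" (if present).
The captured string should look like this: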
Comment:String:=""This is a comment"";
AdditionalInfo:String:=""This is some additional info""
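The requirement above could be sketched with a single pattern that skips the optional leading marker, lazily captures the content, and leaves the trailing whitespace, ";" and "#" outside the capture. `CaptureBody` and the group name `body` are hypothetical names chosen for illustration, and the sketch assumes that no "}" appears inside the values and that the code is compiled as a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question.
var input = @"Not relevant {
#AddInfoStart Comment:String:=""This is a comment"";
AdditionalInfo:String:=""This is some additional info"" ;
# } also not relevant";

Console.WriteLine(CaptureBody(input));

// Hypothetical helper (the name is illustrative, not from the original post).
// Returns the trimmed text between '{' and '}', or an empty string if nothing matches.
static string CaptureBody(string text)
{
    // \{\s*(?:#AddInfoStart)?\s*  skips the brace, whitespace and the optional marker
    // (?<body>.*?)                lazily captures the content
    // [\s;#]*\}                   skips trailing whitespace, ';' and '#' before the brace
    var match = Regex.Match(
        text,
        @"\{\s*(?:#AddInfoStart)?\s*(?<body>.*?)[\s;#]*\}",
        RegexOptions.Singleline); // Singleline lets '.' match newlines
    return match.Success ? match.Groups["body"].Value : string.Empty;
}
```

Because the capture is lazy and the trailing character class is greedy, the ";" after the first entry stays in the capture while the final " ;", newline and "#" are left out.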
There may be whitespace before or after the inner separators ":" and ":=", and the value after ":=" is not always marked as a string, for example:
{ Val1 : Real := 1.7 }
Arrays use the following syntax:
arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL
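To cover the variations listed above (optional whitespace, non-string values, array types, and entries without a value), one possible approach is a pattern with named groups where the ":= value" part is optional. This is only a sketch: `ParseDeclarations` and the group names are hypothetical, the array bounds are assumed to be plain integers, and string values are assumed not to contain a double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var sample = @"arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL";

foreach (var d in ParseDeclarations(sample))
{
    Console.WriteLine($"{d.Name} | {d.Type} | {d.Value}");
}

// Hypothetical helper: parses "name : type [:= value]" entries.
// Value is an empty string when the entry has no ":=" part.
static List<(string Name, string Type, string Value)> ParseDeclarations(string body)
{
    // Either "ARRAY [lo..hi] OF elementType" or a single word such as String or Real.
    string type = @"(?<type>ARRAY\s*\[\d+\.\.\d+\]\s*OF\s+\w+|\w+)";
    // A quoted string, a bracketed list, or a (possibly negative, possibly decimal) number.
    string value = @"(?<value>""[^""]*""|\[[^\]]*\]|-?\d+(?:\.\d+)?)";
    string pattern = @"(?<name>\w+)\s*:\s*" + type + @"(?:\s*:=\s*" + value + ")?";

    return Regex.Matches(body, pattern)
        .Cast<Match>()
        .Select(m => (m.Groups["name"].Value, m.Groups["type"].Value, m.Groups["value"].Value))
        .ToList();
}
```

Named groups avoid having to count parentheses when the pattern grows, and string values keep their surrounding quotes in this sketch.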
Here is my solution:
- Remove the content outside the braces
- Use a regex to get the values inside the braces
Code:
using System.Linq;
using System.Text.RegularExpressions;

var input = @"Not relevant {
#AddInfoStart Comment:String:=""This is a comment"";
Val1 : Real := 1.7
AdditionalInfo:String:=""This is some additional info"" ;
# } also not relevant";
// remove content outside the braces (only strips text on the same line as each brace)
input = Regex.Replace(input, @".*\{", string.Empty);
input = Regex.Replace(input, @"\}.*", string.Empty);
string property = @"(\w+)";
string separator = @"\s*:\s*"; // ":" with or without whitespace
string type = @"(\w+)";
string equals = @"\s*:=\s*"; // ":=" with or without whitespace
string text = @"""(.*?)"""; // value between double quotes (opening quote is required)
string number = @"(\d+(\.\d+)?)"; // number like 123 or with a . separator such as 1.45
string value = $"({text}|{number})"; // value can be a string or number
string pattern = $"{property}{separator}{type}{equals}{value}";
var result = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(match => new
    {
        FullMatch = match.Groups[0].Value, // group 0 is always the full match
        Property = match.Groups[1].Value,
        Type = match.Groups[2].Value,
        Value = match.Groups[3].Value // outer value group; string values keep their quotes
    })
    .ToList();
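One caveat with the two `Regex.Replace` calls above: by default `.` does not match a newline, so they only strip text that sits on the same line as the brace. If unrelated text can appear on other lines, a non-regex sketch like the following might be more robust. `BetweenBraces` is a hypothetical helper name, and it assumes the first "{" and the last "}" in the input are the delimiters of interest.

```csharp
using System;

var multiLine = "header line\nNot relevant {\nVal1 : Real := 1.7\n} also not relevant\ntrailer line";
Console.WriteLine(BetweenBraces(multiLine).Trim());

// Hypothetical helper: returns the text between the first '{' and the last '}',
// or an empty string when the braces are missing or out of order.
static string BetweenBraces(string source)
{
    int open = source.IndexOf('{');
    int close = source.LastIndexOf('}');
    if (open < 0 || close <= open)
    {
        return string.Empty;
    }
    return source.Substring(open + 1, close - open - 1);
}
```

The result can then be passed to `Regex.Matches` with the pattern from the answer, in place of the `input` that the two `Replace` calls produce.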