Matching a regexp in TCL PERL

Question

I have the following pattern:

    Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220

I want to isolate each Pattern block. I am using TCL. The regular expression I am using does not do the job:

set updateList [regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $list]

Which regexp should I use to separate each pattern?

I need the output to be:

    Pattern[1]: 
    Key : "key1" 
    Value : 100


    Pattern[2]: 
    Key : "key2" 
    Value : 20


    Pattern[3]: 
    Key : "key3" 
    Value : 30


    Pattern[4]: 
    Key : "key4" 
    Value : 220

Answer 1

Try this:

Pattern\[\d+\](.|\n)*?Value.*?\n

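In many regex flavors the dot . matches any character except a newline, so the newline is added explicitly here. (In Tcl the dot matches newlines by default, unless the -line or -linestop option is in effect.) Note that your lines may also end with carriage return characters, so you may need to add \r to the alternation as well.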
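One way to sidestep the carriage-return concern is to normalize line endings before matching. The following is a minimal sketch, assuming the input uses Windows-style \r\n endings; the variable names and the sample string are made up for illustration, and the lazy \d+? form of the pattern is borrowed from the other answers.

```tcl
# Hypothetical input with Windows-style line endings (\r\n).
set raw "Pattern\[1\]: \r\nKey : \"key1\" \r\nValue : 100\r\nPattern\[2\]: \r\nKey : \"key2\" \r\nValue : 20\r\n"

# Replace every CRLF pair with a bare LF before matching.
set normalized [string map [list \r\n \n] $raw]

# \d+? makes the whole expression prefer the shortest match.
set blocks [regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $normalized]
puts [llength $blocks]
```

After normalization the pattern no longer has to account for \r at all.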

Answer 2

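Your pattern Pattern\[\d+\].*?Value.*?\n mixes quantifier types: greedy and lazy. Tcl does not handle mixed quantifier types the way you would expect from PCRE (PHP, Perl), .NET and so on. The expression takes its preference from the first quantifier found, and the subsequent quantifiers inherit that type. So the + after \d is greedy, and therefore all the others (the ones in .*?) are greedy too, even though you declared them lazy. Also, . matches a newline in Tcl regular expressions, so your pattern ends up matching the whole input as a single block.

So, based on your regex, you can make \d+ lazy (\d+?) and replace the trailing \n with (?:\n|$) to match either a newline or the end of the string: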

set RE {Pattern\[\d+?\].*?Value.*?(?:\n|$)}
set updateList [regexp -all -inline $RE $str]

See the IDEONE demo.
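To make the preference rule concrete, here is a small sketch that runs the greedy-first and lazy-first variants against the same text and compares how many matches each returns. The variable names are assumptions for illustration; the sample text mirrors the question's data and ends with a newline.

```tcl
# Sample text modelled on the question; note the trailing newline.
set str {Pattern[1]:
Key : "key1"
Value : 100
Pattern[2]:
Key : "key2"
Value : 20
Pattern[3]:
Key : "key3"
Value : 30
Pattern[4]:
Key : "key4"
Value : 220
}

# First quantifier is greedy (\d+), so the later .*? behave greedily too.
set greedyFirst [regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $str]

# First quantifier is lazy (\d+?), so the whole expression prefers short matches.
set lazyFirst [regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $str]

puts "greedy-first matches: [llength $greedyFirst]"
puts "lazy-first matches:   [llength $lazyFirst]"
```

The only difference between the two expressions is the single ? after \d+, which is what flips the preference of every other quantifier.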

Alternative 1

Also, if your input string always has the same structure with all the elements (Pattern, Key, Value) present, you can use a more verbose regex:

set updateList [regexp -all -inline {Pattern\[\d+\]:\s*Key[^\n]*\s*Value[^\n]*} $str]

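See the IDEONE demo, and here is the regex demo.

Since . can match a newline, we need to use the negated character class [^\n] to match any character other than a newline.

Alternative 2

You can use an unrolled lazy subpattern that matches Pattern[n]: and then any characters that are not the start of a Pattern[n]: sequence: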
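If the structure really is fixed, the same idea can be extended with capture groups to pull out the individual fields. This is only a sketch under that assumption: the expression, the `result` dictionary, and the sample text are illustrative and not part of the answer above.

```tcl
# Sample text modelled on the question's data.
set str {Pattern[1]:
Key : "key1"
Value : 100
Pattern[2]:
Key : "key2"
Value : 20
}

# Three capture groups: the index, the quoted key, and the numeric value.
# [^"\n]* keeps the key on a single line, in the spirit of [^\n] above.
set re {Pattern\[(\d+)\]:\s*Key\s*:\s*"([^"\n]*)"\s*Value\s*:\s*(\d+)}

# With -all -inline and capture groups, each match contributes four list
# elements: the whole match followed by the three submatches.
set result [dict create]
foreach {whole idx key value} [regexp -all -inline $re $str] {
    dict set result $key $value
}

puts $result
```

Because every quantifier here is greedy, there is no mixed-preference problem to worry about.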

set RE {Pattern\[\d+\]:[^P]*(?:P(?!attern\[\d+\]:)[^P]*)*}
set updateList [regexp -all -inline $RE $str]

See another IDEONE demo and a regex101 demo.
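The point of unrolling is that a stray capital P inside a block should not end the match early. The sketch below checks that with a hypothetical key value "Primary"; it assumes the unrolled form in which [^P]* is repeated inside the group, and it uses lmap, which needs Tcl 8.6.

```tcl
# The first block contains a capital P that does not start "Pattern[n]:".
set str {Pattern[1]:
Key : "Primary"
Value : 100
Pattern[2]:
Key : "key2"
Value : 20
}

# normal* (special normal*)* form:
#   normal  = [^P]
#   special = a P that is not the start of "Pattern[n]:"
set re {Pattern\[\d+\]:[^P]*(?:P(?!attern\[\d+\]:)[^P]*)*}

# Trim the whitespace that each match drags along at its end.
set blocks [lmap block [regexp -all -inline $re $str] {string trim $block}]

foreach block $blocks {
    puts $block
    puts ""
}
```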

Answer 3

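You want to capture blocks of lines and output them with blank lines in between. Your sample data shows patterns at different levels that can be used to identify which lines belong to which block.

The simplest pattern is this: every three lines of input make up one block. That pattern suggests processing like this: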

set lines [split [string trim $list \n] \n]
foreach {a b c} $lines {puts $a\n$b\n$c\n\n}

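Nothing in your sample data indicates that this wouldn't work. Still, there may be complications that your sample data doesn't yet reflect.

If there are stray blank lines in the input, you may want to remove them first: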

set lines [lmap line $lines {if {[string is space $line]} continue else {set line}}]
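Putting the two steps together, the filtering and the three-line grouping could be wrapped in a helper that returns the blocks as a list instead of printing them. This is a sketch only: the proc name `groupByThree` and the sample text are made up for illustration, and lmap requires Tcl 8.6.

```tcl
# Drop blank lines, trim each remaining line, then group every three lines.
proc groupByThree {text} {
    set lines [lmap line [split $text \n] {
        if {[string is space $line]} continue
        string trim $line
    }]
    set blocks {}
    foreach {a b c} $lines {
        lappend blocks [join [list $a $b $c] \n]
    }
    return $blocks
}

# Sample text with a stray blank line between the blocks.
set text {
    Pattern[1]:
    Key : "key1"
    Value : 100

    Pattern[2]:
    Key : "key2"
    Value : 20
}

set blocks [groupByThree $text]
puts [join $blocks \n\n]
```

Returning a list keeps the choice of block delimiter with the caller.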

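If some blocks contain fewer or more lines than in the example, another simple pattern is that every block begins with a line containing optional (?) whitespace and the word Pattern. Those lines (except the first) should be preceded by a block delimiter in the output: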

set lines [split [string trim $list \n] \n]
puts [lindex $lines 0]
foreach line [lrange $lines 1 end] {
    if {[regexp {^\s*Pattern} $line]} {
        puts \n$line
    } else {
        puts $line
    }
}
puts \n

If the lines don't actually begin with whitespace, you can use string match Pattern* $line instead of the regular expression.

Documentation: continue, foreach, if, lindex, lmap, lmap replacement, lrange, puts, regexp, set, split, string
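The same "a Pattern line starts a new block" idea can also collect the blocks into a list rather than printing them directly. The sketch below is illustrative only: the proc name `splitBlocks` and the sample text are assumptions, and each line is trimmed first so that string match can be used.

```tcl
# Start a new block whenever a (trimmed) line begins with "Pattern".
proc splitBlocks {text} {
    set blocks {}
    set current {}
    foreach line [split [string trim $text \n] \n] {
        set line [string trim $line]
        if {[string match Pattern* $line] && [llength $current] > 0} {
            lappend blocks [join $current \n]
            set current {}
        }
        lappend current $line
    }
    # Flush the last block, which has no following Pattern line.
    if {[llength $current] > 0} {
        lappend blocks [join $current \n]
    }
    return $blocks
}

# Blocks of different lengths: the second one has no Value line.
set text {
    Pattern[1]:
    Key : "key1"
    Value : 100
    Pattern[2]:
    Key : "key2"
    Pattern[3]:
    Key : "key3"
    Value : 30
}

set blocks [splitBlocks $text]
puts [join $blocks \n\n]
```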

Answer 4

% set list {    Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220
}
% regexp -all -inline {Pattern\[\d+\].*?Value.*?\n} $list
{Pattern[1]: 
    Key : "key1" 
    Value : 100
    Pattern[2]: 
    Key : "key2" 
    Value : 20
    Pattern[3]: 
    Key : "key3" 
    Value : 30
    Pattern[4]: 
    Key : "key4" 
    Value : 220
}
% regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $list   ;# only changing `\d+` to `\d+?`
{Pattern[1]: 
    Key : "key1" 
    Value : 100
} {Pattern[2]: 
    Key : "key2" 
    Value : 20
} {Pattern[3]: 
    Key : "key3" 
    Value : 30
} {Pattern[4]: 
    Key : "key4" 
    Value : 220
}

If $list does not end with a newline, the "Pattern[4]" element will not be returned. In that case, change

% regexp -all -inline {Pattern\[\d+?\].*?Value.*?\n} $list

to

% regexp -all -inline {Pattern\[\d+?\].*?Value.*?(?:\n|$)} $list

regex

tcl