Pattern matching in Tcl

Question

I have somefile.txt, which contains lines like these:

{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2

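It is a huge file, and I want to find only the lines like lines 1 and 3 (the ones whose last field is 1) and count them.

I tried a regular expression, {\s\}\s1\n}, together with lsearch (after converting the contents to a list), but it does not work. What should I do?

I also tried {\s\}\s1}, but it printed all 4 lines.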

Answer 1

It looks like you need to capture the numbers at the end of the first and third lines.

Here is one way to do that:

set s {{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2}
set re {^{[^{}]*}\s*(\d+)\s+{[^{}]*}\s*\d+\s+{[^{}]*}\s*(\d+)}
regexp $re $s m g1 g2
set res [expr {$g1 + $g2}]
puts $res

See the IDEONE demo

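Pattern details:

^ - start of the string
{[^{}]*} - a {...}-like substring with no braces inside
\s* - 0+ whitespace characters
(\d+) - Group 1 (g1), capturing 1+ digits
\s+ - 1+ whitespace characters (if there can be no trailing/leading spaces around the line break, [\r\n]+ can be used instead)
{[^{}]*}\s*\d+\s+{[^{}]*}\s*(\d+) - see above; only the final (\d+) captures, and it fills the second variable, g2.

See the regex demo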
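
The pattern above is tied to the four-line sample. As a rough sketch of how the same building blocks might be applied to every line of a larger input, assuming each line has exactly the form {...} number with nothing after the number, regexp can be asked to work line by line with -line -all -inline. The variable names (text, hits, count) are illustrative and are not part of the original answer.

```tcl
# Sample data from the question; in practice this would be the file contents.
set text {{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2}

# -line makes ^ and $ match at line boundaries; -all -inline returns a flat
# list of {fullMatch capturedNumber} pairs, one pair per matching line.
set hits [regexp -all -inline -line {^\{[^{}]*\}[ \t]*(\d+)$} $text]

set count 0
foreach {whole number} $hits {
    # Numeric comparison, so 101 is not counted.
    if {$number == 1} {
        incr count
    }
}
puts "Lines ending in 1: $count"
```

This reads the whole input into memory, so for a truly huge file the line-by-line approaches in the other answers are probably the better fit.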

Answer 2

Problems like this are an order of magnitude easier to solve without regular expressions.

package require fileutil

::fileutil::foreachLine line somefile.txt {
    if {[lindex $line end] == 1} {
        puts $line
    }
}

This solution looks at every line in the file and checks whether its last item equals 1. If it does, the line is printed.
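
The check works because each line of the file happens to be a valid Tcl list: the braced part is one element and the number is another. A minimal sketch of that parsing follows. The helper lastField is hypothetical (not part of fileutil or the original answer) and only guards against lines that are not well-formed lists, such as ones with unbalanced braces.

```tcl
# Hypothetical helper: return the last list element of a line, or an empty
# string when the line is not a well-formed Tcl list.
proc lastField {line} {
    if {[catch {lindex $line end} value]} {
        return ""
    }
    return $value
}

set line "{ abc1 } 101"
puts [llength $line]            ;# the braced part and the number: two elements
puts [lastField $line]          ;# the trailing number
puts [lastField "{ broken 1"]   ;# unbalanced brace, so the helper returns ""
```

If the file is known to be clean, the plain lindex call in the answer is enough and the guard can be left out.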

You can also count or sum them:

set count 0
set sum 0
::fileutil::foreachLine line somefile.txt {
    if {[lindex $line end] == 1} {
        puts $line
        incr count
        incr sum [lindex $line end] ;# yeah, I know, always 1
    }
}
puts "Number of lines: $count"
puts "Sum of items: $sum"

If fileutil is not available in your Tcl installation and you can't or don't want to install it, you can use the lower-level core equivalent:

set f [open somefile.txt]
while {[gets $f line] >= 0} {
    if {[lindex $line end] == 1} {
        puts $line
    }
}
close $f

If you absolutely must use a regular expression, in this case you can do it like this:

::fileutil::foreachLine line somefile.txt {
    if {[regexp {\m1$} $line]} {
        puts $line
    }
}

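This regular expression looks for lines that end with the digit 1 standing as a word of its own (i.e. with no digit or other word character immediately before it).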
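
As a quick check of that behaviour, the following sketch runs the same pattern against the four sample lines from the question and collects the 0/1 results. The variable names are illustrative.

```tcl
# The four sample lines; double quotes keep the braces as literal text.
set lines [list "{ abc1 } 1" "{ cde1 } 101" "{ fgh1 } 1" "{ ijk1 } 2"]

set results {}
foreach line $lines {
    # \m asserts the start of a word, so the final 1 of 101 does not qualify.
    lappend results [regexp {\m1$} $line]
}
puts $results   ;# prints: 1 0 1 0
```

Note that $ anchors at the very end of the line, so a line with trailing whitespace would not match; a pattern such as {\m1\s*$} is one possible way to tolerate that, if needed.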

Documentation: close, fileutil package, gets, if, lindex, open, package, puts, Syntax of Tcl regular expressions, regexp, while

Answer 3

Solution 1: If you do not want to use regexp and all your input lines have the same format, {string} number:

set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    if {[lindex $line 1] == 1} {
        puts [lindex $line 1] ;# Prints only 1
        puts $line            ;# Prints the whole line, which has 1 at the end
    }
}
close $fd

Solution 2: If you want to use regexp, use a capturing group, i.e. (.*):

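set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    # Braces around the pattern keep Tcl from substituting inside it;
    # match1 receives whatever follows "} ".
    if {[regexp {\{.*\} (.*)} $line match match1]} {
        if {$match1 == 1} {
            puts $line
        }
    }
}
close $fd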

Solution 3: Based on @Peter's suggestion about regexp:

set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    # match receives the run of digits at the end of the line
    if {[regexp {\d+$} $line match]} {
        if {$match == 1} {
            puts $match ;# Prints only 1
            puts $line  ;# Prints the whole line, which has 1 at the end
        }
    }
}
close $fd
