AWK: Pattern scanning debug script not working

I have the following table:

cat test.txt 
c_az_1858   2020-01-15  -5.50   Parking Serv        Parking Serv
c_az_1859   2020-01-15  -80.56  Avery Johnson   Avery Johnso    592242
c_az_1860   2020-01-15  100.00  Wayne Alexander Flin    7 Pikarere S    Titahi Bay
c_az_1861   2020-01-15  51.75   Setefano P M    Crew Cuts   Lawns
c_az_1862   2020-01-13  -5.50   Parking Serv        Parking Serv
c_az_1863   2020-01-13  -3.00   Parking Serv        Parking Serv
c_az_1864   2020-01-13  57.50   0520/5200000000/002     Apu Cresent
c_az_1865   2020-01-13  46.00   Becta Ltd   Taylormallon    Lawns
c_az_1866   2020-01-13  28.75   Strata Title Adminis    Crewcut Gard    De Payment
c_az_1867   2020-01-13  19.17   D S & S A Tapp  David Tapp  Weekly Lawn

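I am trying to run a series of search patterns against this file so that each matching line is printed with the pattern that matched it in front. The search patterns scan column $4. Like this: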

Park: c_az_1858 2020-01-15      -5.50   Parking Serv            Parking Serv
ayn : c_az_1860 2020-01-15      100.00  Wayne Alexander Flin    7 Pikarere S    Titahi Bay
o P: c_az_1861  2020-01-15      51.75   Setefano P M    Crew Cuts       Lawns
Park: c_az_1862 2020-01-13      -5.50   Parking Serv            Parking Serv
Park: c_az_1863 2020-01-13      -3.00   Parking Serv            Parking Serv
S A: c_az_1867  2020-01-13      19.17   D S & S A Tapp  David Tapp      Weekly Lawn

To do this I wrote the following script:

#!/usr/bin/env bash
  
awk '
BEGIN{
        FS = OFS = "\t"
        x="ayn|o P|S A|Park"
}
{
for (i in x) {
        if ($4 ~ i) {
                print x[i] ": " , i 
        }
}
}
' test.txt

When I run this, I get the following error message:

awk: cmd. line:7: (FILENAME=test.txt FNR=1) fatal: attempt to use scalar `x' as an array

Is x a scalar? How can I rewrite this so that it works? Any help is much appreciated.

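In the current code, the following assigns a string to the variable x, so x is a scalar and the later for (i in x) fails because it treats x as an array: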

x="ayn|o P|S A|Park"

Assigning these patterns to an array could be done individually like so:

# assign as array values

x[1]="ayn" ; x[2]="o P" ; x[3]="S A" ; x[4]="Park"

# assign as array indices (no need to assign a value)

x["ayn"] ; x["o P"] ; x["S A"] ; x["Park"]
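The difference between the two forms matters when looping: for (k in arr) always walks the indices, so with the first form the pattern is arr[k] and with the second it is k itself. A minimal sketch, assuming illustrative array names vals and idx (not from the question) and a hard-coded test string:

```shell
out=$(awk 'BEGIN {
    vals[1] = "ayn"; vals[2] = "Park"      # patterns stored as array values
    idx["ayn"]; idx["Park"]                # patterns stored as array indices
    n = 0
    for (k in vals) if ("Parking Serv" ~ vals[k]) n++   # match against the value
    for (k in idx)  if ("Parking Serv" ~ k)       n++   # match against the index
    print n                                # one hit per loop
}')
echo "$out"
```

Each loop finds exactly one hit ("Park"), so both styles work as long as the comparison uses the side of the array that actually holds the pattern.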

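If the patterns are supplied as a delimited string, we can use the split() function to break the value into separate strings and assign them as array values.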
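As a quick illustration of what split() produces, here is a small sketch (the names ptns, arr and n are just for this example). split() returns the number of pieces and stores them under indices starting at 1:

```shell
out=$(awk -v ptns='ayn|o P|S A|Park' 'BEGIN {
    n = split(ptns, arr, "|")       # n is the number of pieces found
    for (i = 1; i <= n; i++)        # elements are numbered 1..n
        printf "%d=[%s]\n", i, arr[i]
}')
echo "$out"
```

This should list the four patterns in their original order, with the embedded spaces in "o P" and "S A" preserved.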

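A few changes to OP's current code:

  • allow the search patterns to be fed from the shell into an awk variable
  • split the search patterns into separate array components

Modified code: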

patterns='ayn|o P|S A|Park'

awk -v ptns="${patterns}" '
BEGIN { FS = OFS = "\t"
        split(ptns,arr,"|")           # split ptns into array arr[] based on "|" delimiter
        for (i in arr)
            x[arr[i]]                 # convert arr[] values to x[] indices
      }
      { for (i in x)
            if ($4 ~ i)               # compare $4 with the array indices
               print i ": " $0
      }
' test.txt

Or we could use the result of split() directly and make sure we match $4 against the values in the array (rather than the array's indices), e.g.:

patterns='ayn|o P|S A|Park'

awk -v ptns="${patterns}" '
BEGIN { FS = OFS = "\t"
        split(ptns,arr,"|")           # split ptns into array arr[] based on "|" delimiter
      }
      { for (i in arr)
            if ($4 ~ arr[i])          # compare $4 with the array values
               print arr[i] ": " $0
      }
' test.txt

Both of these generate:

Park: c_az_1858 2020-01-15      -5.50   Parking Serv    Parking Serv
ayn: c_az_1860  2020-01-15      100.00  Wayne Alexander Flin    7 Pikarere S    Titahi Bay
o P: c_az_1861  2020-01-15      51.75   Setefano P M    Crew Cuts       Lawns
Park: c_az_1862 2020-01-13      -5.50   Parking Serv    Parking Serv
Park: c_az_1863 2020-01-13      -3.00   Parking Serv    Parking Serv
S A: c_az_1867  2020-01-13      19.17   D S & S A Tapp  David Tapp      Weekly Lawn
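One caveat: for (i in arr) visits elements in an unspecified order. That is harmless for the sample data, where each row matches at most one pattern, but if a row could match several, a counted loop over the split() result keeps the output order predictable. A sketch using a made-up row (id1 and "Wayne Parker" are not from the question's data):

```shell
out=$(printf 'id1\t2020-01-15\t1.00\tWayne Parker\n' |
awk -v ptns='ayn|o P|S A|Park' '
BEGIN { FS = OFS = "\t"; n = split(ptns, arr, "|") }
{
    for (i = 1; i <= n; i++)        # patterns are tried in the order given
        if ($4 ~ arr[i])
            print arr[i] ": " $0    # one output line per matching pattern
}')
echo "$out"
```

Here the row is printed twice, first for "ayn" and then for "Park", because both patterns match column 4.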

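Another option is to turn the pipe-delimited string into a regex literal by replacing the double quotes with forward slashes; the pipes in the pattern then act as alternation between the alternatives.

You can then check for a match in column 4 and print the first matched part followed by the whole line.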

awk '
BEGIN{FS=OFS="\t"}
match($4, /ayn|o P|S A|Park/) { 
  print substr($4, RSTART, RLENGTH) ":", $0
}
' test.txt

Output

Park:   c_az_1858       2020-01-15      -5.50   Parking Serv            Parking Serv
ayn:    c_az_1860       2020-01-15      100.00  Wayne Alexander Flin    7 Pikarere S    Titahi Bay
o P:    c_az_1861       2020-01-15      51.75   Setefano P M    Crew Cuts       Lawns
Park:   c_az_1862       2020-01-13      -5.50   Parking Serv            Parking Serv
Park:   c_az_1863       2020-01-13      -3.00   Parking Serv            Parking Serv
S A:    c_az_1867       2020-01-13      19.17   D S & S A Tapp  David Tapp      Weekly Lawn
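If the patterns still need to come from a shell variable rather than being hard-coded, match() also accepts a string, which awk treats as a dynamic regex. A sketch along those lines, assuming a variable name ptns and two rows copied from the sample data so that it runs without test.txt:

```shell
patterns='ayn|o P|S A|Park'
out=$(printf 'c_az_1860\t2020-01-15\t100.00\tWayne Alexander Flin\nc_az_1859\t2020-01-15\t-80.56\tAvery Johnson\n' |
awk -v ptns="$patterns" '
BEGIN { FS = OFS = "\t" }
match($4, ptns) {                               # ptns is used as a dynamic regex
    print substr($4, RSTART, RLENGTH) ":", $0   # matched text, then the full line
}')
echo "$out"
```

Only the "Wayne Alexander Flin" row is printed, prefixed with "ayn:". Keep in mind that -v assignments process backslash escapes, so a pattern containing backslashes would need them doubled.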