Unix: parse varying named values into separate rows

We receive an input file whose records vary in length, as shown below. The Text column has no fixed length.

Input file:

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

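The Text column holds name=value pairs and varies in length. Note that a name in the Text column can itself contain a semicolon. We are trying to parse this input but have not been able to handle it with AWK or Bash.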

Expected output:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

The snippet below works for ID=2 but not for ID=1:

echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5

Any help is much appreciated.

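Could you please try the following, written and tested with the shown samples in a recent version of GNU awk. Since the OP's awk version is older, anyone on an old awk should try invoking it as awk --re-interval instead.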

awk '
BEGIN{
  FS=OFS="|"
}
FNR==1{ next }
{
  first=$1
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
    print first,substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
}'  Input_file

The output is as follows.

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

Explanation: a detailed, line-by-line explanation of the above (the following is for explanation only).

awk '                                        ##Starting awk program from here.
BEGIN{                                       ##Starting BEGIN section from here.
  FS=OFS="|"                                 ##Setting FS and OFS to | here.
}
FNR==1{ next }                               ##If line is first line then go next, do not print anything.
{
  first=$1                                   ##Creating first and setting it to the first field here.
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Running a while loop whose match() regex covers one or more names (optionally joined by ;), then =, then an optional value.
    print first,substr($0,RSTART,RLENGTH)    ##Printing first and the sub string (the currently matched one).
    $0=substr($0,RSTART+RLENGTH)             ##Saving rest of the line into current line.
  }
}' Input_file                                ##Mentioning Input_file name here.
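For an awk without interval-expression support, one option that avoids --re-interval altogether is to write the repetition as "+" rather than "{1,}". The following is only a sketch under that assumption; the function name split_pairs and the separate rest variable are introduced here for illustration and are not part of the answer above. It still assumes names look like name<digits> and values like value<digits>, exactly as the original regex does.

```shell
# Hypothetical helper: reads "ID|Text" lines (header included) on stdin.
split_pairs() {
  awk '
  BEGIN { FS = OFS = "|" }
  FNR == 1 { next }                          # skip the header line
  {
    id = $1
    rest = $2
    # "+" stands in for "{1,}", so no interval expressions are needed
    while (match(rest, /(name[0-9]+;?)+=(value[0-9]+)?/)) {
      print id, substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)  # continue after the matched pair
    }
  }'
}

printf '%s\n' 'ID|Text' '1|name1=value1;name3;name4=value2;name5=value5' | split_pairs
```

Working on a copy of the second field instead of reassigning $0 keeps the record intact, which may be easier to follow.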

Sample data:

$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

One awk solution:

awk -F"[|;]" '                                                           # use "|" and ";" as input field delimiters
FNR==1 { next }                                                          # skip header line
       { pfx=$1 "|"                                                      # set output prefix to field 1 + "|"
         printpfx=1                                                      # set flag to print prefix

         for ( i=2 ; i<=NF ; i++ )                                       # for fields 2 to NF
             {
               if ( printpfx)     { printf "%s",   pfx  ; printpfx=0 }   # if print flag == 1 then print prefix and clear flag
               if ( $(i)  ~ /=/ ) { printf "%s\n", $(i) ; printpfx=1 }   # if current field contains "=" then print it, end this line of output, reset print flag == 1
               if ( $(i) !~ /=/ ) { printf "%s;",  $(i) }                # if current field does not contain "=" then print it and include a ";" suffix
             }
       }
' name.dat

The above generates:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
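If the last piece of a record might lack an "=", the same field-by-field idea can be written with a buffer that is flushed at the end of each record, so the output always ends with a newline. This is a sketch only: the function name regroup_fields and the buf variable are made up for illustration, and it assumes that neither names nor values contain "|" and that there are no empty pieces between semicolons.

```shell
# Hypothetical helper: reads "ID|Text" lines (header included) on stdin.
regroup_fields() {
  awk -F'[|;]' '
  FNR == 1 { next }                          # skip the header line
  {
    buf = ""                                 # pieces collected for the current pair
    for (i = 2; i <= NF; i++) {
      buf = (buf == "" ? $i : buf ";" $i)    # re-insert the ";" between pieces
      if ($i ~ /=/) {                        # a piece with "=" completes the pair
        print $1 "|" buf
        buf = ""
      }
    }
    if (buf != "") print $1 "|" buf          # leftover pieces that never saw "="
  }'
}

printf '%s\n' 'ID|Text' '1|name1=value1;name3;name4=value2;name5=value5' | regroup_fields
```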

A Bash solution:

#!/usr/bin/env bash

while IFS=\| read -r id text || [ -n "$id" ]; do
  IFS=\; read -r -a kv_arr < <(printf %s "$text")
  printf "$id|%s\n" "${kv_arr[@]}"
done < <(tail -n +2 a.txt)

A simple POSIX shell solution:

#!/usr/bin/env sh

# Chop the header line from the input file
tail -n +2 a.txt |
# While reading id and text Fields Separated by vertical bar
while IFS=\| read -r id text || [ -n "$id" ]; do
  # Sets the separator to a semicolon
  IFS=\;
  # Print each semicolon separated field formatted on
  # its own line with the ID
  # shellcheck disable=SC2086 # Explicit split on semicolon
  printf "$id|%s\n" $text
done

Input a.txt:

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

Output (every semicolon is treated as a separator, so name3 and name4=value2 land on separate rows):

1|name1=value1
1|name3
1|name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
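To get name3;name4=value2 onto a single row, as the question's expected output asks, the pieces without an "=" can be held back and glued onto the next piece. The following POSIX shell sketch illustrates that idea; the function name split_rows and the pending variable are hypothetical, and it assumes that a piece containing "=" always ends a pair.

```shell
# Hypothetical helper: reads "ID|Text" lines (header included) on stdin.
# The body runs in a subshell, so the IFS and "set -f" changes stay local.
split_rows() (
  set -f                                     # keep unquoted $text from globbing
  read -r header || return 0                 # discard the "ID|Text" header line
  while IFS='|' read -r id text || [ -n "$id" ]; do
    pending=''
    IFS=';'
    for piece in $text; do                   # explicit split on semicolons
      pending="${pending:+$pending;}$piece"  # glue held-back pieces together
      case $piece in
        *=*) printf '%s|%s\n' "$id" "$pending"; pending='' ;;
      esac
    done
    # Flush pieces that were never followed by a piece containing "="
    if [ -n "$pending" ]; then printf '%s|%s\n' "$id" "$pending"; fi
  done
)

printf '%s\n' 'ID|Text' '1|name1=value1;name3;name4=value2;name5=value5' | split_rows
```

With a file, the call would be split_rows < a.txt.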

You already have some good answers, one of them accepted. Here is a shorter GNU awk command that also does the job:

awk -F '|' 'NR > 1 {
   for (s=$2; match(s, /([^=]+=[^;]*)(;|$)/, m); s=substr(s, RLENGTH+1))
      print $1 FS m[1]
}' file.txt
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
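The three-argument form of match() is a GNU awk extension. Where only a POSIX awk is available, roughly the same loop can be sketched with RSTART and RLENGTH. The function name split_generic is made up for illustration, and the sketch assumes that "|" does not occur inside the Text column.

```shell
# Hypothetical helper: reads "ID|Text" lines (header included) on stdin.
split_generic() {
  awk -F'|' 'NR > 1 {
    s = $2
    # One pair: everything up to "=", then the value up to the next ";"
    while (match(s, /[^=]+=[^;]*/)) {
      print $1 FS substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH + 1)    # the +1 skips the ";" separator
    }
  }'
}

printf '%s\n' 'ID|Text' '1|name1=value1;name3;name4=value2;name5=value5' | split_generic
```

Unlike the name[0-9]+ regex used earlier in the thread, this pattern makes no assumption about what names and values look like, beyond values not containing a semicolon.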