Unix 将不同的命名值解析为单独的行
Unix Parse Varying Named Value into seperate rows
我们正在获取一个不同长度的输入文件,如下所述。文字长度不一。
输入文件:
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
此处的文本以命名值对为内容,长度不等。请注意,文本列中的名称可以包含分号。我们正在尝试解析输入,但我们无法通过 AWK 或 BASH
处理它
期望的输出:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
下面的代码片段适用于 ID=2,但不适用于 ID=1
echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5
非常感谢任何帮助。
您能否尝试在 GNU awk
中使用新版本的 GNU awk
中显示的示例进行跟踪、编写和测试。由于 OP 的 awk
版本较旧,因此如果有人拥有旧版本的 awk
,请尝试将其更改为 awk --re-interval
awk '
BEGIN{
FS=OFS="|"
}
FNR==1{ next }
{
first=
while(match([=10=],/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
print first,substr([=10=],RSTART,RLENGTH)
[=10=]=substr([=10=],RSTART+RLENGTH)
}
}' Input_file
输出如下。
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
说明: 补充以上详细说明(以下仅作说明)
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="|" ##Setting FS and OFS wiht | here.
}
FNR==1{ next } ##If line is first line then go next, do not print anything.
{
first= ##Creating first and setting as first field here.
while(match([=12=],/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Running while loop which has match which has a regex of matching name and value all mentioned permutations and combinations.
print first,substr([=12=],RSTART,RLENGTH) ##Printing first and sub string(currently matched one)
[=12=]=substr([=12=],RSTART+RLENGTH) ##Saving rest of the line into current line.
}
}' Input_file ##Mentioning Input_file name here.
示例数据:
$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
一个awk
解法:
awk -F"[|;]" ' # use "|" and ";" as input field delimiters
FNR==1 { next } # skip header line
{ pfx= "|" # set output prefix to field 1 + "|"
printpfx=1 # set flag to print prefix
for ( i=2 ; i<=NF ; i++ ) # for fields 2 to NF
{
if ( printpfx) { printf "%s", pfx ; printpfx=0 } # if print flag == 1 then print prefix and clear flag
if ( $(i) ~ /=/ ) { printf "%s\n", $(i) ; printpfx=1 } # if current field contains "=" then print it, end this line of output, reset print flag == 1
if ( $(i) !~ /=/ ) { printf "%s;", $(i) } # if current field does not contain "=" then print it and include a ";" suffix
}
}
' name.dat
以上生成:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
一个Bash解决方案:
#!/usr/bin/env bash
while IFS=\| read -r id text || [ -n "$id" ]; do
IFS=\; read -r -a kv_arr < <(printf %s "$text")
printf "$id|%s\n" "${kv_arr[@]}"
done < <(tail -n +2 a.txt)
一个简单的POSIXshell解决方案:
#!/usr/bin/env sh
# Chop the header line from the input file
tail -n +2 a.txt |
# While reading id and text Fields Separated by vertical bar
while IFS=\| read -r id text || [ -n "$id" ]; do
# Sets the separator to a semicolon
IFS=\;
# Print each semicolon separated field formatted on
# its own line with the ID
# shellcheck disable=SC2086 # Explicit split on semicolon
printf "$id|%s\n" $text
done
输入a.txt
:
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
输出:
1|name1=value1
1|name3
1|name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
您有一些很好的答案并且已经被接受了。这是一个更短的 gnu awk 命令,也可以完成这项工作:
awk -F '|' 'NR > 1 {
for (s=; match(s, /([^=]+=[^;]*)(;|$)/, m); s=substr(s, RLENGTH+1))
print FS m[1]
}' file.txt
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
我们正在获取一个不同长度的输入文件,如下所述。文字长度不一。
输入文件:
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
此处的文本以命名值对为内容,长度不等。请注意,文本列中的名称可以包含分号。我们正在尝试解析输入,但我们无法通过 AWK 或 BASH
处理它期望的输出:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
下面的代码片段适用于 ID=2,但不适用于 ID=1
echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5
非常感谢任何帮助。
您能否尝试在 GNU awk
中使用新版本的 GNU awk
中显示的示例进行跟踪、编写和测试。由于 OP 的 awk
版本较旧,因此如果有人拥有旧版本的 awk
,请尝试将其更改为 awk --re-interval
awk '
BEGIN{
FS=OFS="|"
}
FNR==1{ next }
{
first=
while(match([=10=],/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
print first,substr([=10=],RSTART,RLENGTH)
[=10=]=substr([=10=],RSTART+RLENGTH)
}
}' Input_file
输出如下。
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
说明: 补充以上详细说明(以下仅作说明)
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="|" ##Setting FS and OFS wiht | here.
}
FNR==1{ next } ##If line is first line then go next, do not print anything.
{
first= ##Creating first and setting as first field here.
while(match([=12=],/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Running while loop which has match which has a regex of matching name and value all mentioned permutations and combinations.
print first,substr([=12=],RSTART,RLENGTH) ##Printing first and sub string(currently matched one)
[=12=]=substr([=12=],RSTART+RLENGTH) ##Saving rest of the line into current line.
}
}' Input_file ##Mentioning Input_file name here.
示例数据:
$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
一个awk
解法:
awk -F"[|;]" ' # use "|" and ";" as input field delimiters
FNR==1 { next } # skip header line
{ pfx= "|" # set output prefix to field 1 + "|"
printpfx=1 # set flag to print prefix
for ( i=2 ; i<=NF ; i++ ) # for fields 2 to NF
{
if ( printpfx) { printf "%s", pfx ; printpfx=0 } # if print flag == 1 then print prefix and clear flag
if ( $(i) ~ /=/ ) { printf "%s\n", $(i) ; printpfx=1 } # if current field contains "=" then print it, end this line of output, reset print flag == 1
if ( $(i) !~ /=/ ) { printf "%s;", $(i) } # if current field does not contain "=" then print it and include a ";" suffix
}
}
' name.dat
以上生成:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
一个Bash解决方案:
#!/usr/bin/env bash
while IFS=\| read -r id text || [ -n "$id" ]; do
IFS=\; read -r -a kv_arr < <(printf %s "$text")
printf "$id|%s\n" "${kv_arr[@]}"
done < <(tail -n +2 a.txt)
一个简单的POSIXshell解决方案:
#!/usr/bin/env sh
# Chop the header line from the input file
tail -n +2 a.txt |
# While reading id and text Fields Separated by vertical bar
while IFS=\| read -r id text || [ -n "$id" ]; do
# Sets the separator to a semicolon
IFS=\;
# Print each semicolon separated field formatted on
# its own line with the ID
# shellcheck disable=SC2086 # Explicit split on semicolon
printf "$id|%s\n" $text
done
输入a.txt
:
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
输出:
1|name1=value1
1|name3
1|name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
您有一些很好的答案并且已经被接受了。这是一个更短的 gnu awk 命令,也可以完成这项工作:
awk -F '|' 'NR > 1 {
for (s=; match(s, /([^=]+=[^;]*)(;|$)/, m); s=substr(s, RLENGTH+1))
print FS m[1]
}' file.txt
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8