How can I parse a CSV file with awk?
I have a CSV file with, say, 20 headers, and the line after the header line holds the corresponding values for a particular record.
Example: source file
Age,Name,Salary
25,Anand,32000
I want my output file to be in this format.
Example: output file
Age
25
Name
Anand
Salary
32000
So which awk/grep/sed command should I use?
I would say
awk -F, 'NR == 1 { split($0, headers); next } { for(i = 1; i <= NF; ++i) { print headers[i]; print $i } }' filename
That is:
NR == 1 {                          # in the first line
  split($0, headers)               # remember the headers
  next                             # do nothing else
}
{                                  # after that:
  for(i = 1; i <= NF; ++i) {       # for all fields:
    print headers[i]               # print the corresponding header
    print $i                       # followed by the field
  }
}
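To see how the script above behaves when the file has more than one data row, here is a small sketch. The function name csv_to_pairs and the second sample row are invented for illustration, and it assumes plain CSV with no quoted fields or commas inside values.

```shell
# csv_to_pairs is a hypothetical helper name, not part of the original answer.
# Assumes plain CSV: no quoted fields, no commas inside values.
csv_to_pairs() {
    awk -F, '
        NR == 1 { split($0, headers); next }   # first line: remember the headers
        {
            for (i = 1; i <= NF; ++i) {        # every later line: header, then value
                print headers[i]
                print $i
            }
        }
    ' "$@"
}

# Example with two data rows (the second row is invented sample data):
printf 'Age,Name,Salary\n25,Anand,32000\n31,Bina,41000\n' | csv_to_pairs
```

The header/value pairs are printed once per data row, so the header names repeat for every record. The function reads standard input when called without arguments, or the files named as arguments.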
Addendum: the obligatory, crazy sed solution (not recommended for production use; written for fun, not for profit):
sed 's/$/,/; 1 { h; d; }; G; :a s/\([^,]*\),\([^\n]*\n\)\([^,]*\),\(.*\)/\2\4\n\3\n\1/; ta; s/^\n\n//' filename
It works as follows:
s/$/,/        # Add a comma to all lines for more convenient processing
1 { h; d; }   # first line: just put it in the hold buffer
G             # all other lines: append the hold buffer (header fields) to the
              # pattern space
:a            # jump label for looping
              # isolate the first fields from the data and header lines,
              # move them to the end of the pattern space
s/\([^,]*\),\([^\n]*\n\)\([^,]*\),\(.*\)/\2\4\n\3\n\1/
ta            # do this until we got them all
s/^\n\n//     # then remove the two newlines that are left as an artifact of
              # the algorithm.
Here is an awk solution:
awk -F, 'NR==1{for (i=1;i<=NF;i++) a[i]=$i;next} {for (i=1;i<=NF;i++) print a[i] RS $i}' file
Age
25
Name
Anand
Salary
32000
The first for loop stores the headers in the array a.
The second for loop prints each header from the array a, followed by the corresponding data.
With GNU awk 4.* for true two-dimensional arrays:
$ awk -F, '{a[NR][1];split($0,a[NR])} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) print a[j][i]}' file
Age
25
Name
Anand
Salary
32000
To transpose rows and columns in general:
$ cat file
11 12 13
21 22 23
31 32 33
41 42 43
Using GNU awk:
$ awk '{a[NR][1];split($0,a[NR])} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s", a[j][i], (j<NR?OFS:ORS)}' file
11 21 31 41
12 22 32 42
13 23 33 43
or with any awk:
$ awk '{for (i=1;i<=NF;i++) a[NR,i]=$i} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s", a[j,i], (j<NR?OFS:ORS)}' file
11 21 31 41
12 22 32 42
13 23 33 43
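The portable variant can also be written so that it does not rely on NF inside END, which some older awks may not preserve, and so that rows of different lengths do not lose columns. This is only a sketch: the function name transpose and the variable names cell and maxnf are made up for illustration.

```shell
# transpose is a hypothetical helper name. It tracks the largest field count
# itself instead of reading NF inside the END block.
transpose() {
    awk '
        {
            for (i = 1; i <= NF; i++) cell[NR, i] = $i   # store each field by (row, column)
            if (NF > maxnf) maxnf = NF                    # remember the widest row
        }
        END {
            for (i = 1; i <= maxnf; i++)                  # each input column becomes an output row
                for (j = 1; j <= NR; j++)
                    printf "%s%s", cell[j, i], (j < NR ? OFS : ORS)
        }
    ' "$@"
}

printf '11 12 13\n21 22 23\n31 32 33\n41 42 43\n' | transpose
```

Cells that are missing from a short row come out as empty strings, so the output keeps one line per input column.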