Finding the minimum and maximum length of columns in a CSV file using shell script
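I have several CSV files with multiple columns. For each column in a file, I want the maximum length, the minimum length, and their difference (max - min). Example:
File: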
abc 1234 4
bcd 23644 534
c 3232 6
Expected output:
abc 1234 4
bcd 23644 534
c 3232 6
Max Length 3 5 3
Min Length 1 4 1
Diff 2 1 2
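The summary numbers come from the length of each field. Here is a small sketch that prints those lengths for the sample rows, splitting on whitespace as the sample does. The field_lengths function name is hypothetical.

```shell
# Hypothetical helper: print the length of every whitespace-separated field,
# one output row per input row
field_lengths() {
  awk '{
    for (i = 1; i <= NF; i++) {
      sep = (i == 1) ? "" : " "
      printf "%s%d", sep, length($i)
    }
    print ""
  }' "$@"
}

printf 'abc 1234 4\nbcd 23644 534\nc 3232 6\n' | field_lengths
# 3 4 1
# 3 5 3
# 1 4 1
```

Read each output column from top to bottom. The maxima are 3 5 3 and the minima are 1 4 1, so the differences are 2 1 2. This matches the expected output.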
The following script, which calculates the maximum column length, produces the expected output:
awk -F, '
{ for (i=1;i<=NF;i++)l[i]=((x=length($i))>l[i]?x:l[i])}
END {for(i=1;i<=NF;i++) print "Column"i":",l[i]} '
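Note that the sample rows are separated by spaces, while this script splits on commas with -F,. The sketch below compares how many fields awk sees in the first sample row under each separator. The variable names are illustrative only.

```shell
# The sample rows use blanks, so -F, leaves each row as a single field
row='abc 1234 4'
with_comma=$(printf '%s\n' "$row" | awk -F, '{ print NF }')
with_default=$(printf '%s\n' "$row" | awk '{ print NF }')
echo "With -F, awk sees $with_comma field(s); with the default FS it sees $with_default"
```

If the real files contain commas, -F, is correct. If they look like the sample, drop it and use the default field splitting.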
But the script for the minimum length does not work:
awk -F"," 'BEGIN {
for (i=1;i<=NF;i++) {
cur = length($i)
if ( (min == 0) || (cur < min) ) {
minlength = i
min = cur
}
} ;
for (i=1;i<=NF;i++) print $minlength}'
Any help would be appreciated.
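For comparison, here is a minimal sketch of the minimum-length script rewritten in the same one-pass style as the working maximum-length script. The name col_min is hypothetical, and -F, is kept on the assumption that the real files are comma-separated. The work happens in the main rule, which runs once per input line, rather than in BEGIN. Each column gets its own entry in m[], seeded with the first length seen for that column.

```shell
# Hypothetical fix: one minimum per column, tracked while the input is read
col_min() {
  awk -F, '
  {
    if (NF > n) n = NF    # remember the widest row for the END loop
    for (i = 1; i <= NF; i++) {
      x = length($i)
      # the first value seen for column i seeds m[i]; later values keep the smaller one
      if (!(i in m) || x < m[i]) m[i] = x
    }
  }
  END { for (i = 1; i <= n; i++) print "Column" i ":", m[i] }' "$@"
}

printf 'abc,1234,4\nbcd,23644,534\nc,3232,6\n' | col_min
# Column1: 1
# Column2: 4
# Column3: 1
```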
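You just need to seed the min and max arrays with the lengths from the first line of the file. An unset awk array element compares as 0, so without that seed, len < minlen[i] is never true and every minimum prints as 0. Your attempt has two other problems. It does all its work in a BEGIN block, which runs before any input is read. It also tracks a single scalar minimum instead of one per column. The script below uses awk's default whitespace field splitting to match your sample. Add -F, if your files are really comma-separated: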
awk '
# Seed both arrays from the first line: an unset element compares as 0,
# so minlen[i] would otherwise stay at 0
NR==1 {for (i=1; i<=NF; i++) maxlen[i] = minlen[i] = length($i)}
{
    print    # echo the row, as in the expected output
    for (i=1; i<=NF; i++) {
        len = length($i)
        if (len > maxlen[i]) maxlen[i] = len
        if (len < minlen[i]) minlen[i] = len
    }
}
END {
    # NF still holds the field count of the last line here
    printf "Max Length"
    for (i=1; i<=NF; i++) printf " %d", maxlen[i]
    print ""
    printf "Min Length"
    for (i=1; i<=NF; i++) printf " %d", minlen[i]
    print ""
    printf "Diff"
    for (i=1; i<=NF; i++) printf " %d", maxlen[i]-minlen[i]
    print ""
}
' file
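The question mentions several CSV files. This is a minimal sketch that wraps the same max/min/diff logic in a hypothetical col_stats function and runs it once per file through a hypothetical report_files loop. It assumes genuinely comma-separated files with no quoted fields containing commas. Instead of relying on the first line, it checks whether a column has been seen yet (i in maxlen), so rows with extra columns are handled. It also takes the column count from the widest row.

```shell
# Hypothetical helper: the max/min/diff logic from the answer, with -F, for
# comma-separated input. Assumes no quoted fields that contain commas.
col_stats() {
  awk -F, '
  {
    if (NF > n) n = NF    # remember the widest row
    for (i = 1; i <= NF; i++) {
      len = length($i)
      # the first value seen for column i seeds both arrays
      if (!(i in maxlen) || len > maxlen[i]) maxlen[i] = len
      if (!(i in minlen) || len < minlen[i]) minlen[i] = len
    }
  }
  END {
    printf "Max Length"; for (i = 1; i <= n; i++) printf " %d", maxlen[i]; print ""
    printf "Min Length"; for (i = 1; i <= n; i++) printf " %d", minlen[i]; print ""
    printf "Diff";       for (i = 1; i <= n; i++) printf " %d", maxlen[i] - minlen[i]; print ""
  }' "$1"
}

# Hypothetical driver: one awk run per file, so the arrays start empty for each file
report_files() {
  for f in "$@"; do
    echo "== $f"
    col_stats "$f"
  done
}
```

Usage would be something like report_files *.csv. A single awk invocation over all the files would merge their statistics, because the arrays (and NR) carry over from one file to the next. Running awk once per file is the simplest way to keep the results separate.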