How to remove fields with all zeros
I have a file that looks like this:
header,d0,d1,d2,d3, ...
s1,0,5,2,8, ...
s2,0,8,2,4, ...
s3,0,7,3,4, ...
s4,0,3,2,1, ...
...
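I want to delete every column that contains only zeros, such as d0.
I could check for all-zero columns by hand, find d0, and run
cut -d "," -f 1,3- file > file_revised
The expected output is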
header,d1,d2,d3, ...
s1,5,2,8, ...
s2,8,2,4, ...
s3,7,3,4, ...
s4,3,2,1, ...
...
But I have too many columns to check by hand.
How can I remove the all-zero columns automatically?
Thanks.
$ cat file
header,d0,d1,d2,d3
s1,0,5,2,8
s2,0,8,2,4
s3,0,7,3,4
s4,0,3,2,1
$
$ cat tst.awk
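NR==1 {                     # header line: every field starts as an all-zero candidate
    for (i=1; i<=NF; ++i)
        a[i]
    next
}
NR==FNR {                   # first pass: drop candidates that hold a non-zero value
    for (i in a)
        if ($i != "0")
            delete a[i]
    next
}
{                           # second pass: print the fields that are not in a[]
    sep = ""
    out = ""
    for (i=1; i<=NF; ++i) {
        if (i in a)
            continue
        out = out sep $i
        sep = FS
    }
    print out
}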
$
$ awk -F, -f tst.awk file file
header,d1,d2,d3
s1,5,2,8
s2,8,2,4
s3,7,3,4
s4,3,2,1
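Assuming the first column is never all zeros, this awk script should do the job:
awk -F',' '(NR==FNR && NR>1){            # first pass: sum each column, skipping the header
             for(i = 1; i <= NF; i++)
               {a[i] = a[i]+$i}}
           (FNR!=NR){out=$1              # second pass: always keep the first column
             for(i = 2; i <= NF; i++){
               if(a[i]!=0){out=out","$i}
             }
             print out
           }' file_name file_name
Note that the script takes the name of the input file, file_name, twice!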
For example, for the input:
header,d0,d1,d2,d3,
s1,0,5,2,8,
s2,0,8,2,4,
s3,0,7,3,4,
s4,0,3,2,1,
the script produces the output
header,d1,d2,d3
s1,5,2,8
s2,8,2,4
s3,7,3,4
s4,3,2,1
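Summing works for the sample data, but a column holding values such as 3 and -3 also sums to zero and would be dropped by mistake. Below is a minimal sketch of a variant that flags non-zero cells instead of summing them. The function name drop_zero_cols and the sample data are made up for illustration, and unlike the script above it treats the first column like any other.

```shell
# drop_zero_cols FILE: print FILE without the columns whose data cells are all 0.
# (hypothetical helper name; assumes comma-separated data with one header line)
drop_zero_cols() {
    awk -F, '
        NR == FNR {                       # first pass: inspect the data rows
            if (FNR > 1)
                for (i = 1; i <= NF; i++)
                    if ($i != 0) keep[i] = 1
            next
        }
        {                                 # second pass: print the kept columns
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (i in keep) { out = out sep $i; sep = FS }
            print out
        }
    ' "$1" "$1"
}

# Sample data: d0 is all zeros; d1 sums to zero but is not all zeros.
sample=$(mktemp)
printf 'header,d0,d1,d2\ns1,0,3,2\ns2,0,-3,4\n' > "$sample"
drop_zero_cols "$sample"
# prints:
# header,d1,d2
# s1,3,2
# s2,-3,4
```

Like the original, it still has to be given a regular file, because the file is read twice.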
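Maybe you can use a sed command like the one below. Note that it deletes every field that is exactly 0 in any row and leaves the header line unchanged, so it only fits data where zeros occur nowhere but in the columns to be dropped: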
$ sed 's/\b0\,\b//g' test.txt
header,d0,d1,d2,d3
s1,5,2,8
s2,8,2,4
s3,7,3,4
s4,3,2,1
Here is one that collects the fields to print into a variable (p="$1,$3,$4" ... etc.) and uses system to call awk to print p:
$ awk '
BEGIN { FS=OFS="," }
NR==1 {
    for(i=1;i<=NF;i++)                  # gather all field numbers to c[]
        c[i]
    next
}
{
    for(i in c)                         # test all fields that are still all zeros
        if($i!=0)
            delete c[i]
}
END {                                   # after testing all the records
    for(i=1;i<=NF;i++)
        if(!(i in c))
            p=p (p==""?"":OFS) "$" i    # build the list of fields to print
    p="print " p                        # e.g. p="print $1,$3,$4,$5"
    system("awk \47BEGIN{FS=OFS=\",\"}{" p "}\47 " FILENAME)
}' file
Output:
header,d1,d2,d3, ...
s1,5,2,8, ...
s2,8,2,4, ...
s3,7,3,4, ...
s4,3,2,1, ...
If every field is all zeros, p="print " and the whole file is printed.
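The same idea can be combined with the cut command from the question: awk only works out which field numbers to keep, and cut does the printing, so no awk program has to be assembled as a string. A rough sketch follows; the helper name keep_list and the sample data are hypothetical, and it assumes comma-separated input with a single header line.

```shell
# keep_list FILE: print a comma-separated list of the field numbers that are
# not all zeros, in the form expected by cut -f (hypothetical helper name).
keep_list() {
    awk -F, '
        FNR > 1 {                         # skip the header line
            for (i = 1; i <= NF; i++)
                if ($i != 0) keep[i] = 1
            if (NF > n) n = NF            # remember the widest row
        }
        END {
            for (i = 1; i <= n; i++)
                if (i in keep) { list = list sep i; sep = "," }
            print list
        }
    ' "$1"
}

sample=$(mktemp)
printf 'header,d0,d1,d2,d3\ns1,0,5,2,8\ns2,0,8,2,4\n' > "$sample"
fields=$(keep_list "$sample")             # here: 1,3,4,5
cut -d, -f "$fields" "$sample"
# prints:
# header,d1,d2,d3
# s1,5,2,8
# s2,8,2,4
```

If every column is all zeros the list comes out empty and cut rejects it, so that case would need its own check.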
Using Perl
> cat sumin.txt
header,d0,d1,d2,d3
s1,0,5,2,8
s2,0,8,2,4
s3,0,7,3,4
s4,0,3,2,1
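> cat rem_zero.sh
perl -F, -lane '
    @FH = @F if $. == 1;                        # remember the header fields
    if ($. > 1) {
        $F[$_] and $nz[$_] ||= 1 for 0..$#F;    # flag columns holding a non-zero value
        push(@L, [@F]);                         # keep the row for printing later
    }
    END {
        @cols = grep $nz[$_], 0..$#nz;          # indexes of the columns to keep
        print join(",", @FH[@cols]);
        for my $line (@L) { print join(",", @{$line}[@cols]) }
    }
' "$1"
> rem_zero.sh sumin.txt
header,d1,d2,d3
s1,5,2,8
s2,8,2,4
s3,7,3,4
s4,3,2,1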
>