Remove lines where the 10th column is empty in a text file in bash

Question

I have a huge tab-delimited text file with 10 columns. I want to remove every line from the file whose 10th column contains no value.

For example:

a b c d e f g h i j
4 6 8 9 4 2 1 6 4 2
1 5 9 8 5 1 8 3 6 
1 6 8 5 4 7 7 9 4 7
4 5 8 9 9 2 1 8 4 
3 4 7 5 8 8 2 5 3 6

Expected output:

a b c d e f g h i j
4 6 8 9 4 2 1 6 4 2
1 6 8 5 4 7 7 9 4 7
3 4 7 5 8 8 2 5 3 6

I want to use something like:

awk '$10 == ""' print $0 file

Answer 1

You can use the following, which prints every line whose 10th column is not empty:

awk '{if ($10) print}' file.txt

$ cat file.txt
a b c d e f g h i j
4 6 8 9 4 2 1 6 4 2
1 5 9 8 5 1 8 3 6
1 6 8 5 4 7 7 9 4 7
4 5 8 9 9 2 1 8 4
3 4 7 5 8 8 2 5 3 6
$
$
$ awk '{if ($10) print}' file.txt
a b c d e f g h i j
4 6 8 9 4 2 1 6 4 2
1 6 8 5 4 7 7 9 4 7
3 4 7 5 8 8 2 5 3 6
$
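One caveat worth checking before relying on a truthiness test such as `if ($10)`: awk treats a field holding the number 0 as false, so such rows would be dropped along with the empty ones. The sketch below uses made-up sample rows (not from the question) to compare the truthiness test with an explicit string comparison.

```shell
# Hypothetical sample: the second row has a literal 0 in column 10,
# the third row has an empty column 10.
sample=$(printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\n1\t2\t3\t4\t5\t6\t7\t8\t9\t0\n1\t2\t3\t4\t5\t6\t7\t8\t9\t\n')

# Truthiness test: a 10th field of 0 counts as false, so that row is dropped too.
truthy=$(printf '%s\n' "$sample" | awk '{if ($10) print}')

# Explicit string comparison: keeps the 0 row, drops only the empty one.
strict=$(printf '%s\n' "$sample" | awk '$10 != ""')

printf 'truthy:\n%s\nstrict:\n%s\n' "$truthy" "$strict"
```

If the 10th column can legitimately contain 0, the string comparison is probably the safer choice.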

Answer 2

Your command is almost there. You can try this:

awk '$10 != "" {print}' file

$10 != "" tests whether the 10th field is non-empty
print prints the whole line
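Since the file is tab-delimited, it may be worth setting the field separator explicitly. With awk's default splitting on runs of whitespace, an empty field in the middle of a row shifts the later fields left, so column 10 is no longer what the test looks at. A minimal sketch, using a hypothetical temporary file and made-up rows:

```shell
# Hypothetical sample file: row 2 has an empty 5th column but a value in
# column 10; row 3 has an empty 10th column.
tmp=$(mktemp)
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\n1\t2\t3\t4\t\t6\t7\t8\t9\t10\n1\t2\t3\t4\t5\t6\t7\t8\t9\t\n' > "$tmp"

# Default splitting: row 2 collapses to nine fields and is dropped by mistake.
default_fs=$(awk '$10 != ""' "$tmp")

# Tab as separator: empty fields keep their position, only row 3 is dropped.
tab_fs=$(awk -F'\t' '$10 != ""' "$tmp")

printf 'default FS:\n%s\ntab FS:\n%s\n' "$default_fs" "$tab_fs"
rm -f "$tmp"
```

For the sample in the question, where only the last column is ever empty, both forms give the same result.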

Answer 3

I managed it the "simple" way with grep:

grep $'.\t.\t.\t.\t.\t.\t.\t.\t.\t.' file.txt

'.' stands for any character, and \t stands for a TAB character.
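Because each '.' matches exactly one character, the pattern above fits the single-character sample but would miss rows whose middle fields are longer. A possible generalization, sketched with hypothetical rows and an extended regular expression, counts nine tab-terminated fields and then requires at least one character in the 10th:

```shell
# Build a literal TAB portably instead of relying on $'...' quoting.
tab=$(printf '\t')

# Hypothetical rows: header, multi-character fields, empty 10th column.
rows=$(printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\n11\t22\t33\t44\t55\t66\t77\t88\t99\t100\n1\t2\t3\t4\t5\t6\t7\t8\t9\t\n')

# Nine fields (possibly empty), each followed by a TAB, then at least one
# non-TAB character starting the 10th field.
matched=$(printf '%s\n' "$rows" | grep -E "^([^${tab}]*${tab}){9}[^${tab}]")

printf '%s\n' "$matched"
```

The variable names here are illustrative only; in bash the same pattern could be written with $'...' quoting as in the answer.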

Answer 4

How about something simpler (if you also want the $1=$1 effect)?

 mawk  'NF*=9<NF'
                  or
 mawk 'NF*=10==NF'

Or even simpler than that (if you don't care about $1=$1):

 mawk NF==10      # shell-quoting optional for this one
                  or
 mawk '9<NF'

a b c d e f g h i j
4 6 8 9 4 2 1 6 4 2
1 6 8 5 4 7 7 9 4 7
3 4 7 5 8 8 2 5 3 6

Or even a completely counter-intuitive but fully POSIX-compliant form:

mawk '+RS==NF%10'

There is no need to spend time checking the field manually, since the up-front field splitting has already done that work for you.
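The NF-based commands rely on awk's default splitting on runs of whitespace: a row that ends in a tab with nothing after it is split into only nine fields. The sketch below, using a made-up row, shows the field count under both separators. It suggests the shortcut depends on the default separator, because with a tab separator the trailing empty field is still counted.

```shell
# Hypothetical row with nine values and an empty 10th column (trailing TAB).
row=$(printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t')

# Default whitespace splitting: the trailing TAB is ignored, so NF is 9.
nf_default=$(printf '%s\n' "$row" | awk '{print NF}')

# Tab as separator: the empty last field is counted, so NF is 10.
nf_tab=$(printf '%s\n' "$row" | awk -F'\t' '{print NF}')

printf 'default FS: NF=%s\ntab FS: NF=%s\n' "$nf_default" "$nf_tab"
```

As for the last form, RS defaults to a newline, which converts to the number 0 under unary plus, so +RS==NF%10 reduces to testing that NF is a multiple of 10.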

bash