grep file matching specific column

Question

I want to keep only the lines of results.txt whose third column matches an ID listed in uniq.txt. Normally I would use grep -f uniq.txt results.txt, but that does not restrict the match to column 3.

uniq.txt

results.txt

readID  seqID   taxID   score   2ndBestScore    hitLength       queryLength     numMatches
A00260:70:HJM2YDSXX:4:1111:15519:16720  NC_000011.10    9606    169     0       28      151     1
A00260:70:HJM2YDSXX:3:1536:9805:14841   NW_021160017.1  9606    81      0       24      151     1
A00260:70:HJM2YDSXX:3:1366:27181:24330  NC_014803.1     234831  121     121     26      151     3
A00260:70:HJM2YDSXX:3:1366:27181:24330  NC_014973.1     443143  121     121     26      151     3

Answer 1

With the samples you have shown, please try the following code.

awk 'FNR==NR{arr[$0];next} ($3 in arr)' uniq.txt results.txt

Explanation:

awk '                     ##Start of the awk program.
FNR==NR{                  ##This condition is TRUE only while uniq.txt is being read.
  arr[$0]                 ##Create an array entry indexed by the current line.
  next                    ##Skip all further statements for this line.
}
($3 in arr)               ##If the 3rd field is present in arr, print the line from results.txt.
' uniq.txt results.txt    ##Input file names.
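To make the two-file idiom above concrete, here is a minimal, self-contained sketch. The contents of uniq.txt are not shown in the question, so the two IDs below are an assumption, and the shortened read names (r1 to r3) are placeholders rather than real data.

```shell
#!/bin/sh
# Sketch only: sample data is partly assumed (see the lead-in).
workdir=$(mktemp -d)

# Assumed contents of uniq.txt: one taxID per line.
printf '%s\n' 9606 234831 > "$workdir/uniq.txt"

# Tab-separated rows modelled on the question's results.txt (read names shortened).
{
  printf 'readID\tseqID\ttaxID\tscore\n'
  printf 'r1\tNC_000011.10\t9606\t169\n'
  printf 'r2\tNC_014803.1\t234831\t121\n'
  printf 'r3\tNC_014973.1\t443143\t121\n'
} > "$workdir/results.txt"

# While reading the first file (FNR==NR), remember every ID.
# While reading the second file, print rows whose 3rd field is a remembered ID.
filtered=$(awk 'FNR==NR { ids[$0]; next } ($3 in ids)' \
  "$workdir/uniq.txt" "$workdir/results.txt")

printf '%s\n' "$filtered"
rm -rf "$workdir"
```

With this sample data the header row is dropped as well, because the literal string taxID is not in the ID list. If the header is needed, adding a condition such as FNR==1 for the second file would be one way to keep it.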

Second solution: if the field number is not fixed in results.txt and you want to search the whole line for the value, try the following.

awk 'FNR==NR{arr[$0];next} {for(key in arr){if(index($0,key)){print;next}}}' uniq.txt results.txt
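One thing worth knowing about the whole-line variant is that index() looks for the ID as a substring anywhere in the line, so it can match more rows than the column-based test. The sketch below compares the two on made-up rows; the ID list and all row values are assumptions chosen to show the difference.

```shell
#!/bin/sh
# Sketch only: all sample values are hypothetical.
workdir=$(mktemp -d)

printf '%s\n' 9606 > "$workdir/uniq.txt"

{
  printf 'r1\tNC_1\t9606\n'        # exact match in column 3
  printf 'r2\tNC_2\t196069\n'      # column 3 merely contains 9606
  printf 'run9606\tNC_3\t443143\n' # 9606 appears in column 1 only
  printf 'r4\tNC_4\t443143\n'      # no match anywhere
} > "$workdir/results.txt"

# Column-based test: the 3rd field must equal an ID exactly.
exact=$(awk 'FNR==NR { ids[$0]; next } ($3 in ids)' \
  "$workdir/uniq.txt" "$workdir/results.txt")

# Whole-line test: any ID occurring as a substring of the line counts.
substring=$(awk 'FNR==NR { ids[$0]; next }
  { for (key in ids) { if (index($0, key)) { print; next } } }' \
  "$workdir/uniq.txt" "$workdir/results.txt")

printf 'exact:\n%s\n' "$exact"
printf 'substring:\n%s\n' "$substring"
rm -rf "$workdir"
```

Here the column-based test keeps only the r1 row, while the whole-line test also keeps the r2 and run9606 rows. That looser behaviour may or may not be what you want, so it is probably best reserved for cases where the column really is unknown.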

Answer 2

You can combine grep and sed to transform the input patterns and achieve what you want.

grep -Ef <(sed -e 's/^/^(\\S+\\s+){2}/;s/$/\\s*/' uniq.txt) results.txt

To match the n-th column, replace the 2 in the command above with n-1.

Output

A00260:70:HJM2YDSXX:4:1111:15519:16720  NC_000011.10    9606    169     0       28      151     1
A00260:70:HJM2YDSXX:3:1536:9805:14841   NW_021160017.1  9606    81      0       24      151     1
A00260:70:HJM2YDSXX:3:1366:27181:24330  NC_014803.1     234831  121     121     26      151     3
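As a rough illustration of the "replace 2 with n-1" rule, the sketch below builds the patterns for an arbitrary column n. It is a variation rather than the command above: it uses POSIX character classes instead of \S and \s, writes the patterns to a temporary file instead of using process substitution, and ends each pattern with whitespace-or-end-of-line so that an ID must fill the whole field. The variable n, the file names, and the sample rows are all assumptions.

```shell
#!/bin/sh
# Sketch only: sample data and variable names are hypothetical.
workdir=$(mktemp -d)

printf '%s\n' 9606 > "$workdir/uniq.txt"

{
  printf 'r1\tNC_1\t9606\t169\n'   # column 3 equals the ID
  printf 'r2\tNC_2\t96061\t81\n'   # column 3 only starts with the ID
  printf 'r3\t9606\t443143\t121\n' # the ID sits in column 2, not 3
} > "$workdir/results.txt"

n=3  # 1-based column to match

# Prefix each ID with "skip n-1 whitespace-separated fields" and
# require whitespace or end of line right after the ID.
sed -e "s/^/^([^[:space:]]+[[:space:]]+){$((n - 1))}/" \
    -e 's/$/([[:space:]]|$)/' \
    "$workdir/uniq.txt" > "$workdir/patterns.txt"

matched=$(grep -E -f "$workdir/patterns.txt" "$workdir/results.txt")

printf '%s\n' "$matched"
rm -rf "$workdir"
```

With n=3 only the r1 row is kept. Note that each ID is still interpreted as a regular expression, so IDs containing characters such as . or + would need escaping first; purely numeric IDs like the ones here are unaffected.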
