Print matches between two files and the line after each match, ordered by the reference file
I have a reference file
NP_001041718.1
XP_021405980.1
NP_001041719.1
XP_021385112.1
NP_001041721.1
XP_021394530.1
NP_001041722.1
XP_021394327.1
NP_001041723.1
XP_021400667.1
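I need to capture each match and the line after it in a target file that looks like the following, while keeping the order of the reference file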
NP_001041718.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVDPENEAYEMPPEEEYQDYEPEA
NP_001041719.1
GKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGDASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKSEMLEIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIVRLLQCDPSSAGQF
NP_001041721.1
TMESGAENQQSGDAAGTEAETQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETAAPAIATVTVPTPIYQTSSGQYIAITQGGAIQLSNNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
NP_001041722.1
RVNESELNSSVLPRDPPAEGAPRRQPWVTSTLAAILIFTIAVDLLGNLLVILSVYRNKKLRNAGNVFVVSLAVADLIVAIYPYPLVLTSVFHNGWKLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLKYDKLYSDRNSLCYIVLIWLLTFVAIVPNLFVGSLQYDPRIYSCTFAQSVSSAYTIAVVFFHFLLPIAVVTFCYLRIWILVIQVRRRVKPDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGIAVAVNPKTVIPRIPEWLFVSSYYMAYFNSCLNAIVYGLLNQNFRREYKRIIVNFCTAKVFFQDSSNDAGDRMRSKPSPLITNNNQVKVDSV
NP_001041723.1
LENGSLRNCCDPGGRGRLGLAEREAAAAGAPRPAWVVPVLSSVLIFTTVVDILGNLLVILSVFKNRKLRNSGNAFVVSLALADLVVALYPYPLVLLAIFHNGWTLGETHCKASGFVMGLSVIGSIFNITAIAINRYCYICHSFAYDKVYSCWNTMLYVSLVWILTVIATVPNFFVGSLKYDPRIYSCTFVQTASSYYTIAVVVIHFIVPITIVSFCYLRIWVLVLQVRRRVKSETKPRLKPSDFRNFLTMFVVFVIFAFCWAPLNFIGLAVAIDPTEMAPKVPEWLFIISYLMAYFNSCLNAIIYGLLNQNFRNEYKRISMSLWMPRLFFQDTSKGGTDGQKSKPSPALNNNNQMKTETL
XP_021405980.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVGPENEAYKMPPEEEYQDYEPEA
XP_021385112.1
GKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGDASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKSEMLEIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIVRLLQCDPSSAGQF
XP_021394530.1
TMESGAENQQSGDAAGTEAETQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETAAPAIATVTVPTPIYQTSSGQYIAITQGGAIQLSNNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
XP_021394327.1
RVNESELNSSVLPRDPPAEGAPRRQPWVTSTLAAILIFTIAVDLLGNLLVILSVYRNKKLRNAGNVFVVSLAVADLIVAIYPYPLVLTSVFHNGWKLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLKYDKLYSDRNSLCYIVLIWLLTFVAIVPNLFVGSLQYDPRIYSCTFAQSVSSAYTIAVVFFHFLLPIAVVTFCYLRIWILVIQVRRRVKPDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGIAVAVNPKTVIPRIPEWLFVSSYYMAYFNSCLNAIVYGLLNQNFRREYKRIIVNFCTAKVFFQDSSNDAGDRMRSKPSPLITNNNQVKVDSV
XP_021400667.1
LENGSLRNCCDPGGRGRLGLAEREAAAAGAPRPAWVVPVLSSVLIFTTVVDILGNLLVILSVFKNRKLRNSGNAFVVSLALADLVVALYPYPLVLLAIFHNGWTLGETHCKASGFVMGLSVIGSIFNITAIAINRYCYICHSFAYDKVYSCWNTMLYVSLVWILTVIATVPNFFVGSLKYDPRIYSCTFVQTASSYYTIAVVVIHFIVPITIVSFCYLRIWVLVLQVRRRVKSETKPRLKPSDFRNFLTMFVVFVIFAFCWAPLNFIGLAVAIDPTEMAPKVPEWLFIISYLMAYFNSCLNAIIYGLLNQNFRNEYKRILMSLWMPRLFFQDTSKGGTDGQKSKPSPALNNNNQMKTETI
So the output looks like
NP_001041718.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVDPENEAYEMPPEEEYQDYEPEA
XP_021405980.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVGPENEAYKMPPEEEYQDYEPEA
NP_001041719.1
GKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGDASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKSEMLEIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIVRLLQCDPSSAGQF
XP_021385112.1
....
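I know how to find the matches in target while keeping the order from ref:
awk 'FNR==NR {a[$1]=$0; next}; $1 in a {getline} {print a[$1]}' target ref
but I don't know how to print the line after. I know how to print the line after each match:
grep -A 1 -f ref target
but the result then follows the order of the target file instead of ref.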
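You can reverse the order of the files passed to awk: pass the ref file first and build an array whose values are an incrementing number, so the order of the keys is recorded.
Instead of using getline, you can keep the previous line in a variable and check whether that line exists in the array keyed by the values of the first file.
If it does, store the previous line plus the current line in a new array, final, and loop over that array in the END block.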
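awk '{
  if (FNR==NR) {
    a[$1]=i++; next              # ref: map each ID to its position (0, 1, 2, ...)
  }
  if (last in a) {
    final[a[last]] = last RS $0  # target: previous line was a wanted ID, keep it plus this line
  }
  last = $0
}
END { for (j = 0; j < i; j++) if (j in final) print final[j] }  # numeric loop, since for-in order is unspecified
' ref target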
Output
NP_001041718.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVDPENEAYEMPPEEEYQDYEPEA
XP_021405980.1
DVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSRTKEGVVHGVTTVAEKTKEQVSNVGGAVVTGVTAVAQKTVEGAGNIAAATGLVKKDQLAKQNEEGFLQEGMVNNTGVAVGPENEAYKMPPEEEYQDYEPEA
NP_001041719.1
GKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGDASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKSEMLEIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIVRLLQCDPSSAGQF
XP_021385112.1
GKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGDASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKSEMLEIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIVRLLQCDPSSAGQF
NP_001041721.1
TMESGAENQQSGDAAGTEAETQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETAAPAIATVTVPTPIYQTSSGQYIAITQGGAIQLSNNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
XP_021394530.1
TMESGAENQQSGDAAGTEAETQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETAAPAIATVTVPTPIYQTSSGQYIAITQGGAIQLSNNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
NP_001041722.1
RVNESELNSSVLPRDPPAEGAPRRQPWVTSTLAAILIFTIAVDLLGNLLVILSVYRNKKLRNAGNVFVVSLAVADLIVAIYPYPLVLTSVFHNGWKLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLKYDKLYSDRNSLCYIVLIWLLTFVAIVPNLFVGSLQYDPRIYSCTFAQSVSSAYTIAVVFFHFLLPIAVVTFCYLRIWILVIQVRRRVKPDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGIAVAVNPKTVIPRIPEWLFVSSYYMAYFNSCLNAIVYGLLNQNFRREYKRIIVNFCTAKVFFQDSSNDAGDRMRSKPSPLITNNNQVKVDSV
XP_021394327.1
RVNESELNSSVLPRDPPAEGAPRRQPWVTSTLAAILIFTIAVDLLGNLLVILSVYRNKKLRNAGNVFVVSLAVADLIVAIYPYPLVLTSVFHNGWKLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLKYDKLYSDRNSLCYIVLIWLLTFVAIVPNLFVGSLQYDPRIYSCTFAQSVSSAYTIAVVFFHFLLPIAVVTFCYLRIWILVIQVRRRVKPDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGIAVAVNPKTVIPRIPEWLFVSSYYMAYFNSCLNAIVYGLLNQNFRREYKRIIVNFCTAKVFFQDSSNDAGDRMRSKPSPLITNNNQVKVDSV
NP_001041723.1
LENGSLRNCCDPGGRGRLGLAEREAAAAGAPRPAWVVPVLSSVLIFTTVVDILGNLLVILSVFKNRKLRNSGNAFVVSLALADLVVALYPYPLVLLAIFHNGWTLGETHCKASGFVMGLSVIGSIFNITAIAINRYCYICHSFAYDKVYSCWNTMLYVSLVWILTVIATVPNFFVGSLKYDPRIYSCTFVQTASSYYTIAVVVIHFIVPITIVSFCYLRIWVLVLQVRRRVKSETKPRLKPSDFRNFLTMFVVFVIFAFCWAPLNFIGLAVAIDPTEMAPKVPEWLFIISYLMAYFNSCLNAIIYGLLNQNFRNEYKRISMSLWMPRLFFQDTSKGGTDGQKSKPSPALNNNNQMKTETL
XP_021400667.1
LENGSLRNCCDPGGRGRLGLAEREAAAAGAPRPAWVVPVLSSVLIFTTVVDILGNLLVILSVFKNRKLRNSGNAFVVSLALADLVVALYPYPLVLLAIFHNGWTLGETHCKASGFVMGLSVIGSIFNITAIAINRYCYICHSFAYDKVYSCWNTMLYVSLVWILTVIATVPNFFVGSLKYDPRIYSCTFVQTASSYYTIAVVVIHFIVPITIVSFCYLRIWVLVLQVRRRVKSETKPRLKPSDFRNFLTMFVVFVIFAFCWAPLNFIGLAVAIDPTEMAPKVPEWLFIISYLMAYFNSCLNAIIYGLLNQNFRNEYKRILMSLWMPRLFFQDTSKGGTDGQKSKPSPALNNNNQMKTETI
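The ordering idea above can be checked on a small made-up dataset. This is only a sketch under assumptions: the temporary file names, the IDs (id1, id2, id3) and the sequences are hypothetical, and the END block walks the numeric indexes explicitly because awk does not guarantee any particular order for `for (i in array)`.

```shell
# Hypothetical miniature inputs; IDs and sequences are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' id2 id1 id3 > "$tmpdir/ref"
printf '%s\n' id1 AAA id2 CCC id3 GGG > "$tmpdir/target"

result=$(awk '
FNR == NR { order[$1] = ++n; next }               # ref: remember the position of each ID (1..n)
last in order { out[order[last]] = last RS $0 }   # target: previous line was a wanted ID
{ last = $1 }                                     # remember this line for the next record
END { for (j = 1; j <= n; j++) if (j in out) print out[j] }   # emit in ref order
' "$tmpdir/ref" "$tmpdir/target")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Only the ref IDs and the matched records are held in memory here, which may matter when the target file is much larger than the ref file.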
A variant using getline into a variable:
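awk '{
  if (FNR==NR) {
    i++; a[i]=$0; b[$0]=i; next       # ref: a maps position to ID, b maps ID to position
  }
  if ($0 in b && (getline tmp) > 0) {
    final[b[$0]] = a[b[$0]] RS tmp    # target: tmp holds the line after the matching ID
  }
}
END { for (j = 1; j <= i; j++) if (j in final) print final[j] }  # numeric loop keeps the ref order
' ref target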
Would you please try the following:
awk 'FNR==NR {                        # process "target" file
    if (FNR%2) a[key=$0]=$0           # store odd lines in array a
    else b[key]=$0                    # store even lines in array b using the same key as the previous line
    next
}
$0 in a {print a[$0]; print b[$0]}    # if the key matches, print the odd line and the even line
' target ref
With your shown samples, please try the following awk code. Written and tested in GNU awk.
awk '
FNR==NR{
  if(FNR%2==0){
    arr[prev]=$0
  }
  else{
    prev=$0
  }
  next
}
($0 in arr){
  print $0 ORS arr[$0]
}
' target ref
Explanation: a detailed explanation of the code above.
awk '                      ##Starting awk program from here.
FNR==NR{                   ##FNR==NR is TRUE only while the first file (target) is being read.
  if(FNR%2==0){            ##If the current line number is evenly divisible by 2 (an even line), do the following.
    arr[prev]=$0           ##Creating arr with index prev (the ID line) and the current line as its value.
  }
  else{                    ##Else (an odd line) do the following.
    prev=$0                ##Setting prev to the current line.
  }
  next                     ##next skips all further statements for this line.
}
($0 in arr){               ##While reading ref: if the current line is present in arr, do the following.
  print $0 ORS arr[$0]     ##Printing the current line, ORS, and the value of arr at index $0.
}
' target ref               ##Mentioning Input_file names here.
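The lookup-table approach above can also be tried on a small made-up dataset. This is a sketch under assumptions: the temporary file names, IDs and sequences are hypothetical, the target file is assumed to alternate strictly between an ID line and a sequence line, and idX is added to show that a ref ID with no record in target is silently skipped.

```shell
# Hypothetical miniature inputs; IDs and sequences are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' id3 idX id1 > "$tmpdir/ref"            # idX has no record in target
printf '%s\n' id1 AAA id2 CCC id3 GGG > "$tmpdir/target"

result=$(awk '
FNR == NR {                             # first file (target): build an ID -> sequence lookup
    if (FNR % 2 == 0) seq[prev] = $0    # even line: sequence belonging to the preceding ID
    else prev = $0                      # odd line: remember the ID
    next
}
$0 in seq { print $0 ORS seq[$0] }      # second file (ref): print records in ref order
' "$tmpdir/target" "$tmpdir/ref")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because ref is read second and printed as it streams by, no ordering array or END block is needed; the trade-off is that every record of target is kept in memory.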