Delete records matching multiple pattern conditions across columns with awk
I have a file that looks like this:
NC_042565.1 RefSeq region 1 114882317 . + . ID=NC_042565.1:1..114882317;Dbxref=taxon:299123;Name=1;chromosome=1;dev-stage=adult;gbkey=Src;genome=chromosome;isolate=Mets1;mol_type=genomic DNA;sex=male;sub-species=domestica;tissue-type=blood
NC_042565.1 Gnomon gene 21625 41521 . - . ID=gene-LCMT2;Dbxref=GeneID:110474964;Name=LCMT2;gbkey=Gene;gene=LCMT2;gene_biotype=protein_coding
NC_042565.1 Gnomon mRNA 21625 41521 . - . ID=rna-XM_021538777.2;Parent=gene-LCMT2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;Name=XM_021538777.2;gbkey=mRNA;gene=LCMT2;model_evidence=Supporting evidence includes similarity to: 2 ESTs%2C 9 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 41062 41521 . - . ID=exon-XM_021538777.2-1;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 39337 39418 . - . ID=exon-XM_021538777.2-2;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 38834 39014 . - . ID=exon-XM_021538777.2-3;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 36546 36702 . - . ID=exon-XM_021538777.2-4;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 35950 36139 . - . ID=exon-XM_021538777.2-5;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 35437 35544 . - . ID=exon-XM_021538777.2-6;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 33345 33435 . - . ID=exon-XM_021538777.2-7;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 30949 31197 . - . ID=exon-XM_021538777.2-8;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 28678 28908 . - . ID=exon-XM_021538777.2-9;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 27570 27667 . - . ID=exon-XM_021538777.2-10;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 25692 25879 . - . ID=exon-XM_021538777.2-11;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 25355 25490 . - . ID=exon-XM_021538777.2-12;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 21625 23392 . - . ID=exon-XM_021538777.2-13;Parent=rna-XM_021538777.2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;gbkey=mRNA;gene=LCMT2;product=leucine carboxyl methyltransferase 2;transcript_id=XM_021538777.2
NC_042565.1 Gnomon exon 11328398 11328458 . + . ID=id-LOC110483275;Parent=gene-LOC110483275;Dbxref=GeneID:110483275;gbkey=exon;gene=LOC110483275;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 11%25 coverage of the annotated genomic feature by RNAseq alignments
NC_042565.1 Gnomon exon 11331449 11332392 . + . ID=id-LOC110483275-2;Parent=gene-LOC110483275;Dbxref=GeneID:110483275;gbkey=exon;gene=LOC110483275;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 11%25 coverage of the annotated genomic feature by RNAseq alignments
NC_042565.1 tRNAscan-SE exon 16005736 16005808 . + . ID=exon-TRNAV-UAC-1;Parent=rna-TRNAV-UAC;Dbxref=GeneID:110483291;Note=transfer RNA valine (anticodon UAC);anticodon=(pos:16005769..16005771);gbkey=tRNA;gene=TRNAV-UAC;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Val
NC_042565.1 Gnomon exon 40513973 40514551 . + . ID=id-LOC110470572;Parent=gene-LOC110470572;Dbxref=GeneID:110470572;gbkey=exon;gene=LOC110470572;model_evidence=Supporting evidence includes similarity to: 1 Protein
NC_042565.1 Gnomon exon 40514711 40514960 . + . ID=id-LOC110470572-2;Parent=gene-LOC110470572;Dbxref=GeneID:110470572;gbkey=exon;gene=LOC110470572;model_evidence=Supporting evidence includes similarity to: 1 Protein
NC_042565.1 tRNAscan-SE exon 41451994 41452066 . + . ID=exon-TRNAF-GAA-1;Parent=rna-TRNAF-GAA;Dbxref=GeneID:110470583;Note=transfer RNA phenylalanine (anticodon GAA);anticodon=(pos:41452027..41452029);gbkey=tRNA;gene=TRNAF-GAA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Phe
NC_042565.1 tRNAscan-SE exon 45245322 45245390 . + . ID=exon-TRNAK-CUU-1;Parent=rna-TRNAK-CUU;Dbxref=GeneID:110468118;Note=transfer RNA lysine (anticodon CUU);anticodon=(pos:45245351..45245353);gbkey=tRNA;gene=TRNAK-CUU;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Lys
NC_042565.1 tRNAscan-SE exon 49805074 49805146 . - . ID=exon-TRNAV-AAC-1;Parent=rna-TRNAV-AAC;Dbxref=GeneID:110476772;Note=transfer RNA valine (anticodon AAC);anticodon=(pos:complement(49805111..49805113));gbkey=tRNA;gene=TRNAV-AAC;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Val
NC_042565.1 tRNAscan-SE exon 49805393 49805466 . - . ID=exon-TRNAN-GUU-1;Parent=rna-TRNAN-GUU;Dbxref=GeneID:110476771;Note=transfer RNA asparagine (anticodon GUU);anticodon=(pos:complement(49805430..49805432));gbkey=tRNA;gene=TRNAN-GUU;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Asn
NC_042565.1 Gnomon exon 87281852 87281945 . + . ID=exon-id-LOC110480752-1;Parent=id-LOC110480752;Dbxref=GeneID:110480752;gbkey=V_segment;gene=LOC110480752;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;standard_name=T cell receptor beta variable 14-like
I need to delete the rows where
`$3 ~ /exon|guide_RNA|lnc_RNA\t|mRNA|snoRNA|snRNA\t|transcript/`
and where the string `/gene=/` is found in the last column but `/transcript_id=/` is not.
I tried splitting the columns on `;`, just to see whether I could at least capture the right rows and then figure out how to delete them, but I keep getting the whole file as output:
awk 'BEGIN { FS = ";" } NR==1 {for(i=1;i<=NF;i++) if ($3 ~ /exon\t|guide_RNA\t|lnc_RNA\t|mRNA\t|snoRNA\t|snRNA\t|transcript\t/ && $i ~ /gene=/ && $i !~ /transcript_id=/) f=i;next} {print $f}' BFgenomic.gff
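A likely reason the attempt echoes the file: the `NR==1` block runs only for the first record and ends in `next`, so `f` is assigned at most once, and only if the condition happens to match on line 1. When it never matches, `f` stays uninitialized, `$f` is evaluated as `$0`, and every remaining record is printed whole. The minimal sketch below reproduces this on a made-up two-record, `;`-separated input; the input and the shortened `/exon/` test are hypothetical stand-ins, not the real BFgenomic.gff.

```shell
# Hypothetical ';'-separated two-record input standing in for BFgenomic.gff.
# The awk program has the same shape as the attempt: f is assigned only in
# the NR==1 block, and only if the condition matches on that first record.
out=$(printf 'a;b;Name=1\nc;d;gene=X\n' |
  awk 'BEGIN { FS = ";" }
       NR==1 { for (i = 1; i <= NF; i++) if ($3 ~ /exon/ && $i ~ /gene=/) f = i; next }
       { print $f }')

# f was never set, so $f is $0: record 2 is printed whole
# (record 1 itself was skipped by next).
printf '%s\n' "$out"   # prints: c;d;gene=X
```

Testing the condition on every record, with `-F'\t'` so that `$3` is the feature-type column, avoids both problems.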
The rows I want to delete:
awk '$3 ~ /exon|guide_RNA|lnc_RNA|mRNA|snoRNA|snRNA|transcript/' BFgenomic.gff | grep -v transcript_id= | grep gene=
NC_042565.1 Gnomon exon 11328398 11328458 . + . ID=id-LOC110483275;Parent=gene-LOC110483275;Dbxref=GeneID:110483275;gbkey=exon;gene=LOC110483275;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 11%25 coverage of the annotated genomic feature by RNAseq alignments
NC_042565.1 Gnomon exon 11331449 11332392 . + . ID=id-LOC110483275-2;Parent=gene-LOC110483275;Dbxref=GeneID:110483275;gbkey=exon;gene=LOC110483275;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 11%25 coverage of the annotated genomic feature by RNAseq alignments
NC_042565.1 tRNAscan-SE exon 16005736 16005808 . + . ID=exon-TRNAV-UAC-1;Parent=rna-TRNAV-UAC;Dbxref=GeneID:110483291;Note=transfer RNA valine (anticodon UAC);anticodon=(pos:16005769..16005771);gbkey=tRNA;gene=TRNAV-UAC;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Val
NC_042565.1 Gnomon exon 40513973 40514551 . + . ID=id-LOC110470572;Parent=gene-LOC110470572;Dbxref=GeneID:110470572;gbkey=exon;gene=LOC110470572;model_evidence=Supporting evidence includes similarity to: 1 Protein
NC_042565.1 Gnomon exon 40514711 40514960 . + . ID=id-LOC110470572-2;Parent=gene-LOC110470572;Dbxref=GeneID:110470572;gbkey=exon;gene=LOC110470572;model_evidence=Supporting evidence includes similarity to: 1 Protein
NC_042565.1 tRNAscan-SE exon 41451994 41452066 . + . ID=exon-TRNAF-GAA-1;Parent=rna-TRNAF-GAA;Dbxref=GeneID:110470583;Note=transfer RNA phenylalanine (anticodon GAA);anticodon=(pos:41452027..41452029);gbkey=tRNA;gene=TRNAF-GAA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Phe
NC_042565.1 tRNAscan-SE exon 45245322 45245390 . + . ID=exon-TRNAK-CUU-1;Parent=rna-TRNAK-CUU;Dbxref=GeneID:110468118;Note=transfer RNA lysine (anticodon CUU);anticodon=(pos:45245351..45245353);gbkey=tRNA;gene=TRNAK-CUU;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Lys
NC_042565.1 tRNAscan-SE exon 49805074 49805146 . - . ID=exon-TRNAV-AAC-1;Parent=rna-TRNAV-AAC;Dbxref=GeneID:110476772;Note=transfer RNA valine (anticodon AAC);anticodon=(pos:complement(49805111..49805113));gbkey=tRNA;gene=TRNAV-AAC;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Val
NC_042565.1 tRNAscan-SE exon 49805393 49805466 . - . ID=exon-TRNAN-GUU-1;Parent=rna-TRNAN-GUU;Dbxref=GeneID:110476771;Note=transfer RNA asparagine (anticodon GUU);anticodon=(pos:complement(49805430..49805432));gbkey=tRNA;gene=TRNAN-GUU;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Asn
NC_042565.1 Gnomon exon 87281852 87281945 . + . ID=exon-id-LOC110480752-1;Parent=id-LOC110480752;Dbxref=GeneID:110480752;gbkey=V_segment;gene=LOC110480752;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns;standard_name=T cell receptor beta variable 14-like
You want:
awk -F'\t' '
$3 ~ /exon|guide_RNA|lnc_RNA|mRNA|snoRNA|snRNA|transcript/ && \
$NF ~/gene=/ && \
$NF !~ /transcript_id=/ {next}
{print}
' ~/tmp/file
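To try this filter without the real file, here is a minimal sketch on a made-up, tab-separated three-record sample; the `sample` function and its contents are hypothetical. It applies the same three conditions, with `$3` as the feature-type column, and prints only the records that do not meet all of them.

```shell
# Hypothetical tab-separated GFF-like sample (9 columns, not from the real file).
sample() {
  printf 'c1\tGnomon\tgene\t1\t9\t.\t-\t.\tID=gene-A;gene=A\n'
  printf 'c1\tGnomon\texon\t1\t9\t.\t-\t.\tID=exon-1;gene=A;transcript_id=XM_1\n'
  printf 'c1\tGnomon\texon\t5\t9\t.\t+\t.\tID=id-B;gene=B\n'
}

# Skip rows whose type ($3) matches, whose last field has gene=
# and lacks transcript_id=; print everything else.
kept=$(sample | awk -F'\t' '
  $3 ~ /exon|guide_RNA|lnc_RNA|mRNA|snoRNA|snRNA|transcript/ &&
  $NF ~ /gene=/ &&
  $NF !~ /transcript_id=/ {next}
  {print}
')

# Show only type and attributes of the surviving rows.
printf '%s\n' "$kept" | cut -f3,9
```

The `exon` row without `transcript_id=` is dropped; the `gene` row survives because its type does not match, and the first `exon` row survives because it carries `transcript_id=`.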
You could consider:
awk -F '\t' '!(
$3 ~ /^(exon|(guide_|lnc_|m|sno?)RNA|transcript)$/ &&
$NF ~ /(^|;)gene=/ &&
$NF !~ /(^|;)transcript_id=/
)' file
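- Since you are only comparing `$3`, there will never be a `\t` in it in a tab-delimited file. It is better to use the anchors `^` and `$`, as shown here.
- For the last field, use `(^|;)` to make sure there are no partial matches within that field.
- Note the refactoring of the alternation: `(guide_|lnc_|m|sno?)RNA` covers guide_RNA, lnc_RNA, mRNA, snoRNA and snRNA.
- Note the negation block `!(...)` that spans the whole condition from start to end, so awk's default action prints every record that does not meet all three conditions.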
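The effect of the anchors and of `(^|;)` can be seen on a few made-up rows. For brevity this sketch uses only two tab-separated columns, so the feature type is `$1` rather than `$3`; the `primary_transcript` type and the `old_gene=` attribute are hypothetical inputs chosen to trigger partial matches. Each output line shows the type, then 1 or 0 for the loose (unanchored) test and for the anchored test.

```shell
# Hypothetical rows: feature type in column 1, attributes in column 2.
rows() {
  printf 'mRNA\tID=r1;gene=A\n'
  printf 'primary_transcript\tID=r2;gene=B\n'
  printf 'exon\tID=e1;old_gene=C\n'
}

out=$(rows | awk -F'\t' '{
  # Unanchored: "transcript" and "gene=" may match inside longer words.
  loose  = ($1 ~ /exon|guide_RNA|lnc_RNA|mRNA|snoRNA|snRNA|transcript/ && $NF ~ /gene=/)
  # Anchored: the whole type must match, and gene= must start an attribute.
  strict = ($1 ~ /^(exon|(guide_|lnc_|m|sno?)RNA|transcript)$/ && $NF ~ /(^|;)gene=/)
  print $1, loose, strict
}')

printf '%s\n' "$out"
# prints:
# mRNA 1 1
# primary_transcript 1 0
# exon 1 0
```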