Fastest way to add Ns to variable length sequences such that they all equal 150bp
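Say I have a FASTA file with 3 sequences...
```
ATTTTTGGA
AT
A
```
I want my sequence data to look like this:
```
ATTTTTGGA
ATNNNNNNN
ANNNNNNNN
```
Is there any program or script that can do this in a reasonable amount of time? I have thousands of sequences. Thanks!
I've been messing around and tried this. The output file ends up blank, but this is as far as I got: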
```python
import sys
from Bio import SeqIO
from Bio.Seq import Seq
in_file = open(sys.argv[1],'r')
sequences = SeqIO.parse(in_file, "fasta")
output_in_file = open("test.fasta", "w")
for record in sequences:
    n = 150
    record.seq = record.seq + ("N" * n)
    seq = seq[:n]
output_in_file.close()
in_file.close()
```
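A likely explanation for the blank file: `test.fasta` is opened in `"w"` mode (which truncates it), nothing in the loop ever writes to `output_in_file`, and `seq = seq[:n]` uses a name `seq` that was never assigned, so the script stops with a `NameError` on the first record. The append-then-slice idea itself is sound. Below is a minimal streaming sketch of it that swaps Biopython for a small standard-library FASTA reader; it assumes ordinary `>`-header records, and `read_fasta` / `pad_fasta` are hypothetical helper names, not a library API.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined, blank lines skipped."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pad_fasta(in_handle: TextIO, out_handle: TextIO, n: int = 150) -> None:
    """Write every record padded with Ns (or trimmed) to exactly n bases."""
    for header, seq in read_fasta(in_handle):
        fixed = (seq + "N" * n)[:n]  # the question's idea: append n Ns, then slice to n
        out_handle.write(f">{header}\n{fixed}\n")


if __name__ == "__main__":
    demo_in = io.StringIO(">id_1\nATTTTTGGA\n>id_2\nAT\n>id_3\nA\n")
    demo_out = io.StringIO()
    pad_fasta(demo_in, demo_out, n=9)
    print(demo_out.getvalue(), end="")
```

For real files this could be called as `with open(sys.argv[1]) as fin, open("test.fasta", "w") as fout: pad_fasta(fin, fout)`. Because records are processed one at a time, only the current record is held in memory, which suits files with thousands of sequences.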
Improving your code:
```python
import sys
from Bio import SeqIO

with open(sys.argv[1], "r") as in_file:
    sequences = list(SeqIO.parse(in_file, "fasta"))

n = max(map(len, sequences))  # find the max length among the sequences
for record in sequences:
    record.seq = record.seq + ("N" * (n - len(record)))  # pad with Ns up to n

SeqIO.write(sequences, "test.fasta", "fasta")
```
In test.fasta you get:
```
>id_1
ATTTTTGGA
>id_2
ATNNNNNNN
>id_3
ANNNNNNNN
```
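The improved code reads all records into a list first because the target length `n` (the longest sequence) is only known once every record has been seen. The same two-pass logic on plain strings, as a standard-library sketch; `pad_to_longest` is a hypothetical helper name, and `str.ljust` is swapped in for the explicit `"N" * (n - len(s))`:

```python
from typing import List


def pad_to_longest(seqs: List[str], fill: str = "N") -> List[str]:
    """Right-pad every sequence with `fill` to the length of the longest one."""
    if not seqs:
        return []  # max() of an empty sequence would raise ValueError
    n = max(map(len, seqs))  # first pass: find the target length
    return [s.ljust(n, fill) for s in seqs]  # second pass: pad each sequence


print(pad_to_longest(["ATTTTTGGA", "AT", "A"]))
# ['ATTTTTGGA', 'ATNNNNNNN', 'ANNNNNNNN']
```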
For "all equal 150bp":
```python
import sys
from Bio import SeqIO

with open(sys.argv[1], "r") as in_file:
    sequences = list(SeqIO.parse(in_file, "fasta"))

n = 150  # fixed target length
for record in sequences:
    # Pad with Ns up to n; records already longer than n are left unchanged,
    # because "N" * (a negative number) is an empty string.
    record.seq = record.seq + ("N" * (n - len(record)))

SeqIO.write(sequences, "test.fasta", "fasta")
```
You get:
```
>id_1
ATTTTTGGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>id_2
ATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>id_3
ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
```
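With a fixed `n = 150`, the expression `"N" * (n - len(record))` becomes an empty string for any record longer than 150 bp, so such records are written unchanged rather than trimmed. If every output record must be exactly 150 bp, one option is to slice before padding. A small standard-library sketch contrasting the two behaviours on plain strings (`pad_only` and `pad_or_trim` are hypothetical names):

```python
def pad_only(seq: str, n: int = 150) -> str:
    """Mirror of the answer's expression: pad short sequences, leave long ones as-is."""
    return seq + "N" * (n - len(seq))  # "N" * negative == ""


def pad_or_trim(seq: str, n: int = 150) -> str:
    """Return exactly n bases: trim longer sequences, pad shorter ones with N."""
    return seq[:n].ljust(n, "N")


long_seq = "A" * 152
print(len(pad_only(long_seq)), len(pad_or_trim(long_seq)))  # 152 150
print(len(pad_only("AT")), len(pad_or_trim("AT")))  # 150 150
```

Inside the Biopython loop, the same effect could be had with `record.seq = (record.seq + "N" * n)[:n]`, which is the question's original append-then-slice idea.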