Modify Python script to run for multiple input files
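I am new to Python. I have a Python script that runs on a specific file (input1.txt) and generates an output (output1.fasta), but I want to run this script on multiple files, for example input2.txt, input3.txt, ... and generate the corresponding outputs: output2.fasta, output3.fasta.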
from Bio import SeqIO
fasta_file = "sequences.txt"
wanted_file = "input1.txt"
result_file = "output1.fasta"
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)
fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")
I tried adding glob, but I don't know how to handle the output file names.
from Bio import SeqIO
import glob
fasta_file = "sequences.txt"
for filename in glob.glob('*.txt'):
    wanted = set()
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)
    fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")
The error message is: NameError: name 'result_file' is not defined
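Your glob is currently picking up your "sequences" file as well as the inputs, because *.txt also matches sequences.txt. If the FASTA file is always the same and you only want to iterate over the input files, then you need: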
for filename in glob.glob('input*.txt'):
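To see the difference between the two patterns without touching the disk, here is a small sketch using fnmatch, which implements the same shell-style wildcard matching that glob relies on. The file names in the list are made up for illustration.

```python
import fnmatch

# Hypothetical directory listing, for illustration only.
names = ["sequences.txt", "input1.txt", "input2.txt", "output1.fasta"]

# '*.txt' matches every .txt file, including the FASTA source file.
all_txt = fnmatch.filter(names, "*.txt")

# 'input*.txt' matches only the files whose names start with "input".
inputs_only = fnmatch.filter(names, "input*.txt")

print(all_txt)
print(inputs_only)
```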
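In addition, to repeat the whole process for each file, you may want to put it in a function. If the output file name always corresponds to the input file name, you can build it dynamically.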
from Bio import SeqIO
import glob  # needed for glob.glob() below

def create_fasta_outputs(fasta_file, wanted_file):
    # Derive the output name from the input name, e.g. input2.txt -> output2.fasta
    result_file = wanted_file.replace("input","output").replace(".txt",".fasta")
    # Collect the wanted sequence IDs, skipping blank lines
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)
    # Passing the file name lets SeqIO open and close the file itself
    fasta_sequences = SeqIO.parse(fasta_file, 'fasta')
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")

fasta_file = "sequences.txt"
for wanted_file in glob.glob('input*.txt'):
    create_fasta_outputs(fasta_file, wanted_file)
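One caveat with building the name via str.replace: it replaces every occurrence of "input" and ".txt" in the string, so a path such as input_data/input3.txt would also have its directory name changed. A possible alternative, sketched below with pathlib, rewrites only the file name itself. The helper name output_name_for is hypothetical and not part of the code above.

```python
from pathlib import Path

def output_name_for(wanted_file):
    """Map e.g. 'input2.txt' to 'output2.fasta', changing only the file name."""
    path = Path(wanted_file)
    stem = path.stem  # file name without its extension, e.g. "input2"
    # Swap only a leading "input" for "output"; leave other stems untouched.
    if stem.startswith("input"):
        stem = "output" + stem[len("input"):]
    # with_name() keeps the parent directory and replaces the final component.
    return str(path.with_name(stem + ".fasta"))

print(output_name_for("input2.txt"))
```

Inside create_fasta_outputs, the result_file line could then be replaced by a call to this helper, assuming the input files keep the input*.txt naming scheme.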