Writing multiple lists to multiple output files

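I am working with datasets stored in large text files. For the analysis I am doing, I open the file, extract parts of the dataset, and compare the extracted subsets. My code works like this: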

from math import ceil

with open("seqs.txt","rb") as f:
    f = f.readlines()

assert type(f) == list, "ERROR: file object not converted to list"

fives = int( ceil(0.05*len(f)) ) 
thirds = int( ceil(len(f)/3.0) )  # 3.0 avoids integer division under Python 2

## top/bottom 5% of dataset
low_5=f[0:fives]
top_5=f[-fives:]

## top/bottom 1/3 of dataset
low_33=f[0:thirds]
top_33=f[-thirds:]

## Write lists to file
# top-5
with open("high-5.out","w") as outfile1:
   for i in top_5:
       outfile1.write("%s" %i)
# low-5
with open("low-5.out","w") as outfile2:
    for i in low_5:
        outfile2.write("%s" %i)
# top-33
with open("high-33.out","w") as outfile3:
    for i in top_33:
        outfile3.write("%s" %i)
# low-33        
with open("low-33.out","w") as outfile4:
    for i in low_33:
        outfile4.write("%s" %i)

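I am trying to find a smarter way to automate writing the lists to files. In this case there are only four, but in the future I might end up with as many as 15-25 lists, so I would like a function to handle this. I wrote the following: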

def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" %i)

But when I call the function like this, the resulting file contains only the final list:

write_to_file(low_33,low_5,top_33,top_5)

I know I have to define an output file for each list (which I am not doing in the function above); I am just not sure how to implement it. Any ideas?

You can generate one output file per argument by incrementing a counter for each argument. For example:

def write_to_file(*args):
    for index, items in enumerate(args):
        with open("{}.out".format(index+1), "w") as outfile:
            # Write each item; "%s" % items would write the list's repr instead
            for item in items:
                outfile.write("%s" % item)

The example above will create the output files "1.out", "2.out", "3.out", and "4.out".

Alternatively, if you have specific names you want to use (as in your original code), you can do something like this:

def write_to_file(args):
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            # Write each item; "%s" % data would write the list's repr instead
            for item in data:
                outfile.write("%s" % item)

args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)

This will create the output files "low-33.out", "low-5.out", "high-33.out", and "high-5.out".

On this line, you open a file named .out every time and write to it:

with open(".out", "w") as outfile:
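
To see why only the final list survives, here is a minimal sketch of the overwrite. The sample lists and the temporary directory are assumptions made for illustration; the point is only that mode "w" truncates the file every time it is opened.

```python
import os
import tempfile

# Hypothetical sample data standing in for the slices from the question.
lists = [["a\n", "b\n"], ["c\n"], ["d\n", "e\n"]]

# Work in a temporary directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, ".out")

for items in lists:
    # Mode "w" truncates the file each time it is opened,
    # so every pass discards what the previous pass wrote.
    with open(path, "w") as outfile:
        outfile.writelines(items)

with open(path) as infile:
    remaining = infile.read()

# Only the items of the last list are left in the file.
print(repr(remaining))
```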

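You need to make the ".out" file name unique for each i in args. You can do this by passing each argument as a list that contains both the file name and the data.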

def write_to_file(*args):
    for name, data in args:
        with open("%s.out" % name, "w") as outfile:
            # Write each item rather than the list's repr
            for item in data:
                outfile.write("%s" % item)

And pass the arguments like this:

write_to_file(["low_33",low_33],["low_5",low_5],["top_33",top_33],["top_5",top_5])

Make your variable names match your file names, then use a dictionary to hold them instead of keeping them in the global namespace:

data = {'high_5': f[-fives:]
       ,'low_5': f[:fives]
       ,'high_33': f[-thirds:]
       ,'low_33': f[:thirds]}

for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)

This keeps your data in one easy-to-use place, and if you want to apply the same operation to all of it, you can keep using the same pattern.

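As PM2Ring mentions below, it is advisable to use underscores (as you do in your variable names) rather than dashes (as you do in your file names), so that you can pass the dictionary keys as keyword arguments to a write function: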

write_to_file(**data)

This is equivalent to:

write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data

From there you can use one of the functions defined in the other answers.
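
The functions in the other answers take positional arguments, so they would need a small change to accept keywords. A minimal sketch of a keyword-based variant might look like the following. The `directory` parameter, the sample data, and the returned list of paths are assumptions added for illustration, not part of any answer here.

```python
import os
import tempfile

def write_to_file(directory=".", **named_lists):
    """Write each keyword argument's list to <directory>/<keyword>.out.

    `directory` is a hypothetical extra parameter; a dictionary key
    called "directory" would clash with it.
    """
    written = []
    for name, items in named_lists.items():
        path = os.path.join(directory, "{}.out".format(name))
        with open(path, "w") as outfile:
            # Each item is assumed to be a string that already ends in a newline.
            outfile.writelines(items)
        written.append(path)
    return written

# Hypothetical sample data; in the question these would be slices of f.
data = {"low_5": ["a\n"], "high_5": ["z\n"]}

outdir = tempfile.mkdtemp()
paths = write_to_file(directory=outdir, **data)
```

Because the keys become keyword names, they have to be valid Python identifiers, which is why underscores work here and dashes do not.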

You are creating a file named ".out" and overwriting it every time.

def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        contents = globals()[i]  # look up the list by its variable name
        with open(filename, "w") as outfile:  # use the per-list name, not ".out"
            for item in contents:
                outfile.write("%s" % item)


write_to_file("low_33", "low_5", "top_33", "top_5")

(Note that the variable names are passed as strings.)

This will create low_33.out, low_5.out, top_33.out, and top_5.out; their contents will be the lists stored in those variables.

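Don't try to be clever. Instead, aim to make your code readable and easy to understand. You can group the repeated code into a function, for example: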

from math import ceil

def save_to_file(data, filename):
    with open(filename, 'w') as f:  # text mode: the items written are strings
        for item in data:
            f.write('{}'.format(item))

with open('data.txt') as f:
    numbers = list(f)

five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)
save_to_file(numbers[:five_percent], 'low-5.out')
save_to_file(numbers[-five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[-thirty_three_percent:], 'high-33.out')

Update

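If you have many lists to write, then it makes sense to use a loop. I suggest having two functions, save_top_n_percent and save_low_n_percent, to help with the job. They contain some duplicated code, but splitting them into two functions makes the code clearer and easier to understand.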

def save_to_file(data, filename):
    with open(filename, 'w') as f:  # text mode: the items written are strings
        for item in data:
            f.write(item)

def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[-n_percent:], 'top-{}.out'.format(n))

def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))

with open('data.txt') as f:
    numbers = list(f)

for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
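
One edge case is worth noting with the truncating int(...) used above: on a small dataset the computed count can be 0, and data[-0:] is the same as data[0:], so the "top" file would receive the whole list. A minimal sketch of a guard follows; the helper names `n_percent_count` and `top_slice` and the sample data are hypothetical.

```python
def n_percent_count(data, n):
    """Number of items that make up n percent of data, truncated as above."""
    return int(len(data) * n / 100.0)

def top_slice(data, count):
    """Return the last `count` items, or an empty list when count is 0."""
    # Computing the start index explicitly avoids the data[-0:] pitfall.
    start = max(len(data) - count, 0)
    return data[start:]

# Hypothetical ten-line dataset.
numbers = ["1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n", "10\n"]

count = n_percent_count(numbers, 5)   # 10 * 5 / 100.0 is 0.5, truncated to 0
naive = numbers[-count:]              # -0 is 0, so this is the whole list
safe = top_slice(numbers, count)      # empty list, as intended
```

The question's own code sidesteps this by using ceil, which yields at least 1 for any non-empty file.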