Sum up values according to a string condition on a list generated by a for loop

Question

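My code searches for specific files and calls a separate .py file to output some data. I manually append a row with the file size for each file. I just want to append the sum of all the file sizes of the files found to the end of the iteration. I suppose this would involve boolean indexing, but I can't find any good references. I want to find all the columns labelled 'file sizes' and then sum all of their values.

A sample iteration (I have arbitrarily placed many 'file sizes' next to each other, but in the real data they would be separated by about 15 rows):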

xd = """Version 3.1.5.0
GetFileName C:\users\trinh\downloads\higgi022_20150612_007_bsadig_100fm_aft_newIonTrap3.raw
GetCreatorID    thermo
GetVersionNumber    64
file size   1010058
file size   200038
file size   48576986
file size   387905
misc    tester
more    python"""

At the end of the for loop, I want to sum all the file sizes (this is very wrong, but it is my best attempt):

zd = xd.split()
for aline in zd:
    if 'file size' in aline:
        sum = 0
        for eachitem in aline[1:]:
            sum += eaechitem
            print(sum)

Answer 1

For the sample data you have given, to get the total of all lines starting with file size, you could do the following:

# Raw string: otherwise \u in C:\users is parsed as a Unicode escape (SyntaxError in Python 3)
xd = r"""Version 3.1.5.0
GetFileName C:\users\trinh\downloads\higgi022_20150612_007_bsadig_100fm_aft_newIonTrap3.raw
GetCreatorID    thermo
GetVersionNumber    64
file size   1010058
file size   200038
file size   48576986
file size   387905
misc    tester
more    python"""

total = 0

for line in xd.splitlines():
    if line.startswith('file size'):
        total += int(line.split()[2])

print(total)

This would display:

50174987

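This first splits xd into lines and checks whether each line starts with the words file size. If it does, split() breaks the line into 3 parts. The third part holds the size as a string, so it needs to be converted to an integer with int().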
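
As a small illustration of that step, the sketch below shows what split() returns for one matching line, and then writes the same loop as a single sum() over a generator expression. The variable names and the shortened sample string are only for illustration.

```python
# Illustrative only: how one matching line is broken apart.
line = "file size   1010058"
parts = line.split()          # runs of whitespace collapse: ['file', 'size', '1010058']
size = int(parts[2])          # third part, converted from str to int

# The same filtering loop written as a single sum() over a generator expression.
sample = "file size   1010058\nfile size   200038\nmisc    tester"
total = sum(
    int(l.split()[2])
    for l in sample.splitlines()
    if l.startswith("file size")
)
print(parts, size, total)
```

Note that split() with no arguments splits on any run of whitespace, so it does not matter whether the columns are separated by spaces or tabs.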

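To extend this to work on a file, you first need to read the file and calculate the total of the required lines, then open it in append mode to write the total: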

with open('data.txt') as f_input:
    total = 0

    for line in f_input:
        if line.startswith('file size'):
            total += int(line.split()[2])

with open('data.txt', 'a') as f_output:
    f_output.write("\nTotal file size: {}\n".format(total))
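
If you want to reuse this, the read-then-append approach could be wrapped in a function. The following is only a sketch: the name append_total and its prefix parameter are made up for this example, and it assumes that lines without a numeric third field should be skipped rather than raise an error. It is exercised on a temporary file so that no real data is touched.

```python
import os
import tempfile

def append_total(path, prefix="file size"):
    """Sum the sizes on lines starting with `prefix`, append the total, return it."""
    total = 0
    with open(path) as f_input:
        for line in f_input:
            if line.startswith(prefix):
                parts = line.split()
                # Assumption: skip lines that lack a numeric third field.
                if len(parts) >= 3 and parts[2].isdigit():
                    total += int(parts[2])
    with open(path, "a") as f_output:
        f_output.write("\nTotal file size: {}\n".format(total))
    return total

# Try it on a throwaway file with a few made-up lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("GetVersionNumber    64\nfile size   1010058\nfile size   200038\n")
    tmp_path = tmp.name

result = append_total(tmp_path)
with open(tmp_path) as f:
    last_line = f.read().splitlines()[-1]
os.remove(tmp_path)
print(result, last_line)
```

Because the appended line starts with Total rather than file size, running the function a second time would not count the earlier total, although it would append a second total line.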

Based on your current script, you could incorporate it as follows:

import os
import csv
from subprocess import run, PIPE

# Raw strings keep the backslashes literal (a plain '\u' is a SyntaxError in Python 3)
pathfile = r'C:\users\trinh\downloads'
msfilepath = r'C:\users\trinh\downloads\msfilereader.py'

file_size_total = 0

with open("output.csv", "w", newline='') as csvout:
    writer = csv.writer(csvout, delimiter=',')

    for root, dirs, files in os.walk(pathfile):
        for f in files:
            if f.endswith(".raw"):
                fp = os.path.join(root, f) #join the directory root and the file name
                p = run(['python', msfilepath, fp], stdout=PIPE) #run the MSfilereader.py path and each iterated raw file found
                p = p.stdout.decode('utf-8')

                for aline in p.split('\r\n'):
                    header = aline.split(' ', 1)
                    writer.writerow(header)

                    if 'END SECTION' in aline and aline.endswith('###'):
                        file_size = os.stat(fp).st_size
                        file_size_total += file_size  # keep a running total across all files
                        lst_filsz = ['file size', str(file_size)]
                        writer.writerow(lst_filsz)

    writer.writerow(["Total file size:", file_size_total])

This gives you the total of all the file size entries. If needed, you could also add a subtotal for each section.
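
What counts as a section is not spelled out here, so the following is only one possible reading: it assumes a subtotal per directory visited by os.walk, plus a grand total. The function name raw_size_subtotals and the file names in the demonstration are hypothetical.

```python
import os
import tempfile
from collections import defaultdict

def raw_size_subtotals(top):
    """Return ({directory: bytes}, grand_total) for .raw files under `top`."""
    subtotals = defaultdict(int)
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".raw"):
                subtotals[root] += os.stat(os.path.join(root, name)).st_size
    return dict(subtotals), sum(subtotals.values())

# Small demonstration on a temporary tree with made-up file names.
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "run2")
    os.mkdir(sub)
    for path, n in [(os.path.join(top, "a.raw"), 10),
                    (os.path.join(sub, "b.raw"), 5),
                    (os.path.join(sub, "notes.txt"), 99)]:
        with open(path, "wb") as f:
            f.write(b"x" * n)
    per_dir, grand_total = raw_size_subtotals(top)
    top_subtotal = per_dir[top]
    sub_subtotal = per_dir[sub]

print(top_subtotal, sub_subtotal, grand_total)
```

In the script above, each subtotal could then be written with writer.writerow() in the same way as the final total.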

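Note that when using with open(...), there is no need to also call close() on the file; it is closed automatically as soon as execution leaves the scope of the with statement.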
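
You can check this yourself through the file object's closed attribute. The short sketch below uses a temporary directory and a made-up file name purely for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")   # hypothetical scratch file

    with open(path, "w") as f:
        f.write("file size   1010058\n")
        closed_inside = f.closed     # still open inside the with block

    closed_after = f.closed          # closed once the with block has ended

print(closed_inside, closed_after)
```

The file is closed even if an exception is raised inside the with block, which is the main reason to prefer it over a manual close().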
