How can I make file parsing and I/O faster in python when working with huge files (20GB+)

Here is my basic example code:

def process(line):
    data = line.split("-|-")
    try:
        data1, data2 = data[2], data[3]
        finalline = f"{data1} some text here {data2}\n"
        with open("parsed.txt", 'a', encoding="utf-8") as wf:
            wf.write(finalline)
    except IndexError:
        # skip lines with fewer than four fields
        pass

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)

This works fine. But is there any way to make it run faster, using multiple threads or cores?

Or to keep up with my SSD's read/write speed while processing? Any help would be appreciated!

Function calls add significant overhead in Python. Instead of calling a function for every line of the file, do the work inline. Also, don't repeatedly open the same output file; open it once and keep it open.

with open("file.txt", "r", encoding="utf-8") as f, \
     open("parsed.txt", "a", encoding="utf-8") as outh:
    for line in f:
        data = line.split("-|-")
        try:
            # print() appends the newline and writes to the already-open handle
            print(f"{data[2]} some text here {data[3]}", file=outh)
        except Exception:
            # skip lines with fewer than four fields
            pass
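
As for using multiple threads or cores: threads won't speed up CPU-bound pure-Python parsing because of the GIL, and the per-line work here (one split and one f-string) is so small that the job is usually limited by disk I/O anyway. If you want to experiment, here is a minimal multiprocessing sketch; the file names and field layout come from your example, and the chunksize of 10_000 is just a starting value to tune. Batching lines per task keeps the inter-process serialization overhead from swamping the tiny per-line work.

import multiprocessing as mp

def parse(line):
    # Runs in a worker process: split the line and format the output,
    # or return None for lines with fewer than four fields.
    data = line.split("-|-")
    if len(data) > 3:
        # data[3] may still carry the line's trailing newline;
        # add .rstrip() there if that matters for your output.
        return f"{data[2]} some text here {data[3]}\n"
    return None

if __name__ == "__main__":
    with open("file.txt", "r", encoding="utf-8") as f, \
         open("parsed.txt", "a", encoding="utf-8") as outh, \
         mp.Pool() as pool:
        # imap streams lines to the workers in batches of `chunksize`
        # and yields results in order; all writing stays in this process.
        for result in pool.imap(parse, f, chunksize=10_000):
            if result is not None:
                outh.write(result)

Measure first, though: if a single process already reads at your SSD's sequential speed, extra workers will only add overhead.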