How to keep CSV headers in the chunk files after splitting with a script?

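I need help modifying this script so that the output chunk files include the headers. The script takes user input to determine how many rows each chunk file should contain when the input file is split. The output files do not contain the headers from the original file. I'm looking for advice on how to implement this.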

import csv
import os
import sys


os_path = os.path
csv_writer = csv.writer
sys_exit = sys.exit


if __name__ == '__main__':

    try:
        chunk_size = int(input('Input number of rows of one chunk file: '))
    except ValueError:
        print('Number of rows must be integer. Close.')
        sys_exit()

    file_path = input('Input path to .tsv file for splitting on chunks: ')

    if (
        not os_path.isfile(file_path) or
        not file_path.endswith('.tsv')
    ):
        print('You must input path to .tsv file for splitting.')
        sys_exit()

    file_name = os_path.splitext(file_path)[0]

    with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:

        chunk_file = None
        writer = None
        counter = 1
        reader = csv.reader(tsv_file, delimiter='\t', quotechar='\'')

        for index, chunk in enumerate(reader):

            if index % chunk_size == 0:

                if chunk_file is not None:
                    chunk_file.close()

                chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
                chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
                counter += 1
                writer = csv_writer(chunk_file, delimiter='\t', quotechar='\'')

                print('File "{}" complete.'.format(chunk_name))

            writer.writerow(chunk)

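You can do this by manually reading the header row when you open the input file, then writing it at the start of each output file. See the ADDED comments in the code below: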

...
with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:

    chunk_file = None
    writer = None
    counter = 1
    reader = csv.reader(tsv_file, delimiter='\t', quotechar="'")
    header = next(reader)  # Read and save header row.  (ADDED)

    for index, chunk in enumerate(reader):
        if index % chunk_size == 0:
            if chunk_file is not None:
                chunk_file.close()

            chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
            chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
            writer = csv_writer(chunk_file, delimiter='\t', quotechar="'")
            writer.writerow(header)  # ADDED.
            print('File "{}" complete.'.format(chunk_name))
            counter += 1

        writer.writerow(chunk)

    if chunk_file is not None:
        chunk_file.close()  # Close the last chunk file so its rows are flushed.
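
The same idea can also be packaged as a reusable function. The following is only a minimal sketch, not part of the original script: the name `split_with_header` and its parameters are hypothetical, and it assumes `chunk_size` is at least 1 and that one chunk of rows fits in memory. It uses `itertools.islice` to pull one chunk at a time and `with` blocks so every chunk file is closed.

```python
import csv
import itertools
import os


def split_with_header(file_path, chunk_size, delimiter='\t', quotechar="'"):
    """Split file_path into files of chunk_size data rows, each with the header.

    Hypothetical helper for illustration; returns the chunk file names written.
    """
    base_name, extension = os.path.splitext(file_path)
    chunk_names = []

    with open(file_path, 'r', newline='', encoding='utf-8') as source:
        reader = csv.reader(source, delimiter=delimiter, quotechar=quotechar)
        header = next(reader, None)  # None if the input file is empty.
        if header is None:
            return chunk_names

        for counter in itertools.count(1):
            # Take up to chunk_size data rows; an empty list means we are done.
            rows = list(itertools.islice(reader, chunk_size))
            if not rows:
                break

            chunk_name = '{0}_{1}{2}'.format(base_name, counter, extension)
            with open(chunk_name, 'w', newline='', encoding='utf-8') as chunk_file:
                writer = csv.writer(chunk_file, delimiter=delimiter,
                                    quotechar=quotechar)
                writer.writerow(header)  # Header first in every chunk.
                writer.writerows(rows)
            chunk_names.append(chunk_name)

    return chunk_names
```

Calling `split_with_header('data.tsv', 1000)` on a hypothetical `data.tsv` would write `data_1.tsv`, `data_2.tsv`, and so on, each starting with the header row. Because the header is read before the chunk loop starts, it never counts toward `chunk_size`.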

Note: using a single-quote character for quoting means the output files do not conform to the CSV standard, RFC 4180.
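
To see what the quoting choice changes in practice, here is a small sketch that writes the same row with both quote characters. The helper name `write_row` and the sample row are made up for illustration. With `quotechar="'"`, any field containing an apostrophe or a tab is wrapped in single quotes, whereas the `csv` module's default uses the double quote that RFC 4180 describes.

```python
import csv
import io


def write_row(row, quotechar):
    """Return the text csv.writer produces for one tab-delimited row."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, delimiter='\t', quotechar=quotechar).writerow(row)
    return buffer.getvalue()


row = ["it's", 'a\tb']          # sample fields: one apostrophe, one embedded tab
single = write_row(row, "'")    # quoting style used in the script above
double = write_row(row, '"')    # csv module default quote character
print(repr(single))
print(repr(double))
```

If other tools need to read the chunk files, it is worth checking which quote character they expect before keeping `quotechar="'"`.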