Working with columns in tsv files - python 3

Question

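I have a tsv file split into columns, from which I need to select specific columns and write them to a new file (basically filtering the original file). The columns are selected based on headers contained in a separate list. I managed to find the indices of the relevant columns, but for some reason I can't get them to write properly to the new file.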

with open("some_file.txt", "w") as out_file, open("another_file.txt", "r") as in_file:
    first_line = True
    for line in in_file:
        line = line.rstrip("\n")
        line = line.split("\t")
        if first_line:
            # some_list holds the header names of the columns to keep
            column_indices = [x for x in range(len(line)) if line[x] in some_list]
            first_line = False

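If I insert an index manually (out_file.write(line[7] + "\n")), the correct column is written, but none of the loops or list comprehensions I have tried work for all of the indices. The only way I have managed to write all the relevant content is in rows following the headers, rather than as columns under each header.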

I'm new to python and would really appreciate any help/insight!

Answer 1

Python comes packaged with the csv module, which contains the DictReader and DictWriter classes designed for your use case. No need to reinvent the wheel:

input.tsv:

col1    col2    col3    col4    col5
1   2   3   4   5
2   3   4   5   6
3   4   5   6   7
4   5   6   7   8

Python:

import csv

with open('input.tsv','r',newline='') as fin,open('output.tsv','w',newline='') as fout:
    reader = csv.DictReader(fin,delimiter='\t')
    writer = csv.DictWriter(fout,delimiter='\t',fieldnames=['col2','col3','col4'],extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

output.tsv:

col2    col3    col4
2   3   4
3   4   5
4   5   6
5   6   7
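
If you would rather keep the index-based approach from the question, something along these lines should work. This is only a sketch: the function name filter_columns and its parameters are made up for illustration, and it assumes every data row has at least as many fields as the header row. The key point is that each output row is built from all of the selected indices at once and written as a single line, so the values end up as columns under their headers.

```python
import csv


def filter_columns(in_path, out_path, wanted_headers):
    """Copy only the columns whose header appears in wanted_headers.

    Hypothetical helper for illustration; assumes a tab-separated file
    whose first row is the header row.
    """
    with open(in_path, "r", newline="") as in_file, \
         open(out_path, "w", newline="") as out_file:
        reader = csv.reader(in_file, delimiter="\t")
        writer = csv.writer(out_file, delimiter="\t")

        header = next(reader, None)
        if header is None:
            return  # empty input file, nothing to write

        # Indices of the wanted columns, in the order they appear in the file.
        column_indices = [i for i, name in enumerate(header)
                          if name in wanted_headers]

        # Write the filtered header, then one filtered row per input row.
        writer.writerow([header[i] for i in column_indices])
        for row in reader:
            writer.writerow([row[i] for i in column_indices])
```

For the sample file above, filter_columns("input.tsv", "output.tsv", ["col2", "col3", "col4"]) should give the same columns as the DictWriter version. Note that the columns come out in the order they have in the input file, not in the order of the list you pass in.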
