Make a list of every column in a file in Python

Question

I want to create a list for every column in a txt file. The file looks like this:

NAME  S1  S2  S3  S4
A     1   4   3   1
B     2   1   2   6
C     2   1   3   5

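Question 1. How do I dynamically generate as many lists as there are columns, so that I can fill them? Some files will have 4 columns, others will have 6 or 8...

Question 2. What is a pythonic way to iterate over each column and build a list of its values, like this:

list_s1 = [1,2,2]

list_s2 = [4,1,1]

and so on.

Right now I have read the txt file and I have each individual line. As input I give the number of NAMES in the file (here HOW_MANY_SAMPLES = 4).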

def parse_textFile(file):

    list_names = []
    with open(file) as f:
        header = next(f)
        head_list = header.rstrip("\r\n").split("\t")
        for i in f:
            e = i.rstrip("\r\n").split("\t")
            list_names.append(e)

    # intended, but not valid Python: one list per column, named l1, l2, ...
    for i in range(1, HOW_MANY_SAMPLES):
        l+i = []
        l+i.append([a[i] for a in list_names])

I need a dynamic way to create and fill a number of lists that matches the number of columns in my table.

Answer 1

Question 1:

You can use len(head_list) instead of having to specify HOW_MANY_SAMPLES.
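A minimal sketch of that idea, assuming a tab-separated file as in the question. The function name `columns_from_lines` and the in-memory sample are hypothetical stand-ins, not part of the original code:

```python
import io

def columns_from_lines(lines, sep="\t"):
    """Build one list per column; the header decides how many lists exist."""
    lines = iter(lines)
    head_list = next(lines).rstrip("\r\n").split(sep)
    # one empty list per column, however many columns the header has
    columns = [[] for _ in range(len(head_list))]
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        # ignore any extra fields beyond the header width
        for i, value in enumerate(fields[:len(head_list)]):
            columns[i].append(value)
    return head_list, columns

# in-memory stand-in for the file from the question (assumed tab-separated)
sample = io.StringIO(
    "NAME\tS1\tS2\tS3\tS4\nA\t1\t4\t3\t1\nB\t2\t1\t2\t6\nC\t2\t1\t3\t5\n"
)
head_list, columns = columns_from_lines(sample)
print(head_list)
print(columns[1])
```

Because the lists live inside `columns`, there is no need to invent variable names such as l1, l2 at runtime; `columns[i]` belongs to `head_list[i]`.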

You could also try Python's csv module and set the delimiter to a space or a tab instead of a comma.

See this answer to a similar Stack Overflow question.
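As a rough illustration of the csv route, here is a sketch that reads tab-separated rows with `csv.reader`. The sample data is an assumed, shortened version of the table in the question, held in memory instead of a real file:

```python
import csv
import io

# in-memory stand-in for a tab-separated file (hypothetical sample data)
infile = io.StringIO("NAME\tS1\tS2\nA\t1\t4\nB\t2\t1\n")

reader = csv.reader(infile, delimiter="\t")
head_list = next(reader)   # the first row is the header
rows = list(reader)        # the remaining rows, each a list of strings

print(len(head_list))
print(rows)
```

For a space-separated file, `delimiter=" "` together with `skipinitialspace=True` should cope with runs of spaces between the columns.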

Question 2:

Once you have a list representing each row, you can use zip to create lists representing each column: see this answer.
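A small sketch of the zip approach, assuming the rows have already been split into lists of strings. The names `rows`, `columns` and `by_name` are illustrative only:

```python
head_list = ["NAME", "S1", "S2", "S3", "S4"]
rows = [
    ["A", "1", "4", "3", "1"],
    ["B", "2", "1", "2", "6"],
    ["C", "2", "1", "3", "5"],
]

# zip(*rows) groups the i-th item of every row, i.e. it yields the columns
columns = [list(col) for col in zip(*rows)]

# optional: key each column by its header instead of inventing names like list_s1
by_name = dict(zip(head_list, columns))
print(by_name["S1"])  # ['1', '2', '2']
```

Note that zip stops at the shortest row, so a ragged file would silently lose trailing values.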

With the csv module, you can follow this suggestion, which is another way to turn row-based lists into column-based ones.

Sample:

import csv

# register a dialect that splits on spaces and skips the extra ones
csv.register_dialect('ignorespaces', delimiter=' ', skipinitialspace=True)

# newline='' lets the csv module handle line endings itself (Python 3)
with open('data.txt', newline='') as infile:

    # read the file as a dictionary for each row ({header: value})
    reader = csv.DictReader(infile, dialect='ignorespaces')
    data = {}
    for row in reader:
        for header, value in row.items():
            # skip empty headers produced by trailing delimiters
            if header:
                data.setdefault(header, []).append(value)

for column in data:
    print(column + ": " + str(data[column]))

This produces (columns appear in file order on Python 3.7+):

NAME: ['A', 'B', 'C']
S1: ['1', '2', '2']
S2: ['4', '1', '1']
S3: ['3', '2', '3']
S4: ['1', '6', '5']
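The csv module returns every value as a string. If numbers are needed, as in the list_s1 = [1,2,2] example from the question, one possible follow-up is sketched below. It assumes that every column except NAME holds integers, and the name `numeric` is illustrative:

```python
# column dictionary as built by a csv.DictReader loop (values are strings)
data = {
    "NAME": ["A", "B", "C"],
    "S1": ["1", "2", "2"],
    "S2": ["4", "1", "1"],
}

# convert every non-NAME column to integers (assumes the values parse as int)
numeric = {
    header: [int(value) for value in values]
    for header, values in data.items()
    if header != "NAME"
}
print(numeric["S1"])  # [1, 2, 2]
```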

Answer 2

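With pandas you can create a list of lists or a dict to get what you are looking for.

Create a dataframe from your file, then iterate over each column and add it to a list or a dict.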

from io import StringIO
import pandas as pd

TESTDATA = StringIO("""NAME   S1   S2   S3   S4
A   1    4   3   1
B   2    1   2   6
C   2    1   3   5""")

columns = []
c_dic = {}
# sep=r"\s+" splits on any run of whitespace (spaces or tabs)
df = pd.read_csv(TESTDATA, sep=r"\s+")
for column in df:
    columns.append(df[column].tolist())
    c_dic[column] = df[column].tolist()

You will then have a list of lists for all the columns

for x in columns:
    print(x)

Returns

['A', 'B', 'C']
[1, 2, 2]
[4, 1, 1]
[3, 2, 3]
[1, 6, 5]

and

for k, v in c_dic.items():
    print(k, v)

returns

NAME ['A', 'B', 'C']
S1 [1, 2, 2]
S2 [4, 1, 1]
S3 [3, 2, 3]
S4 [1, 6, 5]

if you need to keep track of the column names along with the data.
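As a possible shortcut, assuming pandas is installed, `DataFrame.to_dict(orient="list")` builds the same kind of column-name-to-list mapping in a single call. The sample data below is an assumed copy of the table from the question:

```python
from io import StringIO
import pandas as pd

TESTDATA = StringIO("""NAME S1 S2 S3 S4
A 1 4 3 1
B 2 1 2 6
C 2 1 3 5""")

# sep=r"\s+" splits on any run of whitespace
df = pd.read_csv(TESTDATA, sep=r"\s+")

# one entry per column: {column name: list of that column's values}
c_dic = df.to_dict(orient="list")
print(c_dic["S1"])
```

This avoids the explicit loop, at the cost of always building the dict for every column.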
