如何根据一系列组数据项转置csv文件数据

How to transpose the csv file data based on series of group data items

需要有关 python 转置相关代码的帮助。看下面的一个 csv 文件 (a1&b1,a2&b2,c1&c2,d1&d2 像组)

r1;a1;b1;a2;b2;c1;c2;d1;d2
12;ec;em;ed;em;ec;ek;;
23;ec;ek;ec;rk;;;;
34;ec;rk;ec;jk;ek;45;;

需要获取输出文件为

12;ec;em
12;ed;em
12;ec;ek
23;ec;ek
23;ec;rk
34;ec;rk
34;ec;jk
34;ek;45

我将尝试提出一个简单的示例。我试图让它尽可能明确和模块化,例如它很容易看到并可以随意更改。

import csv


def flatten(t):
    return [item for sublist in t for item in sublist]


def reformat_first_with_pairs(p_list):
    reformated = []
    for i in range(1, len(p_list), 2):
        reformated.append([p_list[0], p_list[i], p_list[i + 1]])
    return reformated


def disqualify_if_contains(p_list, value):
    filtered = []
    for key, sub_list in enumerate(p_list):
        if not value in sub_list:
            filtered.append(sub_list)
    return filtered


def rows_from_csv_file(file_name, f_delimiter=';'):
    input_file_rows = []
    try:
        with open(file_name, newline='') as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                input_file_rows.append(row[0].split(f_delimiter))
    except EnvironmentError:
        raise
    return input_file_rows


def rows_to_csv_file(file_name, rows, f_delimiter=';'):
    try:
        with open(file_name, 'w+', newline='') as csvfile:
            writer = csv.writer(csvfile, delimiter=f_delimiter,
                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
            for row in rows:
                writer.writerow(row)
            return True
    except EnvironmentError:
        raise


def custom_csv_reformat_00(input_file, output_file):
    try:
        input_file_rows = rows_from_csv_file(input_file)
    except EnvironmentError as e:
        raise

    input_file_rows.pop(0)

    reformated_list = flatten(map(reformat_first_with_pairs, input_file_rows))
    reformated_list = disqualify_if_contains(reformated_list, '')

    try:
        rows_to_csv_file(output_file, reformated_list)
    except EnvironmentError as e:
        raise


IF_NAME = 'input.csv'
OF_NAME = 'output.csv'

try:
    custom_csv_reformat_00(IF_NAME, OF_NAME)
except Exception as e:
    print(e)

再次指出,我不知道您的目标是否是删除第一行,如:

input_file_rows.pop(0)

一种选择是基于列成对地遍历数据帧,然后连接成一个数据帧:


temp = df.set_index('r1')
outcome = [temp.iloc[:, n:n+2]
               .set_axis(['left', 'right'], axis = 'columns')
           for n in range(0, len(temp.columns), 2)]

pd.concat(outcome).dropna().sort_index().reset_index()

   r1 left right
0  12   ec    em
1  12   ed    em
2  12   ec    ek
3  23   ec    ek
4  23   ec    rk
5  34   ec    rk
6  34   ec    jk
7  34   ek    45

一个可能更快的选择是将所有内容转储到 numpy 中,在返回 Pandas 清理之前重塑:

temp = df.set_index('r1')

# since the column reshape is in pairs, we use 2:
outcome = np.reshape(temp.to_numpy(), (-1, 2))

# index will be repeated for the pairs of columns:
index = temp.index.repeat(len(temp.columns)//2)

pd.DataFrame(outcome, index = index).sort_index().dropna().reset_index()

   r1   0   1
0  12  ec  em
1  12  ed  em
2  12  ec  ek
3  23  ec  ek
4  23  ec  rk
5  34  ec  rk
6  34  ec  jk
7  34  ek  45

另一个选项是 pivot_longer from pyjanitor,您可以在其中传递整形所需的模式:

(
df
.pivot_longer(index = 'r1', 
              names_to = ['left', 'right'], 
              names_pattern = ["a1|a2|c1|d1", "b1|b2|c2|d2"], 
              sort_by_appearance = True)
.dropna()
)

    r1 left right
0   12   ec    em
1   12   ed    em
2   12   ec    ek
4   23   ec    ek
5   23   ec    rk
8   34   ec    rk
9   34   ec    jk
10  34   ek    45