Auto-generation of new columns based on existing columns

Question

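I want to create new columns in a pandas DataFrame based on other columns, with some logic added. For every column named [string] > Full Name, I want to generate two columns named [string] > Asset Type and [string] > Domain. Some columns do not follow the [string] > Full Name structure, and those need to stay unchanged.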

This is my data:

data = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
                    'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
                    'Description': {0:'Priority mail', 1: 'Fragile object - be careful'}})

This is what I want:

wantedData = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
                    'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
                    'Description': {0:'Priority mail', 1: 'Fragile object - be careful'},
                    'recipient > Asset Type': {0: "Country", 1: "Country"},
                    'recipient > Domain': {0: "Transport", 1: "Transport"},
                    'transporter > Asset Type': {0: "Legal Entity", 1: "Legal Entity"},
                    'transporter > Domain': {0: "Transport", 1: "Transport"}})

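Also, every row of every Domain column has the same value. Is there a way to fill it in automatically, using the example value "Transport" from my code?

I tried writing some code that looks at column 0 and creates columns 1 and 2 from it, looping over all columns, but that messes up the columns I want to keep unchanged.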

Answer 1

import pandas as pd

data = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
                'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
                'Description': {0:'Priority mail', 1: 'Fragile object - be careful'}})

# value_dict contains initial values for new columns (except those ending with Domain) 
value_dict = {
    'recipient > Asset Type' : 'Country',
    'transporter > Asset Type' : 'Legal Entity'
}

# key_list contains combination of new columns - in this case we want to create 
# 2 new columns (... Asset Type, ... Domain) for each column containing '>' 
key_list = ['Asset Type', 'Domain']

# iterate over a snapshot of the column names ending with '> Full Name' (other columns remain untouched)
for col in [col for col in data.columns if col.endswith('> Full Name')]:
    prefix = col.split(' >')[0]
    # for each such column create one new column per entry in key_list (could be more or fewer)
    for key in key_list:
        new_colname = '{} > {}'.format(prefix, key)
        # use 'Transport' for Domain columns, otherwise the value from value_dict (None if not specified)
        if key == 'Domain':
            new_value = 'Transport'
        else:
            new_value = value_dict.get(new_colname)
        # assigning a scalar fills every row of the new column with that value
        data[new_colname] = new_value

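Output: data keeps its three original columns and gains 'recipient > Asset Type', 'recipient > Domain', 'transporter > Asset Type', and 'transporter > Domain'; every Domain column holds 'Transport' in all rows.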
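
The same idea can also be wrapped in a reusable function. What follows is only a minimal sketch and not part of the original answer: the helper name add_derived_columns, its parameters asset_types and domain, and the constant FULL_NAME_SUFFIX are hypothetical names chosen for illustration. It assumes the asset type is looked up by the column prefix (for example 'recipient') and that every Domain column gets the same scalar value, which pandas broadcasts to all rows on assignment.

```python
import pandas as pd

# Hypothetical constant: the suffix that marks a column to expand.
FULL_NAME_SUFFIX = " > Full Name"


def add_derived_columns(df, asset_types, domain="Transport"):
    """Return a copy of df with '<prefix> > Asset Type' and '<prefix> > Domain'
    added for every column named '<prefix> > Full Name'."""
    result = df.copy()
    for col in df.columns:
        if not col.endswith(FULL_NAME_SUFFIX):
            continue  # columns such as 'Description' stay untouched
        prefix = col[: -len(FULL_NAME_SUFFIX)]
        # Assigning a scalar fills every row; a missing prefix yields None.
        result[f"{prefix} > Asset Type"] = asset_types.get(prefix)
        result[f"{prefix} > Domain"] = domain
    return result


data = pd.DataFrame({
    "recipient > Full Name": ["Norway", "Sweden"],
    "transporter > Full Name": ["UPS", "Sweden Mail Services"],
    "Description": ["Priority mail", "Fragile object - be careful"],
})

result = add_derived_columns(
    data, {"recipient": "Country", "transporter": "Legal Entity"}
)
print(list(result.columns))
```

Working on a copy leaves the input frame unchanged, and matching on the full ' > Full Name' suffix avoids expanding unrelated columns that merely contain '>'.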
