Auto-generation of new columns based on existing columns
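I want to create new columns in a pandas DataFrame based on other columns, with some added logic. For every column named [string] > Full Name, I want to generate two new columns named [string] > Asset Type and [string] > Domain. Some columns do not follow the [string] > Full Name structure, and those need to stay unchanged.
Here is my data: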
data = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
'Description': {0:'Priority mail', 1: 'Fragile object - be careful'}})
This is what I want:
wantedData = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
'Description': {0:'Priority mail', 1: 'Fragile object - be careful'},
'recipient > Asset Type': {0: "Country", 1: "Country"},
'recipient > Domain': {0: "Transport", 1: "Transport"},
'transporter > Asset Type': {0: "Legal Entity", 1: "Legal Entity"},
'transporter > Domain': {0: "Transport", 1: "Transport"}})
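Also, every row of every Domain column has the same value. Is there a way to fill these columns automatically with "Transport", the example value I used in my code?
I tried writing code that looks at column 0, creates columns 1 and 2 from it, and then iterates over all columns, but this messes up the columns I want to keep unchanged.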
import pandas as pd
data = pd.DataFrame({'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
'transporter > Full Name': {0: "UPS", 1: "Sweden Mail Services"},
'Description': {0:'Priority mail', 1: 'Fragile object - be careful'}})
# value_dict contains initial values for new columns (except those ending with Domain)
value_dict = {
'recipient > Asset Type' : 'Country',
'transporter > Asset Type' : 'Legal Entity'
}
# key_list contains combination of new columns - in this case we want to create
# 2 new columns (... Asset Type, ... Domain) for each column containing '>'
key_list = ['Asset Type', 'Domain']
# iterate through list of column names containing '>' (other columns remain untouched)
for col in [col for col in data.columns if '>' in col]:
    # and for each such column create 2 new columns with new names (could be more or less...depends on key_list)
    for key in key_list:
        new_colname = '{} > {}'.format(col.split(' >')[0], key)
        # set Transport as value if column ends with '> Domain' or value from value_dict or None if not specified
        new_value = 'Transport' if new_colname.endswith('> Domain') else value_dict.get(new_colname)
        data[new_colname] = new_value
Output:
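The printed frame is not reproduced here. As a way to regenerate and check it, the following is a minimal sketch that wraps the same idea in a helper. The function name `add_derived_columns` and its parameters (`asset_types`, `domain`, `suffix`) are hypothetical and not part of pandas. Unlike the loop above, it matches only columns ending in ` > Full Name`, which is an assumption based on the wording of the question.

```python
import pandas as pd


def add_derived_columns(df, asset_types, domain="Transport", suffix=" > Full Name"):
    """Return a copy of df with '<prefix> > Asset Type' and '<prefix> > Domain'
    columns added for every column whose name ends with `suffix`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = df.copy()
    # Collect the prefixes up front so newly added columns are never revisited.
    prefixes = [c[: -len(suffix)] for c in df.columns if c.endswith(suffix)]
    for prefix in prefixes:
        # Assigning a scalar fills every row of the new column with that value.
        # Prefixes missing from asset_types get None.
        out[f"{prefix} > Asset Type"] = asset_types.get(prefix)
        out[f"{prefix} > Domain"] = domain
    return out


data = pd.DataFrame({
    'recipient > Full Name': {0: 'Norway', 1: 'Sweden'},
    'transporter > Full Name': {0: 'UPS', 1: 'Sweden Mail Services'},
    'Description': {0: 'Priority mail', 1: 'Fragile object - be careful'},
})

result = add_derived_columns(
    data, {'recipient': 'Country', 'transporter': 'Legal Entity'}
)
print(result.columns.tolist())
print(result)
```

Because the helper works on a copy, `data` itself is left untouched, and `Description` is skipped since its name does not end with the suffix.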