python pandas: how to make the first part of a cell the column name and the second part the column value
I have a DataFrame, and below is a sample of its first row:
sample_df.to_dict()
{'Disease_and_concern_0': {1: 'skin irritation/allergies/damage+Moderate Concern'},
 'Disease_and_concern_1': {1: 'developmental/endocrine/reproductive effects+Some Concern'},
 'Disease_and_concern_2': {1: 'damage to vision+Some Concern'}}
# the dict above can be passed to pd.DataFrame to rebuild the frame
sample_df = pd.DataFrame(sample_df.to_dict())
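For each column, I want to take the part of the string before the + and use it as the column name. The part of the string after the + should become the cell value.
My desired output: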
   skin irritation/allergies/damage  developmental/endocrine/reproductive effects  damage to vision
0                  Moderate Concern                                  Some Concern      Some Concern
I suspect there is a simple solution, but I have been trying for a while without success. Any ideas on how to achieve this?
Thanks.
Work on the dictionary directly:
import pandas as pd
from collections import defaultdict

data = {
    "Disease_and_concern_0": {1: "skin irritation/allergies/damage+Moderate Concern"},
    "Disease_and_concern_1": {
        1: "developmental/endocrine/reproductive effects+Some Concern"
    },
    "Disease_and_concern_2": {1: "damage to vision+Some Concern"},
}

# Map each row index to a {column name: value} dict.
result = defaultdict(dict)
for key, value in data.items():
    for idx, d in value.items():
        # Split on the first "+" only: left part is the column name, right part the value.
        col, v = d.split("+", 1)
        result[idx][col] = v

# Each outer key of result becomes one row of the DataFrame.
df = pd.DataFrame.from_dict(result, orient="index")
IIUC, you can try str.split:
df = sample_df.apply(lambda s: s.str.split('+').str[1])
df.columns = sample_df.iloc[0].str.split('+').str[0].tolist()
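For anyone who wants to run the str.split approach end to end, here is a minimal self-contained sketch. It rebuilds sample_df from the dict in the question and wraps the two lines in a helper; the name split_label_value is made up for illustration and is not part of pandas. It assumes every row in a column carries the same label before the +, since the column names are taken from the first row only, and it passes n=1 so that only the first + is treated as the separator.

```python
import pandas as pd

# Rebuild the sample frame from the dict shown in the question.
sample_df = pd.DataFrame({
    "Disease_and_concern_0": {1: "skin irritation/allergies/damage+Moderate Concern"},
    "Disease_and_concern_1": {1: "developmental/endocrine/reproductive effects+Some Concern"},
    "Disease_and_concern_2": {1: "damage to vision+Some Concern"},
})


def split_label_value(frame, sep="+"):
    """Hypothetical helper: keep the text after `sep` as the cell value
    and use the text before `sep` (from the first row) as the column name."""
    # Second piece of every cell becomes the new cell value.
    values = frame.apply(lambda s: s.str.split(sep, n=1).str[1])
    # First piece of each cell in the first row becomes the column name.
    values.columns = frame.iloc[0].str.split(sep, n=1).str[0].tolist()
    return values


df = split_label_value(sample_df)
print(df)
```

If the labels can differ from row to row within one column, taking the names from the first row would mislabel the later rows, so a reshape such as stack/unstack is probably the safer choice in that case.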
Alternatively, you can stack the DataFrame, split around the + delimiter, and reshape with unstack:
s = sample_df.stack().str.split('+')
df = s.str[1].droplevel(1).to_frame().set_index(s.str[0], append=True)[0].unstack()
Result:
   skin irritation/allergies/damage  developmental/endocrine/reproductive effects  damage to vision
1                  Moderate Concern                                  Some Concern      Some Concern
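A step-by-step variant of the stack/unstack idea may be easier to follow. This is only a sketch under a few assumptions: the names parts, label, concern and wide are my own, each (row, label) pair is assumed to occur once (unstack raises on duplicates), and str.split is called with expand=True so the two pieces land in separate columns. As far as I know unstack returns the new column labels in sorted order, so the last step reindexes them back to the order in which they first appeared.

```python
import pandas as pd

# Rebuild the sample frame from the dict shown in the question.
sample_df = pd.DataFrame({
    "Disease_and_concern_0": {1: "skin irritation/allergies/damage+Moderate Concern"},
    "Disease_and_concern_1": {1: "developmental/endocrine/reproductive effects+Some Concern"},
    "Disease_and_concern_2": {1: "damage to vision+Some Concern"},
})

# Long format: one row per original cell, split into two columns.
parts = sample_df.stack().str.split("+", n=1, expand=True)
parts.columns = ["label", "concern"]  # illustrative names

# Drop the old column names, index by (row, label), then pivot labels into columns.
wide = (
    parts.droplevel(1)
    .set_index("label", append=True)["concern"]
    .unstack()
)

# Restore the order in which the labels first appeared.
wide = wide.reindex(columns=parts["label"].drop_duplicates().tolist())
print(wide)
```

Unlike taking the column names from the first row, this keeps working when different rows carry different labels; missing combinations simply show up as NaN.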