How do I split out a multi-index dataframe with a column full of dictionaries with different keys
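I have a dataframe like the one below (it has several variables, but I only care about turning the dictionary column into a separate dataframe)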
| Index | Attributes | Day | Colour|
| -------- | -------------- | ---- |-------|
| Alpha | {A1: 1, A2: 2} | Mon |Black |
| Bravo | {A1: 3, B1: 4} | Mon |Red |
| Charlie | {C1: 5, A2: 6} | Mon |Yellow|
I only want the first two variables. How can I split it out like this?
| Index | A1 | A2 | B1 | C1|
| -------- | ---- | ---- |----|----|
| Alpha |1 |2 |N/A |N/A |
| Bravo |3 |N/A |4 |N/A |
| Charlie |N/A |6 |N/A |5 |
I'm really stumped by this. Here is the code I tried:
new_df = pd.DataFrame(columns = ['Index'])
new_df['Index'] = old_df['Index'
attribute_df = pd.Dataframe(old_df['attributes'])
new_df = pd.concat(new_df, attribute_df)
It didn't work!
Assuming the column `Index` is actually the index of the frame, use `apply` with `pd.Series`:
new_df = df['Attributes'].apply(pd.Series)
A1 A2 B2 C1
Index
Alpha 1.0 2.0 NaN NaN
Bravo 3.0 NaN 4.0 NaN
Charlie NaN 6.0 NaN 5.0
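As a possible alternative to `apply(pd.Series)`, the same frame can be built directly from the list of dictionaries. This is a minimal sketch, assuming every cell in `Attributes` is a plain `dict`. The sample data mirrors the example in this answer, and the index is constructed by hand here only to keep the snippet self-contained.

```python
import pandas as pd

# Sample frame with `Index` as the actual index (assumed layout).
df = pd.DataFrame(
    {'Attributes': [{'A1': 1, 'A2': 2}, {'A1': 3, 'B2': 4}, {'C1': 5, 'A2': 6}]},
    index=pd.Index(['Alpha', 'Bravo', 'Charlie'], name='Index'),
)

# Build one column per key from the list of dicts; missing keys become NaN.
# Passing index=df.index keeps the original row labels.
new_df = pd.DataFrame(df['Attributes'].tolist(), index=df.index)
print(new_df)
```

On larger frames this tends to be faster than `apply(pd.Series)`, since it avoids creating one `Series` per row, but it is worth timing on your own data.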
Assuming `Index` is a column, add a `join` to merge the result back onto the DataFrame (with this option you can also keep more columns than just the index):
new_df = df[['Index']].join(df['Attributes'].apply(pd.Series))
Index A1 A2 B2 C1
0 Alpha 1.0 2.0 NaN NaN
1 Bravo 3.0 NaN 4.0 NaN
2 Charlie NaN 6.0 NaN 5.0
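To illustrate the point about keeping more columns, here is a small sketch that carries `Day` along with `Index`. Choosing `Day` is just an example; any subset of the original columns could be selected in the same way.

```python
import pandas as pd

# Sample frame where `Index` is an ordinary column.
df = pd.DataFrame({
    'Index': ['Alpha', 'Bravo', 'Charlie'],
    'Attributes': [{'A1': 1, 'A2': 2}, {'A1': 3, 'B2': 4}, {'C1': 5, 'A2': 6}],
    'Day': ['Mon', 'Mon', 'Mon'],
    'Colour': ['Black', 'Red', 'Yellow'],
})

# Expand the dictionaries into columns, one per key.
expanded = df['Attributes'].apply(pd.Series)

# Keep whichever original columns are needed and join on the row position.
new_df = df[['Index', 'Day']].join(expanded)
print(new_df)
```

The join lines up rows by their index labels, so it relies on `expanded` having the same index as `df`, which is the case here because it was derived from `df['Attributes']`.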
Complete working example:
import pandas as pd

df = pd.DataFrame({
    'Index': ['Alpha', 'Bravo', 'Charlie'],
    'Attributes': [{'A1': 1, 'A2': 2}, {'A1': 3, 'B2': 4}, {'C1': 5, 'A2': 6}],
    'Day': ['Mon', 'Mon', 'Mon'],
    'Colour': ['Black', 'Red', 'Yellow']
}).set_index('Index')

new_df = df['Attributes'].apply(pd.Series)
print(new_df)
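The output above shows floats (`1.0`, `NaN`) because NaN forces a float dtype. If whole numbers with missing values are preferred, as in the table in the question, one option is pandas' nullable integer dtype. This is only a sketch, and it assumes every value in the dictionaries is an integer.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': ['Alpha', 'Bravo', 'Charlie'],
    'Attributes': [{'A1': 1, 'A2': 2}, {'A1': 3, 'B2': 4}, {'C1': 5, 'A2': 6}],
}).set_index('Index')

# Expand the dicts, then convert the float columns to nullable integers.
# Missing entries are shown as <NA> instead of NaN.
new_df = df['Attributes'].apply(pd.Series).astype('Int64')
print(new_df)
```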