Create a new DataFrame, adding each key from a dict column as a header
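I have a DataFrame in which one column contains dictionaries.
I want to add a new column header for every key that appears in any dictionary in that column. Where a row's dictionary does not contain a given key, the new cell should be None.
Here is some data to test and visualize what I mean:
Importing dependencies: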
import pandas as pd
import numpy as np
Creating a dict that contains a list of inner dicts:
data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}
Creating a suitable initial DataFrame for testing:
df = pd.DataFrame.from_dict(data)
df
Manually illustrating the output I want:
data2 = {'string_info': ['User1', 'User2', 'User3'],
         'elm1': ['attr5', None, 'attr23'],
         'elm2': ['attr9', None, 'attr33'],
         'elm3': ['attr33', None, None],
         'elm4': [None, None, None],  # 'elm4' is not a key in any dict above
         'elm5': [None, 'attr31', 'attr28'],
         'elm6': [None, None, 'attr33'],
         'elm7': [None, 'attr13', None],
         'int_info': [4, 24, 31]}
The desired output would be:
df2 = pd.DataFrame.from_dict(data2)
df2
Thanks!
You can use concat and the DataFrame constructor to replace the dict column with regular columns:
print (pd.DataFrame(df.dict_info.values.tolist()))
     elm1    elm2    elm3    elm5    elm6    elm7
0   attr5   attr9  attr33     NaN     NaN     NaN
1     NaN     NaN     NaN  attr31     NaN  attr13
2  attr23  attr33     NaN  attr28  attr33     NaN
print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()),
                 df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info  string_info
0   attr5   attr9  attr33     NaN     NaN     NaN         4        User1
1     NaN     NaN     NaN  attr31     NaN  attr13        24        User2
2  attr23  attr33     NaN  attr28  attr33     NaN        31        User3
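The concat call above lists the remaining columns by name. As a minimal sketch of an alternative, assuming the dict column is called dict_info as in the question, the column can be dropped and the expanded frame joined back on the index. The names `expanded` and `joined` are illustrative and do not come from the answer, and the order of the new columns may differ between pandas versions.

```python
import pandas as pd

data = {
    'string_info': ['User1', 'User2', 'User3'],
    'dict_info': [
        {'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
        {'elm5': 'attr31', 'elm7': 'attr13'},
        {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'},
    ],
    'int_info': [4, 24, 31],
}
df = pd.DataFrame(data)

# Expand each dict into one row; keys that a dict lacks become NaN.
# Passing index=df.index keeps the rows aligned for the join below.
expanded = pd.DataFrame(df['dict_info'].tolist(), index=df.index)

# Drop the original dict column and attach the expanded columns by index.
joined = df.drop(columns='dict_info').join(expanded)
print(joined)
```

This avoids hard-coding the names of the other columns, which may help when the frame has many of them.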
If you need None values, add replace:
print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()).replace({np.nan:None}),
                 df[['int_info','string_info']]], axis=1))
     elm1    elm2    elm3    elm5    elm6    elm7  int_info  string_info
0   attr5   attr9  attr33    None    None    None         4        User1
1    None    None    None  attr31    None  attr13        24        User2
2  attr23  attr33    None  attr28  attr33    None        31        User3
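The outputs above place int_info and string_info after the expanded columns, while the question asks for string_info first and int_info last. The following is a sketch of one way to get that order with None for missing keys, not the answerer's method. It assumes object dtype columns are acceptable, and `keys`, `rows`, `expanded` and `result` are illustrative names. Instead of replacing NaN afterward, it fills missing keys with dict.get before building the frame.

```python
import pandas as pd

data = {
    'string_info': ['User1', 'User2', 'User3'],
    'dict_info': [
        {'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
        {'elm5': 'attr31', 'elm7': 'attr13'},
        {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'},
    ],
    'int_info': [4, 24, 31],
}
df = pd.DataFrame(data)

# Collect every key that appears in any dict, sorted by name.
keys = sorted({k for d in df['dict_info'] for k in d})

# Build one complete row per dict; dict.get returns None for absent keys.
rows = [{k: d.get(k) for k in keys} for d in df['dict_info']]

# dtype=object keeps the None values instead of converting them to NaN.
expanded = pd.DataFrame(rows, index=df.index, columns=keys, dtype=object)

# Put string_info first and int_info last, as in the desired output.
result = pd.concat([df[['string_info']], expanded, df[['int_info']]], axis=1)
print(result)
```

Only keys that actually occur in the data become columns, so a column such as elm4 would have to be added separately.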