How to merge or join data in a Pandas nested DataFrame
I am trying to figure out how to perform a merge or join on a nested field in a DataFrame. Here is some example data:
df_all_groups = pd.read_json("""
[
{
"object": "group",
"id": "group-one",
"collections": [
{
"id": "111-111-111",
"readOnly": false
},
{
"id": "222-222-222",
"readOnly": false
}
]
},
{
"object": "group",
"id": "group-two",
"collections": [
{
"id": "111-111-111",
"readOnly": false
},
{
"id": "333-333-333",
"readOnly": false
}
]
}
]
""")
df_collections_with_names = pd.read_json("""
[
{
"object": "collection",
"id": "111-111-111",
"externalId": null,
"name": "Cats"
},
{
"object": "collection",
"id": "222-222-222",
"externalId": null,
"name": "Dogs"
},
{
"object": "collection",
"id": "333-333-333",
"externalId": null,
"name": "Fish"
}
]
""")
I am trying to add the name field from df_collections_with_names to each df_all_groups['collections'][<index>] by joining on df_all_groups['collections'][<index>].id
The output I am trying to get is:
[
{
"object": "group",
"id": "group-one",
"collections": [
{
"id": "111-111-111",
"readOnly": false,
"name": "Cats" // See Collection name was added
},
{
"id": "222-222-222",
"readOnly": false,
"name": "Dogs" // See Collection name was added
}
]
},
{
"object": "group",
"id": "group-two",
"collections": [
{
"id": "111-111-111",
"readOnly": false,
"name": "Cats" // See Collection name was added
},
{
"id": "333-333-333",
"readOnly": false,
"name": "Fish" // See Collection name was added
}
]
}
]
I tried using the merge method, but I can't seem to get it to run on the nested collections field, since I believe it is a Series at that point.
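To illustrate why merge has nothing to key on here, the following minimal sketch inspects the collections column. It assumes the same JSON as above, held in a hypothetical variable named groups_json, and wraps it in io.StringIO because newer pandas versions deprecate passing a literal JSON string to read_json.

```python
import io

import pandas as pd

# Same group data as in the question; groups_json is a name chosen for this sketch.
groups_json = """
[
  {"object": "group", "id": "group-one",
   "collections": [{"id": "111-111-111", "readOnly": false},
                   {"id": "222-222-222", "readOnly": false}]},
  {"object": "group", "id": "group-two",
   "collections": [{"id": "111-111-111", "readOnly": false},
                   {"id": "333-333-333", "readOnly": false}]}
]
"""

df_all_groups = pd.read_json(io.StringIO(groups_json))

# Each cell of 'collections' is a plain Python list of dicts, stored as objects.
column_dtype = df_all_groups["collections"].dtype
cell_types = df_all_groups["collections"].map(type).unique().tolist()

print(column_dtype)
print(cell_types)
```

Because each cell is an opaque list, the nested id values are not a column that merge can join on; they have to be flattened into rows first.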
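Here is one approach:
First, take the JSON string used to build df_all_groups (I named it all_groups here) and convert it into a list of dictionaries with json.loads. Then use json_normalize to build a DataFrame from it, with one row per collection.
Then merge that with the DataFrame df_collections_with_names built above; we now have the "name" column.
All that remains is to rebuild the desired list of dictionaries from that result; groupby + apply(to_dict) + reset_index + to_dict will produce it: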
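import json
import pandas as pd

# all_groups is the JSON string that was used to build df_all_groups above.
# Flatten to one row per collection; the group's fields are kept as _object and _id.
out = (pd.json_normalize(json.loads(all_groups), ['collections'], ['object', 'id'], meta_prefix='_')
       .merge(df_collections_with_names, on='id', suffixes=('', '_'))
       .drop(columns=['object', 'externalId']))
# Rebuild the nested list of collection records for each group.
out = (out.groupby(['_object', '_id']).apply(lambda x: x[['id', 'readOnly', 'name']].to_dict('records'))
       .reset_index(name='collections'))
# Strip the '_' prefix from the group columns and convert to a list of dicts.
out.rename(columns={c: c.strip('_') for c in out.columns}).to_dict('records')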
Output:
[{'object': 'group',
'id': 'group-one',
'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
{'id': '222-222-222', 'readOnly': False, 'name': 'Dogs'}]},
{'object': 'group',
'id': 'group-two',
'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
{'id': '333-333-333', 'readOnly': False, 'name': 'Fish'}]}]
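As a possible alternative when the nested structure only needs a lookup rather than a full join, the names can be attached without flattening. This is a minimal sketch, not the method above: it builds an id-to-name dictionary from df_collections_with_names and adds a name key to each nested collection. The variable names all_groups, collections_json, name_by_id and result are chosen for this sketch, and io.StringIO is used because newer pandas versions deprecate passing a literal JSON string to read_json.

```python
import io
import json

import pandas as pd

all_groups = """
[
  {"object": "group", "id": "group-one",
   "collections": [{"id": "111-111-111", "readOnly": false},
                   {"id": "222-222-222", "readOnly": false}]},
  {"object": "group", "id": "group-two",
   "collections": [{"id": "111-111-111", "readOnly": false},
                   {"id": "333-333-333", "readOnly": false}]}
]
"""

collections_json = """
[
  {"object": "collection", "id": "111-111-111", "externalId": null, "name": "Cats"},
  {"object": "collection", "id": "222-222-222", "externalId": null, "name": "Dogs"},
  {"object": "collection", "id": "333-333-333", "externalId": null, "name": "Fish"}
]
"""

df_collections_with_names = pd.read_json(io.StringIO(collections_json))

# Lookup table: collection id -> collection name.
name_by_id = df_collections_with_names.set_index("id")["name"].to_dict()

# Copy each group and each nested collection, adding the looked-up name.
# Collections whose id is missing from the lookup get name=None.
result = [
    {
        **group,
        "collections": [
            {**coll, "name": name_by_id.get(coll["id"])}
            for coll in group["collections"]
        ],
    }
    for group in json.loads(all_groups)
]

print(json.dumps(result, indent=2))
```

Unlike the inner merge, this keeps collections that have no matching name (with name set to None), which may or may not be what is wanted.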