How to flatten multiple nested JSON and convert to a DataFrame?
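I am trying to convert JSON (from AirTable) into a DataFrame that I can use for further data transformation.
I ran into a problem after converting the JSON to a DataFrame: one of the columns holds nested lists as values.
Here is a sample of the DataFrame after I expanded it without realizing that "Package" contains a nested list from the original JSON.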
| | Name |Source |
| -------------------| ---------------------|-------------------------------------------|
|rec2mxAycpaC93jfz | Luis Downes |[Canceled - Lv1] |
|recIQ0HfCmRhUclti | Milana Whitehouse |[Canceled - Lv1,2019 - Lv2,2020 - Lv1] |
|recOFVz0eajFblTzL | Fatma Mayo |[Canceled - Lv1,2019 - Lv4,2020 - Lv2] |
Here is the sample JSON. "Package" is the field with the nested list that I want to flatten.
[{'id': 'rec2mxAycpaC93jfz',
'fields': {'Name': 'Luis Downes',
'Package': ['Canceled - Lv1']},
'createdTime': '2017-08-25T17:05:45.000Z'},
{'id': 'recIQ0HfCmRhUclti',
'fields': {'Name': 'Milana Whitehouse',
'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
'createdTime': '2017-08-25T17:05:46.000Z'},
{'id': 'recOFVz0eajFblTzL',
'fields': {'Name': 'Fatma Mayo',
'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
'createdTime': '2017-08-25T17:05:47.000Z'}]
Any idea how to flatten the entire JSON? I have tried several solutions I found, including this one, but it only flattens the first record into a single row.
# flattening JSON objects of arbitrary structure
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
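One possible reason only a single row comes out is that flatten_json is called once on the whole list, which returns one dictionary with keys such as 0_id and 1_id. A minimal sketch, assuming the records sit in a list (the name `records` is made up for illustration), calls the function once per record instead. Note that this gives positional columns such as fields_Package_0, not the 0/1 indicator columns requested in the question.

```python
import pandas as pd

def flatten_json(y):
    """Flatten nested dicts/lists into a single-level dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(y)
    return out

# Hypothetical variable name; two of the sample records from the question.
records = [
    {'id': 'rec2mxAycpaC93jfz',
     'fields': {'Name': 'Luis Downes', 'Package': ['Canceled - Lv1']},
     'createdTime': '2017-08-25T17:05:45.000Z'},
    {'id': 'recIQ0HfCmRhUclti',
     'fields': {'Name': 'Milana Whitehouse',
                'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
     'createdTime': '2017-08-25T17:05:46.000Z'},
]

# One flat dict per record, so one row per record.
flat_df = pd.DataFrame([flatten_json(r) for r in records])
print(flat_df.columns.tolist())
```

Records with fewer packages end up with NaN in the extra fields_Package_N columns.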
Below is the end result I want to achieve (as JSON or a DataFrame):
| | Name |Package- Canceled - Lv1 |Package- 2019 - Lv2 |Package- 2020 - Lv1 |Package- 2019 - Lv4 |Package- 2020 - Lv2 |
| -------------------| ---------------------|------------------------|--------------------|--------------------|--------------------|--------------------|
|rec2mxAycpaC93jfz | Luis Downes |1 |0 |0 |0 |0 |
|recIQ0HfCmRhUclti | Milana Whitehouse |1 |1 |1 |0 |0 |
|recOFVz0eajFblTzL | Fatma Mayo |1 |0 |0 |1 |1 |
Thanks in advance for your help!
Via json_normalize() and get_dummies():
d = [{'id': 'rec2mxAycpaC93jfz',
'fields': {'Name': 'Luis Downes',
'Package': ['Canceled - Lv1']},
'createdTime': '2017-08-25T17:05:45.000Z'},
{'id': 'recIQ0HfCmRhUclti',
'fields': {'Name': 'Milana Whitehouse',
'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
'createdTime': '2017-08-25T17:05:46.000Z'},
{'id': 'recOFVz0eajFblTzL',
'fields': {'Name': 'Fatma Mayo',
'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
'createdTime': '2017-08-25T17:05:47.000Z'}
]
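import pandas as pd
df = pd.json_normalize(d)
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
dm = pd.get_dummies(df['fields.Package'].apply(pd.Series).stack()).groupby(level=0).sum()
pd.concat([df[['id','fields.Name']],dm], axis=1)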
id fields.Name 2019 - Lv2 2019 - Lv4 2020 - Lv1 \
0 rec2mxAycpaC93jfz Luis Downes 0 0 0
1 recIQ0HfCmRhUclti Milana Whitehouse 1 0 1
2 recOFVz0eajFblTzL Fatma Mayo 0 1 0
2020 - Lv2 Canceled - Lv1
0 0 1
1 0 1
2 1 1
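As an alternative sketch (not part of the approach above), the same indicator columns can be built by joining each list into one delimited string and calling Series.str.get_dummies(). This assumes no package name contains the '|' character. The "Package- " prefix, the rename to Name, and the variable names `dm` and `result` are choices made here to mirror the layout requested in the question.

```python
import pandas as pd

d = [{'id': 'rec2mxAycpaC93jfz',
      'fields': {'Name': 'Luis Downes',
                 'Package': ['Canceled - Lv1']},
      'createdTime': '2017-08-25T17:05:45.000Z'},
     {'id': 'recIQ0HfCmRhUclti',
      'fields': {'Name': 'Milana Whitehouse',
                 'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
      'createdTime': '2017-08-25T17:05:46.000Z'},
     {'id': 'recOFVz0eajFblTzL',
      'fields': {'Name': 'Fatma Mayo',
                 'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
      'createdTime': '2017-08-25T17:05:47.000Z'}]

df = pd.json_normalize(d)

# Join each list into 'a|b|c', then split it back into 0/1 indicator columns.
dm = (df['fields.Package']
      .str.join('|')
      .str.get_dummies(sep='|')
      .add_prefix('Package- '))

# Put id and Name next to the indicators, indexed by record id.
result = (pd.concat([df[['id', 'fields.Name']], dm], axis=1)
          .rename(columns={'fields.Name': 'Name'})
          .set_index('id'))
print(result)
```

The indicator columns come out in alphabetical order, so reorder them explicitly if the exact column order of the desired table matters.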