How to flatten multiple nested JSON and convert to a dataframe?

I am trying to convert JSON (from AirTable) into a dataframe that I can use for further data transformation.

I ran into a problem after converting the JSON to a dataframe: one of the columns holds a nested list as its value.

Here is my sample dataframe, which I flattened without realizing that "Package" contains a nested list from the original JSON.


|                    | Name                 |Package                                    |
| -------------------| ---------------------|-------------------------------------------|
|rec2mxAycpaC93jfz   | Luis Downes          |[Canceled - Lv1]                           |
|recIQ0HfCmRhUclti   | Milana Whitehouse    |[Canceled - Lv1,2019 - Lv2,2020 - Lv1]     |
|recOFVz0eajFblTzL   | Fatma Mayo           |[Canceled - Lv1,2019 - Lv4,2020 - Lv2]     |

Here is the sample JSON. "Package" is the data field with the nested list that I want to flatten.

[{'id': 'rec2mxAycpaC93jfz',
 'fields': {'Name': 'Luis Downes',
             'Package': ['Canceled - Lv1']},
 'createdTime': '2017-08-25T17:05:45.000Z'},
{'id': 'recIQ0HfCmRhUclti',
 'fields': {'Name': 'Milana Whitehouse',
             'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
 'createdTime': '2017-08-25T17:05:46.000Z'},
{'id': 'recOFVz0eajFblTzL',
 'fields': {'Name': 'Fatma Mayo',
             'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
 'createdTime': '2017-08-25T17:05:47.000Z'}]

Any idea how to flatten the whole JSON? I have tried several solutions I found, including this one, but it only flattens the first record into a single row.

# flattening JSON objects of arbitrary structure

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
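
A note on the single-row result: one possible cause (an assumption, since the call site is not shown) is that the helper was called once on the whole list of records. A single call returns a single flat dict, so it can only ever produce one row. The minimal sketch below uses the same flattening logic on two shortened records; the names `records`, `one_row` and `per_record` are illustrative, not from the original post.

```python
import pandas as pd


def flatten_json(y):
    """Same logic as the helper above: flatten nested dicts/lists into one flat dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


# Two records in the same shape as the sample data (createdTime omitted for brevity).
records = [
    {'id': 'rec2mxAycpaC93jfz',
     'fields': {'Name': 'Luis Downes', 'Package': ['Canceled - Lv1']}},
    {'id': 'recIQ0HfCmRhUclti',
     'fields': {'Name': 'Milana Whitehouse',
                'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']}},
]

# Whole list at once: a single dict with index-prefixed keys such as '0_id', '1_id'.
one_row = flatten_json(records)

# One call per record: one dict per record, so one dataframe row per record.
per_record = pd.DataFrame([flatten_json(r) for r in records])
```

Even when called per record, this approach turns list positions into columns such as `fields_Package_0`, not the 0/1 indicator columns in the desired output, so it does not solve the problem on its own.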

The final result I want to achieve (as JSON or a dataframe) is shown below.


|                    | Name                 |Package- Canceled - Lv1 |Package- 2019 - Lv2 |Package- 2020 - Lv1 |Package- 2019 - Lv4 |Package- 2020 - Lv2 |
| -------------------| ---------------------|------------------------|--------------------|--------------------|--------------------|--------------------|
|rec2mxAycpaC93jfz   | Luis Downes          |1                       |0                   |0                   |0                   |0                   |
|recIQ0HfCmRhUclti   | Milana Whitehouse    |1                       |1                   |1                   |0                   |0                   |
|recOFVz0eajFblTzL   | Fatma Mayo           |1                       |0                   |0                   |1                   |1                   |

Thanks in advance for your help!

You can do this with json_normalize() and get_dummies():

d = [{'id': 'rec2mxAycpaC93jfz',
 'fields': {'Name': 'Luis Downes',
             'Package': ['Canceled - Lv1']},
 'createdTime': '2017-08-25T17:05:45.000Z'},
{'id': 'recIQ0HfCmRhUclti',
 'fields': {'Name': 'Milana Whitehouse',
             'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
 'createdTime': '2017-08-25T17:05:46.000Z'},
{'id': 'recOFVz0eajFblTzL',
 'fields': {'Name': 'Fatma Mayo',
            'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
 'createdTime': '2017-08-25T17:05:47.000Z'}
]
 
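import pandas as pd

df = pd.json_normalize(d)
# Expand each list into columns, stack into one long Series, one-hot encode,
# then collapse back to one row per record. groupby(level=0).sum() replaces
# sum(level=0), which was removed in pandas 2.0.
dm = pd.get_dummies(df['fields.Package'].apply(pd.Series).stack()).groupby(level=0).sum()
pd.concat([df[['id', 'fields.Name']], dm], axis=1)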

                  id        fields.Name  2019 - Lv2  2019 - Lv4  2020 - Lv1  \
0  rec2mxAycpaC93jfz        Luis Downes           0           0           0   
1  recIQ0HfCmRhUclti  Milana Whitehouse           1           0           1   
2  recOFVz0eajFblTzL         Fatma Mayo           0           1           0   

   2020 - Lv2  Canceled - Lv1  
0           0               1  
1           0               1  
2           1               1
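
As an alternative sketch (not part of the answer above): when every list item is a plain string and none contains the separator character, `Series.str.join()` followed by `Series.str.get_dummies()` should give the same indicator columns in fewer steps. The names `records`, `flags` and `result`, and the `'Package- '` prefix chosen to mirror the desired column names, are assumptions made for illustration.

```python
import pandas as pd

# Sample records copied from the question.
records = [
    {'id': 'rec2mxAycpaC93jfz',
     'fields': {'Name': 'Luis Downes',
                'Package': ['Canceled - Lv1']},
     'createdTime': '2017-08-25T17:05:45.000Z'},
    {'id': 'recIQ0HfCmRhUclti',
     'fields': {'Name': 'Milana Whitehouse',
                'Package': ['Canceled - Lv1', '2019 - Lv2', '2020 - Lv1']},
     'createdTime': '2017-08-25T17:05:46.000Z'},
    {'id': 'recOFVz0eajFblTzL',
     'fields': {'Name': 'Fatma Mayo',
                'Package': ['Canceled - Lv1', '2019 - Lv4', '2020 - Lv2']},
     'createdTime': '2017-08-25T17:05:47.000Z'},
]

# Nested dict keys become dotted column names such as 'fields.Package'.
df = pd.json_normalize(records)

# Join each list into one '|'-separated string, then split it back into
# 0/1 indicator columns. Assumes no package name contains '|'.
flags = df['fields.Package'].str.join('|').str.get_dummies(sep='|')
flags = flags.add_prefix('Package- ')

# flags keeps the row index of df, so a plain index join lines the rows up.
result = df[['id', 'fields.Name']].join(flags)
print(result)
```

The trade-off is the dependence on a separator that never appears in the data; if that cannot be guaranteed, the `get_dummies()` approach above is the safer choice.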