Converting a deeply nested JSON response from an API call to a pandas dataframe

I'm currently having trouble parsing a deeply nested JSON response from an HTTP API call.

My JSON response looks like this:

{'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]},

After receiving the response, I store it in a data object using json_normalize:
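from pandas import json_normalize  # pandas >= 1.0; earlier versions: from pandas.io.json import json_normalize

data = response.json()
data = data['response']['accounts']['hits']
data = json_normalize(data)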

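But after normalizing, the dataframe is not what I need: selected_fields remains a single column holding the whole list.

My curl statement looks like this: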

curl --data 'query= {"terms":[{"type":"string_attribute","attribute":"Account Type","query_term_id":"account_type","in_list":["Contract"]},{"type":"string","term":"status_group","in_list":["paying"]},{"type":"string_attribute","attribute":"Region","in_list":["DDEU"]},{"type":"string_attribute","attribute":"Country","in_list":["Belgium"]},{"type":"string_attribute","attribute":"CSM Tag","in_list":["EU CSM"]},{"type":"date_attribute","attribute":"Contract Renewal Date","gte":1554057000000,"lte":1561833000000}],"count":1000,"offset":0,"fields":[{"type":"string_attribute","attribute":"DomainName","field_display_name":"Client Name"},{"type":"string_attribute","attribute":"Region","field_display_name":"Region"},{"type":"string_attribute","attribute":"Country","field_display_name":"Country"},{"type":"string_attribute","attribute":"Success Manager","field_display_name":"Client Success Manager"},{"type":"string","term":"identifier","field_display_name":"Account id"},{"type":"string_attribute","attribute":"DeviceSLA","field_display_name":"[FIN] Material Part Number"},{"type":"string_attribute","attribute":"SFDCAccountId","field_display_name":"SFDCAccountId"},{"type":"string_attribute","attribute":"Client","field_display_name":"[FIN] Client Sold-To Name"},{"type":"string_attribute","attribute":"Sold To Code","field_display_name":"[FIN] Client Sold To Code"},{"type":"string_attribute","attribute":"BU","field_display_name":"[FIN] Active BUs"},{"type":"string_attribute","attribute":"Service Type","field_display_name":"[FIN] Service Type"},{"type":"string_attribute","attribute":"Contract Header ID","field_display_name":"[FIN] SAP Contract Header ID"},{"type":"number_attribute","attribute":"Contract Value","field_display_name":"[FIN] ACV - Annual Contract Value","desc":true},{"type":"date_attribute","attribute":"Contract Start Date","field_display_name":"[FIN] Contract Start Date"},{"type":"date_attribute","attribute":"Contract Renewal Date","field_display_name":"[FIN] Contract Renewal Date"}],"scope":"all"}' --header 'app-token:YOUR-TOKEN-HERE' 'https://app.totango.com/api/v1/search/accounts'

So ultimately I want to store the response in a dataframe together with the field names.

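I've had to do this kind of thing (flattening nested JSON) a few times in the past. I'll explain my process, and you can see whether it works for you or at least adapt the code to fit your needs.

1) Take the data response and flatten it completely with a function. A blog post was very helpful when I first had to do this.

2) Then iterate through the resulting flat dictionary and use the numbers embedded in the new key names (from the nested parts) to work out which row and column each value belongs to. Some keys are unique/distinct, so they carry no number marking a "new" row; I account for those in what I call special_cols.

3) While iterating, pull out the row number (embedded in those flat keys) and construct the dataframe that way.

It sounds complicated, but if you debug and run it line by line, you'll see how it works. Nonetheless, I believe it should get you what you need.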
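To make steps 2 and 3 concrete, here is a minimal sketch of how a single flattened key splits into a row index and a column name. The sample keys are of the kind the flattening step produces for the response in the question, and `split_key` is a hypothetical helper used only for illustration; the code that follows does the same thing inline.

```python
import re

# Flattened keys: nested key names and list positions joined with underscores.
sample_keys = [
    "took",
    "response_accounts_hits_0_name",
    "response_accounts_hits_1_selected_fields_3",
]


def split_key(key):
    """Hypothetical helper: return (row_idx, column), or None if the key has no row number."""
    match = re.search(r"_(\d+)_(.*)", key)
    if match is None:
        return None  # e.g. 'took': a top-level value shared by every row
    row_idx = int(match.group(1))             # first number between underscores
    column = match.group(2).replace("_", "")  # everything after it, underscores removed
    return row_idx, column


parsed = {key: split_key(key) for key in sample_keys}
for key, result in parsed.items():
    print(key, "->", result)
```

Keys that come back as `None` are the ones collected in special_cols and copied onto every row at the end.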

data = {'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
     'display_name': 'Googleglass-4002238760',
     'selected_fields': ['Googleglass',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4002238760',
      'DDMadarchod-INSTE',
      None,
      'Googleglass',
      '0001012556',
      'CC',
      'Setu Non Standard',
      '40022387',
      320142,
      4651321321333,
      1324650651651]},
    {'name': '4003893720',
     'display_name': 'Swift-4003893720',
     'selected_fields': ['Swift',
      'DDMonkey',
      'Papu New Guinea',
      'Jonathan Vardharajan',
      '4003893720',
      'DDMadarchod-UPTM-RemotexNBD',
      None,
      'S.W.I.F.T. SCRL',
      '0001000110',
      'SE',
      'Setu Non Standard',
      '40038937',
      189508,
      1464739200000,
      1559260800000]}]}}}


import pandas as pd
import re


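def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # List positions become part of the key, e.g. 'hits_0_name'
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated key
            out[name[:-1]] = x

    flatten(y)
    return out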

flat = flatten_json(data)                      


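results = pd.DataFrame()
special_cols = []  # keys without a row number, e.g. 'took'

columns_list = list(flat.keys())
for item in columns_list:
    try:
        # The first number between underscores is the row index
        row_idx = re.findall(r'_(\d+)_', item)[0]
    except IndexError:
        # No row number in the key: handle it after the loop
        special_cols.append(item)
        continue
    # Everything after the row index becomes the column name
    column = re.findall(r'_\d+_(.*)', item)[0]
    column = column.replace('_', '')

    row_idx = int(row_idx)
    value = flat[item]

    results.loc[row_idx, column] = value

# Broadcast the top-level values to every row
for item in special_cols:
    results[item] = flat[item]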

Output:

print (results.to_string())
         name             displayname selectedfields0 selectedfields1  selectedfields2       selectedfields3 selectedfields4              selectedfields5  selectedfields6  selectedfields7 selectedfields8 selectedfields9   selectedfields10 selectedfields11  selectedfields12  selectedfields13  selectedfields14  took _revision
0  4002238760  Googleglass-4002238760     Googleglass        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4002238760            DDMadarchod-INSTE              NaN      Googleglass      0001012556              CC  Setu Non Standard         40022387          320142.0      4.651321e+12      1.324651e+12   476  r08badf3
1  4003893720        Swift-4003893720           Swift        DDMonkey  Papu New Guinea  Jonathan Vardharajan      4003893720  DDMadarchod-UPTM-RemotexNBD              NaN  S.W.I.F.T. SCRL      0001000110              SE  Setu Non Standard         40038937          189508.0      1.464739e+12      1.559261e+12   476  r08badf3
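If the end goal is simply one column per requested field, a more direct route may be to build the frame straight from each hit's `selected_fields` list and label the columns with the `field_display_name` values from the query. The sketch below rests on several assumptions. It assumes `selected_fields` comes back in the same order as the `fields` list in the request, which the sample response is consistent with but does not guarantee. It uses a shortened sample with only three fields per hit. It also treats the date attribute as an epoch timestamp in milliseconds.

```python
import pandas as pd

# Shortened sample in the same shape as the API response (three fields kept).
data = {
    "response": {"accounts": {"hits": [
        {"name": "4002238760", "display_name": "Googleglass-4002238760",
         "selected_fields": ["Googleglass", 320142, 1324650651651]},
        {"name": "4003893720", "display_name": "Swift-4003893720",
         "selected_fields": ["Swift", 189508, 1559260800000]},
    ]}}
}

# Column labels taken from field_display_name in the query, in request order.
# Assumption: selected_fields follows the same order as the requested fields.
field_names = [
    "Client Name",
    "[FIN] ACV - Annual Contract Value",
    "[FIN] Contract Renewal Date",
]

hits = data["response"]["accounts"]["hits"]

# One row per hit, one column per selected field.
df = pd.DataFrame([hit["selected_fields"] for hit in hits], columns=field_names)

# Keep the per-account identifiers alongside the selected fields.
df.insert(0, "name", [hit["name"] for hit in hits])
df.insert(1, "display_name", [hit["display_name"] for hit in hits])

# Assumption: date attributes are epoch timestamps in milliseconds.
date_col = "[FIN] Contract Renewal Date"
df[date_col] = pd.to_datetime(df[date_col], unit="ms")

print(df.to_string())
```

Compared with the flatten-then-rebuild approach, this keeps integer columns as integers and gives readable column names, but it only fits responses where every hit has a `selected_fields` list of the same length.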