Python - Convert Nested JSON with Nested Columns

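I'm new to Python and JSON data structures and am looking for some help.

I have been able to write some Python code that calls a web API and successfully converts the returned JSON data (report_row) into a DataFrame using json_normalize().

I'm having trouble converting the JSON column names into DataFrame column names and sorting them, and I was wondering if I could get some help with the following...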

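  1. Get the column names from the JSON data - in the DataFrame, I'd like to rename the columns c1, c2, c3, etc. to RECORD_NO, REF_RECORD_NO, SOV_LINEITEM_NO, and so on. The column names are in the JSON data under [data][report_header][cXX][name], where cXX is the column number.
  2. Sort the column names - I'd like the DataFrame columns ordered as c1, c2, c3 ... c10, c11, c12 rather than c1, c10, c11, c12, c2, c3, etc.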
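
The ordering asked for in point 2 is a natural (numeric) sort rather than a plain string sort. A minimal sketch, assuming every column label is the letter c followed by digits; the helper name `column_number` is hypothetical:

```python
def column_number(label: str) -> int:
    """Return the numeric part of a label such as 'c10' (assumes 'c' + digits)."""
    return int(label[1:])

# Labels in the order a plain string sort produces
labels = ["c1", "c10", "c11", "c12", "c2", "c3"]

# Sorting on the numeric suffix puts c2 and c3 ahead of c10
natural_order = sorted(labels, key=column_number)
print(natural_order)
```

With a DataFrame, the same key could be applied as `data[sorted(data.columns, key=column_number)]`.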

If anyone is able to provide some help, it would be greatly appreciated.

Thanks in advance.

Python code

import json
import pandas as pd

json_data = json.loads(res.read())  # res: HTTP response from the web API call (not shown)
data = pd.json_normalize(json_data['data'], record_path=['report_row'])
print(data)

The output is as follows

            c1 c10            c11  ...               c7            c8        c9
0  CON-0000001  71    VEN-0000001  ...  Build IT System  Contract 123   Pending
1  CON-0000002  72    VEN-0000002  ...  Build IT System  Contract XYZ  Approved

JSON data

{
  "data": [
    {
      "report_header": {
        "c11": {
          "name": "VENDOR_RECORD",
          "type": "java.lang.String"
        },
        "c10": {
          "name": "VENDOR_ID",
          "type": "java.lang.Integer"
        },
        "c12": {
          "name": "VENDOR_NAME",
          "type": "java.lang.String"
        },
        "c1": {
          "name": "RECORD_NO",
          "type": "java.lang.String"
        },
        "c2": {
          "name": "REF_RECORD_NO",
          "type": "java.lang.String"
        },
        "c3": {
          "name": "SOV_LINEITEM_NO",
          "type": "java.lang.String"
        },
        "c4": {
          "name": "REF_ITEM",
          "type": "java.lang.String"
        },
        "c5": {
          "name": "PROJECTNUMBER",
          "type": "java.lang.String"
        },
        "c6": {
          "name": "PROJECTNAME",
          "type": "java.lang.String"
        },
        "c7": {
          "name": "TITLE",
          "type": "java.lang.String"
        },
        "c8": {
          "name": "CONTRACT_NO",
          "type": "java.lang.String"
        },
        "c9": {
          "name": "STATUS",
          "type": "java.lang.String"
        }
      },
      "report_row": [
        {
          "c1": "CON-0000001",
          "c10": "71  ",
          "c11": "VEN-0000001",
          "c12": "Microsoft",
          "c2": "",
          "c3": "1",
          "c4": "",
          "c5": "P-0037",
          "c6": "Project ABC",
          "c7": "Build IT System",
          "c8": "Contract 123",
          "c9": "Pending"
        },
        {
          "c1": "CON-0000002",
          "c10": "72  ",
          "c11": "VEN-0000002",
          "c12": "Google",
          "c2": "",
          "c3": "1.1",
          "c4": "",
          "c5": "P-0037",
          "c6": "Project ABC",
          "c7": "Build IT System",
          "c8": "Contract XYZ",
          "c9": "Approved"
        }
      ]
    }
  ],
  "message": [
    "OK"
  ],
  "status": 200
}
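
Because the descriptive names sit under each cXX entry's name key, the header can be turned into a code-to-name mapping with a dict comprehension. A sketch using a shortened copy of the header above (only three of the twelve entries are reproduced); the variable name `name_by_code` is hypothetical:

```python
import json

# Shortened copy of the response above: three header entries only
raw = """
{
  "data": [
    {
      "report_header": {
        "c10": {"name": "VENDOR_ID", "type": "java.lang.Integer"},
        "c1": {"name": "RECORD_NO", "type": "java.lang.String"},
        "c2": {"name": "REF_RECORD_NO", "type": "java.lang.String"}
      }
    }
  ]
}
"""
json_data = json.loads(raw)

report_header = json_data["data"][0]["report_header"]
# Map each column code (c1, c2, ...) to its descriptive name
name_by_code = {code: meta["name"] for code, meta in report_header.items()}
print(name_by_code)
```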

I was able to solve the problem by adding the following code...

# Get the number of fields/columns in the JSON data
report_header = json_data['data'][0]['report_header']
number_of_fields = len(report_header)
# Assumption: `header` was not defined in the original snippet; flattening the
# header dict gives one row with columns such as 'c1.name' and 'c1.type'
header = pd.json_normalize(report_header)
reorder_columns = []
new_column_names = []

# Loop through the columns and do the following...
# reorder_columns - the column order that I want: c1, c2, c3 ... c10, c11, c12
# new_column_names - the column names from the header: c1.name, c2.name, etc.
for field_index in range(1, number_of_fields + 1):
    new_column = "c" + str(field_index)
    reorder_columns.append(new_column)
    column_header = new_column + '.name'
    new_column_names.append(header.iloc[0][column_header])

data = pd.json_normalize(json_data['data'], record_path=['report_row'])
data = data.reindex(columns=reorder_columns)
data.columns = new_column_names
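
An alternative sketch of the same idea that avoids building the labels from a counter: sort the existing columns by their numeric suffix, then rename them from the header. It assumes every label is c followed by digits, and `json_data` below is a small stand-in for the parsed API response (three columns, two rows), not the real payload.

```python
import pandas as pd

# Stand-in for the parsed API response (subset of the columns shown above)
json_data = {
    "data": [
        {
            "report_header": {
                "c10": {"name": "VENDOR_ID", "type": "java.lang.Integer"},
                "c1": {"name": "RECORD_NO", "type": "java.lang.String"},
                "c2": {"name": "REF_RECORD_NO", "type": "java.lang.String"},
            },
            "report_row": [
                {"c1": "CON-0000001", "c10": "71  ", "c2": ""},
                {"c1": "CON-0000002", "c10": "72  ", "c2": ""},
            ],
        }
    ]
}

report_header = json_data["data"][0]["report_header"]
data = pd.json_normalize(json_data["data"], record_path=["report_row"])

# Natural order: compare the number after the leading 'c' (c2 before c10)
ordered = sorted(data.columns, key=lambda label: int(label[1:]))
# Rename each cXX column using the header's 'name' value
name_by_code = {code: meta["name"] for code, meta in report_header.items()}
data = data[ordered].rename(columns=name_by_code)
print(list(data.columns))
```

This version does not require the column numbers to be contiguous, since it only reorders the labels that are actually present in the rows.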