Python - Convert Nested JSON with Nested Columns
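I am new to Python and JSON data structures and am looking for some help.
I have been able to write some Python code that calls a Web API and successfully converts the returned JSON data (report_row) into a DataFrame using json_normalize().
I am having trouble converting the JSON column names into DataFrame column names and sorting them, and was wondering whether I could get some help with the following...
- Getting the column names from the JSON data - In the DataFrame, I would like to rename the columns c1, c2, c3, etc. to RECORD_NO, REF_RECORD_NO, SOV_LINEITEM_NO, etc. The column names are in the JSON data at [data][report_header][cXX][name], where cXX is the column number
- Sorting the column names - I would like the DataFrame columns ordered as c1, c2, c3 ... c10, c11, c12 rather than c1, c10, c11, c12, c2, c3, etc.
If anyone is able to provide some help, it would be greatly appreciated.
Thanks in advance
Python code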
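import json
import pandas as pd
# res is the HTTP response returned by the Web API call (not shown)
json_data = json.loads(res.read())
data = pd.json_normalize(json_data['data'], record_path=['report_row'])
print(data)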
The output is as follows
            c1 c10          c11  ...               c7            c8        c9
0  CON-0000001  71  VEN-0000001  ...  Build IT System  Contract 123   Pending
1  CON-0000002  72  VEN-0000002  ...  Build IT System  Contract XYZ  Approved
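The column order above matches plain string ordering, in which "c10" sorts before "c2". As a minimal sketch of the ordering being asked for, the labels can be sorted on their numeric suffix instead. The helper name natural_key is hypothetical, and the sketch assumes every label is a non-numeric prefix followed by digits.

```python
import re

def natural_key(label):
    """Split a label such as 'c10' into ('c', 10) so the digits compare as a number."""
    prefix, digits = re.fullmatch(r"(\D*)(\d+)", label).groups()
    return (prefix, int(digits))

labels = ["c1", "c10", "c11", "c12", "c2", "c3"]

print(sorted(labels))                   # ['c1', 'c10', 'c11', 'c12', 'c2', 'c3']
print(sorted(labels, key=natural_key))  # ['c1', 'c2', 'c3', 'c10', 'c11', 'c12']
```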
JSON data
{
  "data": [
    {
      "report_header": {
        "c11": {
          "name": "VENDOR_RECORD",
          "type": "java.lang.String"
        },
        "c10": {
          "name": "VENDOR_ID",
          "type": "java.lang.Integer"
        },
        "c12": {
          "name": "VENDOR_NAME",
          "type": "java.lang.String"
        },
        "c1": {
          "name": "RECORD_NO",
          "type": "java.lang.String"
        },
        "c2": {
          "name": "REF_RECORD_NO",
          "type": "java.lang.String"
        },
        "c3": {
          "name": "SOV_LINEITEM_NO",
          "type": "java.lang.String"
        },
        "c4": {
          "name": "REF_ITEM",
          "type": "java.lang.String"
        },
        "c5": {
          "name": "PROJECTNUMBER",
          "type": "java.lang.String"
        },
        "c6": {
          "name": "PROJECTNAME",
          "type": "java.lang.String"
        },
        "c7": {
          "name": "TITLE",
          "type": "java.lang.String"
        },
        "c8": {
          "name": "CONTRACT_NO",
          "type": "java.lang.String"
        },
        "c9": {
          "name": "STATUS",
          "type": "java.lang.String"
        }
      },
      "report_row": [
        {
          "c1": "CON-0000001",
          "c10": "71 ",
          "c11": "VEN-0000001",
          "c12": "Microsoft",
          "c2": "",
          "c3": "1",
          "c4": "",
          "c5": "P-0037",
          "c6": "Project ABC",
          "c7": "Build IT System",
          "c8": "Contract 123",
          "c9": "Pending"
        },
        {
          "c1": "CON-0000002",
          "c10": "72 ",
          "c11": "VEN-0000002",
          "c12": "Google",
          "c2": "",
          "c3": "1.1",
          "c4": "",
          "c5": "P-0037",
          "c6": "Project ABC",
          "c7": "Build IT System",
          "c8": "Contract XYZ",
          "c9": "Approved"
        }
      ]
    }
  ],
  "message": [
    "OK"
  ],
  "status": 200
}
I was able to solve the problem by adding the following code...
# Get the number of fields/columns in the JSON data
number_of_fields = len(json_data['data'][0]['report_header'])
# Flatten the header into a single row with columns c1.name, c1.type, ...
# (this line was not in the original post; it is assumed from the lookup below)
header = pd.json_normalize(json_data['data'][0]['report_header'])
reorder_columns = []
new_column_names = []
field_index = 0
# Loop through the columns and do the following...
# reorder_columns - this is the column order that I want: c1, c2, c3 ... c10, c11, c12
# new_column_names - the column names retrieved from the header: c1.name, c2.name, etc
while field_index < number_of_fields:
    field_index += 1
    new_column = "c" + str(field_index)
    reorder_columns.append(new_column)
    column_header = new_column + '.name'
    new_column_name = header.iloc[0][column_header]
    new_column_names.append(new_column_name)
data = pd.json_normalize(json_data['data'], record_path=['report_row'])
data = data.reindex(columns=reorder_columns)
data.columns = new_column_names
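For comparison, here is a more compact sketch of the same idea that reads the names straight from the report_header dictionary and does not rely on the cXX numbers being contiguous. The json_data literal is a trimmed-down stand-in for the API response (three columns only), and the variable names report, ordered and names are my own; it also assumes every header key is the letter c followed by digits.

```python
import pandas as pd

# Trimmed-down stand-in for the API response (three columns only)
json_data = {
    "data": [{
        "report_header": {
            "c10": {"name": "VENDOR_ID", "type": "java.lang.Integer"},
            "c1": {"name": "RECORD_NO", "type": "java.lang.String"},
            "c2": {"name": "REF_RECORD_NO", "type": "java.lang.String"},
        },
        "report_row": [
            {"c1": "CON-0000001", "c10": "71 ", "c2": ""},
            {"c1": "CON-0000002", "c10": "72 ", "c2": ""},
        ],
    }]
}

report = json_data["data"][0]
header = report["report_header"]

# Order the cXX keys by their numeric part: c1, c2, c10
ordered = sorted(header, key=lambda key: int(key[1:]))

# Map each cXX key to the descriptive name stored in the header
names = {key: header[key]["name"] for key in ordered}

# Select the columns in the wanted order, then rename them
data = pd.DataFrame(report["report_row"])[ordered].rename(columns=names)
print(list(data.columns))  # ['RECORD_NO', 'REF_RECORD_NO', 'VENDOR_ID']
```

Renaming through a dictionary ties each new name to its cXX key, so the result does not depend on two separately built lists staying in the same order.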