Nested JSON Array to Python Pandas DataFrame
I am trying to expand a nested JSON array in a pandas DataFrame.
This is the JSON I have:
[
  {
    "id": "0001",
    "name": "Stiven",
    "location": [
      {
        "country": "Colombia",
        "department": "Chocó",
        "city": "Quibdó"
      },
      {
        "country": "Colombia",
        "department": "Antioquia",
        "city": "Medellin"
      },
      {
        "country": "Colombia",
        "department": "Cundinamarca",
        "city": "Bogotá"
      }
    ]
  },
  {
    "id": "0002",
    "name": "Jhon Jaime",
    "location": [
      {
        "country": "Colombia",
        "department": "Valle del Cauca",
        "city": "Cali"
      },
      {
        "country": "Colombia",
        "department": "Putumayo",
        "city": "Mocoa"
      },
      {
        "country": "Colombia",
        "department": "Arauca",
        "city": "Arauca"
      }
    ]
  },
  {
    "id": "0003",
    "name": "Francisco",
    "location": [
      {
        "country": "Colombia",
        "department": "Atlántico",
        "city": "Barranquilla"
      },
      {
        "country": "Colombia",
        "department": "Bolívar",
        "city": "Cartagena"
      },
      {
        "country": "Colombia",
        "department": "La Guajira",
        "city": "Riohacha"
      }
    ]
  }
]
This is my DataFrame:
index id name location
0 0001 Stiven [{'country':'Colombia', 'department': 'Chocó', 'city': 'Quibdó'}, {'country':'Colombia', 'department': 'Antioquia', 'city': 'Medellin'}, {'country':'Colombia', 'department': 'Cundinamarca', 'city': 'Bogotá'}]
1 0002 Jhon Jaime [{'country':'Colombia', 'department': 'Valle del Cauca', 'city': 'Cali'}, {'country':'Colombia', 'department': 'Putumayo', 'city': 'Mocoa'}, {'country':'Colombia', 'department': 'Arauca', 'city': 'Arauca'}]
2 0003 Francisco [{'country':'Colombia', 'department': 'Atlántico', 'city': 'Barranquilla'}, {'country':'Colombia', 'department': 'Bolívar', 'city': 'Cartagena'}, {'country':'Colombia', 'department': 'La Guajira', 'city': 'Riohacha'}]
I need to expand the locations for each id into separate rows, like this:
index id name country department city
0 0001 Stiven Colombia Chocó Quibdó
1 0001 Stiven Colombia Antioquia Medellin
2 0001 Stiven Colombia Cundinamarca Bogotá
3 0002 Jhon Jaime Colombia Valle del Cauca Cali
4 0002 Jhon Jaime Colombia Putumayo Mocoa
5 0002 Jhon Jaime Colombia Arauca Arauca
6 0003 Francisco Colombia Atlántico Barranquilla
7 0003 Francisco Colombia Bolívar Cartagena
8 0003 Francisco Colombia La Guajira Riohacha
Thanks in advance.
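- If the JSON is loaded from a file, use json.loads; if the JSON comes directly from an API, this may not be necessary.
- Use pandas.json_normalize with the record_path and meta parameters to convert the JSON into a DataFrame.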
import pandas as pd
from pathlib import Path
import json
# path to file
p = Path(r'c:\path_to_file\test.json')
# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())
# create dataframe
df = pd.json_normalize(data, record_path='location', meta=['id', 'name'])
# output
country department city id name
Colombia Chocó Quibdó 0001 Stiven
Colombia Antioquia Medellin 0001 Stiven
Colombia Cundinamarca Bogotá 0001 Stiven
Colombia Valle del Cauca Cali 0002 Jhon Jaime
Colombia Putumayo Mocoa 0002 Jhon Jaime
Colombia Arauca Arauca 0002 Jhon Jaime
Colombia Atlántico Barranquilla 0003 Francisco
Colombia Bolívar Cartagena 0003 Francisco
Colombia La Guajira Riohacha 0003 Francisco
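The output above has the location columns first, while the question asks for id and name first. Below is a minimal sketch of two ways to get that layout, using an abbreviated inline copy of the question's data instead of a file. The variable names (flat, exploded, from_df) are my own, and the second route assumes pandas 1.1 or later for the ignore_index argument of explode.

```python
import pandas as pd

# Abbreviated inline copy of the question's records (already parsed JSON).
data = [
    {
        "id": "0001",
        "name": "Stiven",
        "location": [
            {"country": "Colombia", "department": "Chocó", "city": "Quibdó"},
            {"country": "Colombia", "department": "Antioquia", "city": "Medellin"},
        ],
    },
    {
        "id": "0002",
        "name": "Jhon Jaime",
        "location": [
            {"country": "Colombia", "department": "Valle del Cauca", "city": "Cali"},
        ],
    },
]

# Route 1: flatten the parsed JSON, then reorder the columns
# so that id and name come first, as requested in the question.
flat = pd.json_normalize(data, record_path="location", meta=["id", "name"])
flat = flat[["id", "name", "country", "department", "city"]]

# Route 2: start from a DataFrame that already holds a list-of-dicts column.
df = pd.DataFrame(data)
# One row per list element; ignore_index gives a fresh 0..n-1 index.
exploded = df.explode("location", ignore_index=True)
# Expand each dict into columns and put them next to id and name.
from_df = pd.concat(
    [exploded[["id", "name"]], pd.json_normalize(exploded["location"].tolist())],
    axis=1,
)

print(flat)
print(from_df)
```

Route 1 is the shorter one when the raw JSON is still available. Route 2 may be handy when only the DataFrame with the location column is at hand, as in the question.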