使用 psycopg 将 json 中的嵌套字段从 REST API 解析到 PostgreSQL
Parse nested fields in json from REST API into PostgreSQL with psycopg
我正在 python 中的一个项目中工作,我使用请求从 REST API 获得 json,然后继续将其加载到 PostgreSQL 数据库中。 json API 响应的结构如下所示。我的主要问题是关于嵌套字段,如 location.lat
和嵌套在 info
数组中的字段,如 info[0].value
.
{
"items": [
{
"id": 300436,
"item_id_parent": null,
"reference": "",
"subreference1": "CAMS\/1",
"subreference2": "CAMS\/1",
"reference_alpha": null,
"reference_numeric": null,
"oid": "CAMS\/1",
"code": null,
"code_custom": null,
"name": "284",
"image": "https:\/\/static.inventsys.com.br\/278\/thumb\/f-3298886-200x200c.jpg",
"situations": [],
"project_id": 10762,
"project": {
"id": 10762,
"name": "Fauna EGR",
"color": null
},
"category_id": 20685,
"category": {
"id": 20685,
"name": "EGR FAUNA - Armadilhas"
},
"area_id": null,
"area": null,
"location": {
"lat": -30.136699676514,
"lng": -50.910511016846,
"address": {
"region": "RS",
"city": "Viamão",
"district": null,
"zipcode": null,
"street": "Rodovia Tapir Rocha",
"street_number": null,
"desc": null,
"full": "Rodovia Tapir Rocha Viamão \/ BR"
}
},
"event_last": null,
"description": null,
"search_terms": "CAMS\/1 284 Fauna EGR EGR FAUNA - Armadilhas Rodovia Tapir Rocha Viamão \/ BR",
"info": [
{
"id": 42725,
"name": "Observacoes",
"type": "longtext",
"value": null,
"fvalue": null,
"description": null,
"ikey": null,
"group": "Fauna EGR",
"preload": false,
"filling": false,
"primary": false,
"created": null
},
{
"id": 44542,
"name": "Data de instalacao",
"type": "date",
"value": null,
"fvalue": null,
"description": null,
"ikey": null,
"group": "Fauna EGR",
"preload": false,
"filling": false,
"primary": false,
"created": null
},...
到目前为止,我知道我需要命名字段并为我的 table 中的每个变量建立一个占位符,这就是我目前所拥有的:
import psycopg2
conn = psycopg2.connect("host=localhost dbname=postgres user=guilhermeiablonovski") cur = conn.cursor()
dbarmadilhas = """DROP TABLE IF EXISTS egrfauna_armadilhas; CREATE UNLOGGED TABLE IF NOT EXISTS egrfauna_armadilhas( id text PRIMARY KEY, name text, image text, category_id integer, category_name text, latitude real, longitude real, imagem_orig text, observacoes text, instalacao DATE, IDcartao text, IDcamera text, IDbueiro text, estrada text, foto_armadilha text, gps_lat real, gps_long real, gps_alt real, gps_acc real);"""
cur.execute(dbarmadilhas) conn.commit()
armadilha_fields = [
'id',
'name',
'image',
'category_id',
'category.name',
'location.lat',
'location.lng',
'files[0].url_orig',
'files[1].url_low',
'info[0].value',
'info[1].value',
'info[2].value',
'info[3].value',
'info[4].value',
'info[5].value',
'info[6].value',
'info[7].value',
'info[8].value',
'info[9].value',
'info[10].value',
'info[11].value' ]
for item in registros:
my_data = [item[field] for field in armadilha_fields]
# need a placeholder (%s) for each variable
# refer to postgres docs on INSERT statement on how to specify order
cur.execute("INSERT INTO egrfauna_armadilhas VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", tuple(my_data))
conn.commit()
这里的问题是,如果我 运行 只有前四个变量的代码,它工作正常,所以我想当涉及到我要解析的嵌套字段时,我的语法完全错误。我怎样才能最好地引用这些字段?
提前谢谢大家!
为了以后参考,这样解决了:
for item in registros:
armadilhafields = [
item['id'],
item['name'],
item['image'],
item['category_id'],
item['category']['name'],
item['location']['lat'],
item['location']['lng'],
item['files'][0]['url_orig'],
item['info'][0]['value'],
item['info'][1]['value'],
item['info'][2]['value'],
item['info'][3]['value'],
item['info'][4]['value'],
item['info'][5]['value'],
item['info'][6]['value'],
item['info'][7]['value'],
item['info'][8]['value'],
item['info'][9]['value'],
item['info'][10]['value'],
]
my_data = [field for field in armadilhafields]
cur.execute("INSERT INTO egrfauna_armadilhas VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", tuple(my_data))
conn.commit()
我正在 python 中的一个项目中工作,我使用请求从 REST API 获得 json,然后继续将其加载到 PostgreSQL 数据库中。 json API 响应的结构如下所示。我的主要问题是关于嵌套字段,如 location.lat
和嵌套在 info
数组中的字段,如 info[0].value
.
{
"items": [ { "id": 300436, "item_id_parent": null, "reference": "", "subreference1": "CAMS\/1", "subreference2": "CAMS\/1", "reference_alpha": null, "reference_numeric": null, "oid": "CAMS\/1", "code": null, "code_custom": null, "name": "284", "image": "https:\/\/static.inventsys.com.br\/278\/thumb\/f-3298886-200x200c.jpg", "situations": [], "project_id": 10762, "project": { "id": 10762, "name": "Fauna EGR", "color": null }, "category_id": 20685, "category": { "id": 20685, "name": "EGR FAUNA - Armadilhas" }, "area_id": null, "area": null, "location": { "lat": -30.136699676514, "lng": -50.910511016846, "address": { "region": "RS", "city": "Viamão", "district": null, "zipcode": null, "street": "Rodovia Tapir Rocha", "street_number": null, "desc": null, "full": "Rodovia Tapir Rocha Viamão \/ BR" } }, "event_last": null, "description": null, "search_terms": "CAMS\/1 284 Fauna EGR EGR FAUNA - Armadilhas Rodovia Tapir Rocha Viamão \/ BR", "info": [ { "id": 42725, "name": "Observacoes", "type": "longtext", "value": null, "fvalue": null, "description": null, "ikey": null, "group": "Fauna EGR", "preload": false, "filling": false, "primary": false, "created": null }, { "id": 44542, "name": "Data de instalacao", "type": "date", "value": null, "fvalue": null, "description": null, "ikey": null, "group": "Fauna EGR", "preload": false, "filling": false, "primary": false, "created": null },...
到目前为止,我知道我需要命名字段并为我的 table 中的每个变量建立一个占位符,这就是我目前所拥有的:
import psycopg2
conn = psycopg2.connect("host=localhost dbname=postgres user=guilhermeiablonovski") cur = conn.cursor()
dbarmadilhas = """DROP TABLE IF EXISTS egrfauna_armadilhas; CREATE UNLOGGED TABLE IF NOT EXISTS egrfauna_armadilhas( id text PRIMARY KEY, name text, image text, category_id integer, category_name text, latitude real, longitude real, imagem_orig text, observacoes text, instalacao DATE, IDcartao text, IDcamera text, IDbueiro text, estrada text, foto_armadilha text, gps_lat real, gps_long real, gps_alt real, gps_acc real);"""
cur.execute(dbarmadilhas) conn.commit()
armadilha_fields = [
'id',
'name',
'image',
'category_id',
'category.name',
'location.lat',
'location.lng',
'files[0].url_orig',
'files[1].url_low',
'info[0].value',
'info[1].value',
'info[2].value',
'info[3].value',
'info[4].value',
'info[5].value',
'info[6].value',
'info[7].value',
'info[8].value',
'info[9].value',
'info[10].value',
'info[11].value' ]
for item in registros:
my_data = [item[field] for field in armadilha_fields]
# need a placeholder (%s) for each variable
# refer to postgres docs on INSERT statement on how to specify order
cur.execute("INSERT INTO egrfauna_armadilhas VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", tuple(my_data))
conn.commit()
这里的问题是,如果我 运行 只有前四个变量的代码,它工作正常,所以我想当涉及到我要解析的嵌套字段时,我的语法完全错误。我怎样才能最好地引用这些字段?
提前谢谢大家!
为了以后参考,这样解决了:
for item in registros:
armadilhafields = [
item['id'],
item['name'],
item['image'],
item['category_id'],
item['category']['name'],
item['location']['lat'],
item['location']['lng'],
item['files'][0]['url_orig'],
item['info'][0]['value'],
item['info'][1]['value'],
item['info'][2]['value'],
item['info'][3]['value'],
item['info'][4]['value'],
item['info'][5]['value'],
item['info'][6]['value'],
item['info'][7]['value'],
item['info'][8]['value'],
item['info'][9]['value'],
item['info'][10]['value'],
]
my_data = [field for field in armadilhafields]
cur.execute("INSERT INTO egrfauna_armadilhas VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", tuple(my_data))
conn.commit()