Python Requests Post within a nested Json - retrieve data with a specific value
I have looked on Stack Overflow but could not find an answer to my question.
I am accessing an API from the German government, whose output is limited to 10,000 entries per request. I want all the data for a specific city, and since the source database holds more than 10,000 entries, I need to "query while doing" the requests.post.
This is one entry of the JSON result when I simply do a requests.post against the API:
{
  "results": [
    {
      "_id": "CXPTYYFY807",
      "CREATED_AT": "2019-12-17T14:48:17.130Z",
      "UPDATED_AT": "2019-12-17T14:48:17.130Z",
      "result": {
        "id": "CXPTYYFY807",
        "title": "Bundesstadt Bonn, SGB-315114, Ortsteilzentrum Brüser Berg, Fliesenarbeiten",
        "description": ["SGB-315114", "Ortsteilzentrum Brüser Berg, Fliesenarbeiten"],
        "procedure_type": "Ex ante Veröffentlichung (§ 19 Abs. 5)",
        "order_type": "VOB",
        "publication_date": "",
        "cpv_codes": ["45431000-7", "45431100-8"],
        "buyer": {
          "name": "Bundesstadt Bonn, Referat Vergabedienste",
          "address": "Berliner Platz 2",
          "town": "Bonn",
          "postal_code": "53111"
        },
        "seller": {
          "name": "",
          "town": "",
          "country": ""
        },
        "geo": {
          "lon": 7.0944,
          "lat": 50.73657
        },
        "value": "",
        "CREATED_AT": "2019-12-17T14:48:17.130Z",
        "UPDATED_AT": "2019-12-17T14:48:17.130Z"
      }
    }
  ],
  "aggregations": {},
  "pagination": {
    "total": 47389,
    "start": 0,
    "end": 0
  }
}
What I want is all the data where the buyer has "town": "Bonn".
What I have tried already:
import requests
url = 'https://daten.vergabe.nrw.de/rest/evergabe/aggregation_search'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}
data = {"results": [{"result": {"buyer": {"town":"Bonn"}}}]}
# need to set the size limit, otherwise the API returns fewer entries:
params = {'size': 10000}
req = requests.post(url, params=params, headers=headers, json=data)
This returns me the post, but without "filtering" by city.
I also tried req = requests.post(url, params=params, headers=headers, data=data), which returns error 400.
Another approach would be to fetch all the data in a loop, using the pagination parameters at the end of the JSON, but again I could not figure out the JSON path to the pagination, e.g. start: 0, end: 500.
Can anyone help me with this?
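As a fallback, once pages have been fetched, the filtering by town can also be done client-side on the JSON structure shown above. A minimal sketch (the helper name and the sample second entry are mine, for illustration only):

```python
def filter_by_town(response_json, town):
    """Keep only entries whose buyer is located in the given town."""
    return [
        entry for entry in response_json.get("results", [])
        if entry.get("result", {}).get("buyer", {}).get("town") == town
    ]

# Example with the structure from the response above (second entry is made up):
sample = {
    "results": [
        {"_id": "CXPTYYFY807", "result": {"buyer": {"town": "Bonn"}}},
        {"_id": "OTHER00001", "result": {"buyer": {"town": "Köln"}}},
    ]
}
bonn_only = filter_by_town(sample, "Bonn")
print([e["_id"] for e in bonn_only])  # → ['CXPTYYFY807']
```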
Try:
url = 'https://daten.vergabe.nrw.de/rest/evergabe/aggregation_search'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}
query1 = {
"query": {
"match": {
"buyer.town": "Bonn"
}
}
}
req = requests.post(url, headers=headers, json=query1)
# Check the output
req.text
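If the endpoint also honours a size field inside the body alongside the query (an assumption, mirroring the size parameter used in the question), both can go into one payload. A small sketch of building such a body (the helper is mine, not part of the API):

```python
import json

def build_town_query(town, size=10000):
    """Assemble an Elasticsearch-style body that filters on buyer.town."""
    return {
        "query": {"match": {"buyer.town": town}},
        "size": size,
    }

payload = json.dumps(build_town_query("Bonn"))
print(payload)
```

It would then be sent the same way as above, e.g. req = requests.post(url, headers=headers, json=build_town_query("Bonn")).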
Edit:
This will not work if the filter matches more than 10,000 results, but it may be a quick workaround for the problem you are facing.
import json
import requests
import math
url = "https://daten.vergabe.nrw.de/rest/vmp_rheinland"
size = 5000
payload = '{"sort":[{"_id":"asc"}],"query":{"match_all":{}},"size":'+str(size)+'}'
headers = {
'accept': "application/json",
'content-type': "application/json",
'cache-control': "no-cache"
}
response = requests.request("POST", url, data=payload, headers=headers)
tenders_array = []
query_data = json.loads(response.text)
tenders_array.extend(query_data['results'])
total_hits = query_data['pagination']['total']
result_size = len(query_data['results'])
last_id = query_data['results'][-1]["_id"]
number_of_loops = ((total_hits - size) // size )
last_loop_size = ((total_hits - size) % size)
for i in range(number_of_loops + 1):
    if i == number_of_loops:
        size = last_loop_size
    payload = '{"sort":[{"_id":"asc"}],"query":{"match_all":{}},"size":' + str(size) + ',"search_after":["' + last_id + '"]}'
    response = requests.request("POST", url, data=payload, headers=headers)
    query_data = json.loads(response.text)
    result_size = len(query_data['results'])
    if result_size > 0:
        tenders_array.extend(query_data['results'])
        last_id = query_data['results'][-1]["_id"]
    else:
        break
Code in the gist: https://gist.github.com/thiagoalencar/34401e204358499ea3b9aa043a18395f
This is some code to paginate through the elasticsearch API. It is an API on top of the elasticsearch API, and the documentation is not very clear. I tried scrolling, without success. This solution uses the search_after parameter without a point-in-time, since that endpoint is not available. Sometimes the server refuses the request, so you need to check response.status_code == 502.
The code is messy and needs refactoring, but it works. At the end, tenders_array contains all the objects.
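The 502 check mentioned above could be folded into a small retry helper. This is only a sketch under the assumption that a refused request is worth simply repeating after a pause (the helper and its parameters are mine, not part of the original gist):

```python
import time
import requests

def post_with_retry(url, payload, headers, retries=3, backoff=2.0):
    """POST, retrying on 502 (and other 5xx) responses with a growing delay."""
    response = None
    for attempt in range(retries):
        response = requests.request("POST", url, data=payload, headers=headers)
        if response.status_code < 500:
            return response  # success or a client error; do not retry
        time.sleep(backoff * (attempt + 1))  # wait longer after each failure
    return response  # last response, possibly still a 5xx
```

Each requests.request call inside the loop above could then be replaced by post_with_retry(url, payload, headers).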