Interruption of code after retrieving JSON query-result

Question

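I have tried to solve this, but once the error is raised in Python I cannot get the loop to carry on with the next step.

I am querying this site: https://w.wiki/msg. I adjust the query on each loop iteration by changing the city; the cities are in [listElements]. When I hit a city like "Awaradam", the code breaks. (You can basically hardcode it instead of using listElements.)

Putting a sleep timer in did not solve the problem (I thought I was sending requests too often).

The error is as follows: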

Traceback (most recent call last):
  File "C:/Users/xxx/PycharmProjects/pythonProject3/xxx.py", line 30, in <module>
    data = r.json()
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\site-packages\requests\models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The code (I edited it so that it is reproducible; in this form the code is otherwise pointless, it simply breaks after a certain number of loops):

import requests
listPops = [[], []]
url = 'https://query.wikidata.org/sparql'
zaehler = -1
for i in range(100):
    zaehler = zaehler + 1
    #print(str(listElements[1][i]))
    #query = r"SELECT ?population WHERE { SERVICE wikibase:mwapi {bd:serviceParam mwapi:search '" + str(listElements[1][i]) + "' . bd:serviceParam mwapi:language 'en' . bd:serviceParam wikibase:api 'EntitySearch' . bd:serviceParam wikibase:endpoint 'www.wikidata.org' . bd:serviceParam wikibase:limit 1 . ?item wikibase:apiOutputItem mwapi:item .} ?item wdt:P1082 ?population} "
    query = """ SELECT ?population WHERE { SERVICE wikibase:mwapi {
          bd:serviceParam mwapi:search '""" + "Awaradam" + """'.    
          bd:serviceParam mwapi:language "en" . 
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam wikibase:limit 1 .
          ?item wikibase:apiOutputItem mwapi:item .
      }
      ?item wdt:P1082 ?population
    }
    """
    r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
    #time.sleep(5)
    data = r.json()
    try:
        #population = r['results']['bindings'][0]['population']['value']
        if data['results']['bindings'][0]['population']['value']:
            population = data['results']['bindings'][0]['population']['value']
            print(str(zaehler) + ": " + "Population in " + str(listElements[1][i]) + ": " + f"{int(population):,}")
            listPops[0].append(str(listElements[1][i]))
            listPops[1].append(population)
    except:
        continue

print('Finished scrape.')

Answer 1

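The traceback means that the result you got back is not JSON. You can't make the remote server send JSON if it doesn't want to, but you can skip this item (or try a different query, if you can figure out one that works).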

try:
    data = r.json()
except json.decoder.JSONDecodeError as err:
    logging.warning('Not JSON: %s (result %r)', err, r.text)
    continue

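You will have to import logging (or just print the warning) and import json if you haven't done so already.

Your blanket try / except would work too (just move the try above the failing line), but it is really bad practice. See Why is "except: pass" a bad programming practice?. In practice, it masked the fact that Wikidata has no results for Awaradam, and you were running a fruitless loop trying to fetch them over and over.

Here is a quick and dirty fix: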

import requests
import time
import json

listPops = [[], []]
listElements = [[], ['Bangalore', 'Hyderabad', 'Awaradam', 'Rawalpindi']]
url = 'https://query.wikidata.org/sparql'

for i, city in enumerate(listElements[1]):
    query = """ SELECT ?population WHERE { SERVICE wikibase:mwapi {
          bd:serviceParam mwapi:search '""" + city + """'.    
          bd:serviceParam mwapi:language "en" . 
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam wikibase:limit 1 .
          ?item wikibase:apiOutputItem mwapi:item .
      }
      ?item wdt:P1082 ?population
    }
    """
    r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
    time.sleep(5)
    try:
        data = r.json()
    except json.decoder.JSONDecodeError as err:
        print('Not JSON: %s (result %r)' % (err, r.text))
        # Skip this city; otherwise `data` is undefined or left over from the previous iteration.
        continue
    assert 'results' in data
    assert 'bindings' in data['results']
    if not data['results']['bindings']:
        #logging.warning('No results for %s', city)
        print('No results for', city)
        continue
    assert data['results']['bindings'], 'type %s %r' % (type(data['results']['bindings']), data['results']['bindings'])
    assert 'population' in data['results']['bindings'][0]
    assert 'value' in data['results']['bindings'][0]['population']
    if data['results']['bindings'][0]['population']['value']:
        population = data['results']['bindings'][0]['population']['value']
        print(f"{i}: Population in {city}: {int(population):,}")
        listPops[0].append(str(listElements[1][i]))
        listPops[1].append(population)
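
The chain of assert statements above can also be folded into a small helper that returns None whenever a value is missing. This is only a sketch: extract_population is a hypothetical name, and it assumes the response uses the results / bindings / population / value layout seen in the code above. The sample payloads are hand-written, not real query results.

```python
def extract_population(data):
    """Return the first population value as an int, or None if absent.

    Assumes the JSON layout used above:
    data['results']['bindings'][0]['population']['value'].
    """
    bindings = data.get('results', {}).get('bindings', [])
    if not bindings:
        return None  # the query succeeded but matched nothing
    value = bindings[0].get('population', {}).get('value')
    if not value:
        return None
    try:
        return int(value)
    except ValueError:
        return None  # value is not a plain integer string


# Hand-written sample payloads in the assumed shape (not real results).
found = {'results': {'bindings': [{'population': {'value': '12345'}}]}}
empty = {'results': {'bindings': []}}

print(extract_population(found))
print(extract_population(empty))
```

In the loop, a None result would then be the signal to report "No results for" the city and continue.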

Answer 2

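As @tripleee mentioned, the problem is that your query is not returning valid JSON (it is returning an HTML message instead). The server tells you what happened through the status of the response. To handle it, you should check the status of your request: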

r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
if r.status_code != 200:
  handle_your_error(r)

For example, after running your example I got HTTP error 429: Too Many Requests.
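
A 429 usually means the loop should wait and try again rather than skip the city. Below is a minimal sketch of that idea, not a definitive implementation: fetch_json_with_retry and its parameters are hypothetical names, the fetch argument is any zero-argument callable returning a requests-style response, and the code assumes that a Retry-After header, when present, holds a number of seconds (anything else falls back to exponential backoff).

```python
import time


def fetch_json_with_retry(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns HTTP 200, then return the decoded JSON.

    fetch: zero-argument callable returning an object with .status_code,
           .headers and .json() (for example a requests.Response).
    Returns None if every attempt got a 429; raises on any other status.
    """
    for attempt in range(max_attempts):
        r = fetch()
        if r.status_code == 200:
            return r.json()
        if r.status_code != 429:
            raise RuntimeError('Unexpected HTTP status %s' % r.status_code)
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the last attempt
        # Prefer the server's hint; otherwise back off exponentially.
        retry_after = r.headers.get('Retry-After', '')
        try:
            delay = float(retry_after)
        except ValueError:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    return None


# Usage with requests (url and query as in the question):
# data = fetch_json_with_retry(
#     lambda: requests.get(url, params={'format': 'json', 'query': query}, timeout=10))
```

Passing sleep as a parameter is only there so the waiting can be replaced in a test; in normal use the default time.sleep applies.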

Tags: python, json, web-crawler