A Python program that fetches response data from an AJAX website?

Question

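Please note that I am new to programming. These are the problems I ran into while learning web scraping with Python. The site I am using is https://www.mobikwik.com/ (an online recharge and payment site for mobile, DTH, and electricity bills), but all I get when scraping it is a 403 response. I then realized this might be because the site uses AJAX. My objective is a program that takes a mobile number as user input and passes it to the site's mobile operator search. The page then loads the current operator and circle, which I want to display in my program. The Python phonenumber module is of no use when a mobile number has been ported to another operator. Any help is appreciated. Thanks.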

Answer 1

There are two XHR requests, and I am not sure which one you want, so I did both. You only need to recreate the requests.

getconnectiondetails:

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: json_data['data']['operatorId']
Out[7]: 338

In [8]: json_data['data']['circleId']
Out[8]: 15
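Since the question asks for a program rather than a shell session, the same request can be sketched as a standalone script. This is a minimal sketch using only the standard library (`urllib`) instead of Scrapy. The helper names `build_request`, `extract_ids`, and `lookup` are hypothetical, and the reduced header set is an assumption: the session above sent the full browser header list, and it has not been verified which of those headers the server actually requires.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the shell session above.
BASE_URL = "https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails"

# Assumption: a subset of the headers used in the session is enough.
# If the server still answers 403, add back the remaining headers.
HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Origin": "https://www.mobikwik.com",
    "Referer": "https://www.mobikwik.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    ),
    "X-MClient": "0",
}


def build_request(phone_number):
    """Build the GET request for one phone number (hypothetical helper)."""
    query = urllib.parse.urlencode({"cn": phone_number})
    return urllib.request.Request(f"{BASE_URL}?{query}", headers=HEADERS)


def extract_ids(payload):
    """Pull the two fields read in the session out of the decoded JSON."""
    data = payload["data"]
    return data["operatorId"], data["circleId"]


def lookup(phone_number, timeout=10):
    """Send the request and return (operatorId, circleId)."""
    request = build_request(phone_number)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_ids(payload)
```

To use it with user input, something like `operator_id, circle_id = lookup(input("Mobile number: "))` would do. Note that the response in the session exposes numeric IDs; mapping them to operator and circle names is not shown there.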

recommendedplans:

scrapy shell

In [1]: phone_number = '9820123456'

In [2]: url = 'https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn='

In [3]: headers = {
   ...: "Accept": "application/json, text/plain, */*",
   ...: "Accept-Encoding": "gzip, deflate, br",
   ...: "Accept-Language": "en-US,en;q=0.5",
   ...: "Cache-Control": "no-cache",
   ...: "Connection": "keep-alive",
   ...: "DNT": "1",
   ...: "Host": "rapi.mobikwik.com",
   ...: "Origin": "https://www.mobikwik.com",
   ...: "Pragma": "no-cache",
   ...: "Referer": "https://www.mobikwik.com/",
   ...: "Sec-Fetch-Dest": "empty",
   ...: "Sec-Fetch-Mode": "cors",
   ...: "Sec-Fetch-Site": "same-site",
   ...: "Sec-GPC": "1",
   ...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
   ...: "X-MClient": "0"
   ...: }

In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)

In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn=9820123456> (referer: https://www.mobikwik.com/)

In [6]: json_data = response.json()

In [7]: for item in json_data['data']['plans']:
   ...:     print(item['id'])
   ...:
1104293
1155779
1155937
1164885
1156067
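The `338/15` in the plans URL matches the `operatorId` and `circleId` returned by the first request, so the two calls can probably be chained. The sketch below only builds the second URL and reads the plan IDs from a decoded response. The function names `plans_url` and `plan_ids` are hypothetical, and the assumption that the path segments are always `operatorId/circleId` is inferred from this one example, not confirmed by any documentation.

```python
import urllib.parse

# Endpoint prefix taken from the shell session above.
PLANS_BASE = "https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans"


def plans_url(operator_id, circle_id, phone_number):
    """Build the recommendedplans URL (hypothetical helper).

    Assumption: the two path segments are the operatorId and circleId
    returned by getconnectiondetails for the same number.
    """
    query = urllib.parse.urlencode({"cn": phone_number})
    return f"{PLANS_BASE}/{operator_id}/{circle_id}?{query}"


def plan_ids(payload):
    """Collect the 'id' of every plan in the decoded JSON response."""
    return [item["id"] for item in payload["data"]["plans"]]
```

The resulting URL can be passed to `scrapy.Request` with the same headers as in the session, and the decoded JSON handed to `plan_ids`.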
