A Python program that fetches response data from an AJAX website?
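Please note that I am new to programming. These are problems I ran into while learning web scraping with Python.
The website I am using is https://www.mobikwik.com/ (an online recharge and payment site for mobile, DTH, and electricity bills).
All I get when scraping it is a 403 response. I then realized this is probably because the site uses AJAX. My objective is a program that takes a mobile number as user input, passes it to the site's mobile operator lookup, and displays the current operator and circle that the page loads for that number. The Python phonenumber module is no use when a number has been ported to another operator. Any help is appreciated. Thanks.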
There are two XHR requests, and I wasn't sure which one you wanted, so I did both. You just need to recreate the request.
getconnectiondetails:
scrapy shell
In [1]: phone_number = '9820123456'
In [2]: url = 'https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn='
In [3]: headers = {
...: "Accept": "application/json, text/plain, */*",
...: "Accept-Encoding": "gzip, deflate, br",
...: "Accept-Language": "en-US,en;q=0.5",
...: "Cache-Control": "no-cache",
...: "Connection": "keep-alive",
...: "DNT": "1",
...: "Host": "rapi.mobikwik.com",
...: "Origin": "https://www.mobikwik.com",
...: "Pragma": "no-cache",
...: "Referer": "https://www.mobikwik.com/",
...: "Sec-Fetch-Dest": "empty",
...: "Sec-Fetch-Mode": "cors",
...: "Sec-Fetch-Site": "same-site",
...: "Sec-GPC": "1",
...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
...: "X-MClient": "0"
...: }
In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)
In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails?cn=9820123456> (referer: https://www.mobikwik.com/)
In [6]: json_data = response.json()
In [7]: json_data['data']['operatorId']
Out[7]: 338
In [8]: json_data['data']['circleId']
Out[8]: 15
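Outside the Scrapy shell, the same lookup can be sketched as a small standalone script using only the standard library. This is a rough sketch, not a definitive client: the function names are hypothetical, the reduced header set is an assumption (the server may require more of the headers shown above, or may change its checks), and the response layout is taken only from the session above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the session above; it may change without notice.
CONNECTION_URL = "https://rapi.mobikwik.com/recharge/infobip/getconnectiondetails"

# Assumption: this subset of the browser headers is enough to avoid the 403.
HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Origin": "https://www.mobikwik.com",
    "Referer": "https://www.mobikwik.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    ),
    "X-MClient": "0",
}


def build_connection_url(phone_number):
    """Return the lookup URL with the number passed as the `cn` parameter."""
    return CONNECTION_URL + "?" + urllib.parse.urlencode({"cn": phone_number})


def parse_connection_details(payload):
    """Extract (operatorId, circleId) from the decoded JSON response."""
    data = payload["data"]
    return data["operatorId"], data["circleId"]


def fetch_connection_details(phone_number, timeout=10):
    """Send the request with browser-like headers and parse the JSON body."""
    request = urllib.request.Request(
        build_connection_url(phone_number), headers=HEADERS
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return parse_connection_details(payload)
```

To use it with user input, something like `operator_id, circle_id = fetch_connection_details(input("Mobile number: "))` should work, provided the endpoint still answers the way it did in the session above.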
recommendedplans:
scrapy shell
In [1]: phone_number = '9820123456'
In [2]: url = 'https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn='
In [3]: headers = {
...: "Accept": "application/json, text/plain, */*",
...: "Accept-Encoding": "gzip, deflate, br",
...: "Accept-Language": "en-US,en;q=0.5",
...: "Cache-Control": "no-cache",
...: "Connection": "keep-alive",
...: "DNT": "1",
...: "Host": "rapi.mobikwik.com",
...: "Origin": "https://www.mobikwik.com",
...: "Pragma": "no-cache",
...: "Referer": "https://www.mobikwik.com/",
...: "Sec-Fetch-Dest": "empty",
...: "Sec-Fetch-Mode": "cors",
...: "Sec-Fetch-Site": "same-site",
...: "Sec-GPC": "1",
...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
...: "X-MClient": "0"
...: }
In [4]: req = scrapy.Request(url=url+phone_number, headers=headers)
In [5]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans/338/15?cn=9820123456> (referer: https://www.mobikwik.com/)
In [6]: json_data = response.json()
In [7]: for item in json_data['data']['plans']:
...: print(item['id'])
...:
1104293
1155779
1155937
1164885
1156067
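The `338` and `15` in the recommendedplans URL match the `operatorId` and `circleId` returned by getconnectiondetails, so the plans URL can presumably be built from those two values rather than hard-coded. The sketch below assumes that path layout holds for other operators and circles, which the sessions above do not confirm; the helper names are hypothetical.

```python
import urllib.parse

# Base path taken from the session above; the /<operator>/<circle> layout
# is inferred from a single example and may not hold in general.
PLANS_URL = "https://rapi.mobikwik.com/recharge/v1/rechargePlansAPI/recommendedplans"


def build_plans_url(operator_id, circle_id, phone_number):
    """Build the recommendedplans URL for one operator, circle, and number."""
    query = urllib.parse.urlencode({"cn": phone_number})
    return f"{PLANS_URL}/{operator_id}/{circle_id}?{query}"


def extract_plan_ids(payload):
    """Collect the `id` of every plan in the decoded JSON response."""
    return [plan["id"] for plan in payload["data"]["plans"]]
```

The resulting URL can be fetched with the same headers as the first request, and the decoded JSON passed to `extract_plan_ids`.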