Can't fetch tabular content from a webpage using requests
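I'm trying to scrape the tabular content from the landing page of this website. Its first page has 100 rows. While watching the network activity in dev tools, I noticed that some GET requests are issued to this URL, https://io6.dexscreener.io/u/ws3/screener3/, with appropriate parameters, and that they eventually produce JSON content.
However, when I try to mimic that request with the following attempt: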
import requests

url = 'https://io6.dexscreener.io/u/ws3/screener3/'
params = {
    'EIO': '4',
    'transport': 'polling',
    't': 'NwYSrFK',
    'sid': 'ztAOHWOb-1ulTq-0AQwi',
}
headers = {
    'accept': '*/*',
    'referer': 'https://dexscreener.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(url, params=params)
    print(res.content)
I get a response like this:
`{"code":3,"message":"Bad request"}`
How can I get a response containing the tabular content from that webpage?
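Here is a quick and dirty piece of Python code that performs the initial handshake, sets up the websocket connection, and then downloads data in JSON format indefinitely. I have not tested this code extensively, and I am not sure which of the handshake steps are actually necessary, but I mimicked the browser's behaviour and it seems to work fine: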
import json

import requests
from websocket import create_connection  # provided by the websocket-client package

s = requests.Session()
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://dexscreener.com/ethereum'
resp = s.get(url, headers=headers)
print(resp)

# Handshake over HTTP long polling; the response to the second request carries the session id (sid)
step1 = s.get('https://io3.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Os')
step2 = s.get('https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-S5')
obj = json.loads(step2.text[1:])  # skip the leading packet-type character before the JSON
code = obj['sid']

# Join the namespace that serves the screener data, then poll once for the first batch
payload = '40/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'
step3 = s.post(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xt&sid={code}', data=payload)
step4 = s.get(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xu&sid={code}')

# Strip the packet separator (assumed to be '\x1e', as in Engine.IO v4) and the namespace prefixes
d = step4.text.replace('\x1e', '').replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,', '').replace(payload, '')
start = '["screener",'
end = ']["latestBlock",'
dirty = d[d.find(start)+len(start):d.rfind(end)].strip()
clean = json.loads(dirty)
print(clean)

# Extra headers for the websocket connection. websocket-client builds the handshake
# headers (Host, Upgrade, Connection, Sec-WebSocket-Key, Sec-WebSocket-Version) itself,
# so they are not repeated here.
headers = {
    'Accept-Language':'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control':'no-cache',
    'Pragma':'no-cache',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
# Then create a connection to the tunnel (the keyword is `header`, and it takes a dict or list)
ws = create_connection(f"wss://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=websocket&sid={code}", header=headers, origin='https://dexscreener.com')
# Then send the initial messages through the tunnel: upgrade probe, then upgrade confirmation
ws.send('2probe')
ws.send('5')
# Here you will view the messages returned from the tunnel
while True:
    message = ws.recv()
    if message == '2':
        # Engine.IO v4 ping from the server; answer with a pong to keep the connection open
        ws.send('3')
        continue
    try:
        json_data = json.loads(message.replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,', ''))
        print(json_data)
    except (TypeError, ValueError):
        # Skip frames that are not JSON text, such as the '3probe' reply
        pass
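The chain of replace() calls on step4.text above can also be written as a small parser. This is only a sketch: it assumes the polling body separates packets with the record separator character (\x1e), as Engine.IO v4 does, and that event packets start with 42 followed by the namespace and a comma. The function name parse_polling_body and the sample body are made up for illustration, and the real payloads will look different.

```python
import json

RECORD_SEPARATOR = "\x1e"  # assumed packet separator in a polling body (Engine.IO v4)


def parse_polling_body(body, namespace):
    """Return the decoded JSON of every event packet found in a polling body."""
    prefix = f"42{namespace},"  # "4" = message packet, "2" = event
    events = []
    for packet in body.split(RECORD_SEPARATOR):
        if packet.startswith(prefix):
            events.append(json.loads(packet[len(prefix):]))
    return events


# Hypothetical sample body; the field names inside the event data are invented.
namespace = "/u/ws/screener/consolidated/platform/ethereum/h1/top/1"
sample = RECORD_SEPARATOR.join([
    f"40{namespace},",
    f'42{namespace},["screener",{{"pairs":[{{"symbol":"AAA"}}]}}]',
    f'42{namespace},["latestBlock",123]',
])

events = parse_polling_body(sample, namespace)
for name, data in events:
    print(name, data)
```

Splitting on the separator first means each packet is decoded on its own, so there is no need to search for the `["screener",` and `]["latestBlock",` markers by hand.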