Can't fetch tabular content from a webpage using requests

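I want to scrape the tabular content from the landing page of this website. Its first page has 100 rows. When I observed the network activity in dev tools, I noticed that some GET requests are being issued to this URL, https://io6.dexscreener.io/u/ws3/screener3/, with appropriate parameters, which ultimately produce JSON content.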

However, when I try to mimic that request with the following attempt:

import requests

url = 'https://io6.dexscreener.io/u/ws3/screener3/'
params = {
    'EIO': '4',
    'transport': 'polling',
    't': 'NwYSrFK',
    'sid': 'ztAOHWOb-1ulTq-0AQwi',
}

headers = {
    'accept': '*/*',
    'referer': 'https://dexscreener.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(url,params=params)
    print(res.content)

I get a response like this:

`{"code":3,"message":"Bad request"}`

How can I get a response containing the tabular content from that webpage?

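Here's a very quick and dirty piece of Python code that performs the initial handshake, sets up the websocket connection, and downloads the JSON-formatted data indefinitely. I haven't tested this code extensively, and I'm not sure exactly which handshake steps are necessary, but I mimicked the browser's behavior and it seems to work fine: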

import requests
from websocket import create_connection
import json

s = requests.Session()

headers =   {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://dexscreener.com/ethereum'

resp = s.get(url,headers=headers)
print(resp)

step1 = s.get('https://io3.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Os')
step2 = s.get('https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-S5')

obj = json.loads(step2.text[1:])
code = obj['sid']

payload = '40/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'

step3 = s.post(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xt&sid={code}',data=payload)
step4 = s.get(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xu&sid={code}')
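# Strip the record separators ('\x1e') between packets and the channel prefixes, leaving the event bodies back to back
d = step4.text.replace('\x1e','').replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,','').replace(payload,'')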

start = '["screener",'
end = ']["latestBlock",'

dirty = d[d.find(start)+len(start):d.rfind(end)].strip()
clean = json.loads(dirty)
print(clean)

# Extra headers for the websocket handshake. websocket-client takes them through the
# `header` option (a dict or list) and builds Host, Upgrade, Connection and the
# Sec-WebSocket-* headers itself, so only the remaining browser headers are set here.
headers = {
    'Accept-Language':'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control':'no-cache',
    'Pragma':'no-cache',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}

# Then create a connection to the tunnel, sending the same Origin the browser sends
ws = create_connection(f"wss://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=websocket&sid={code}",header=headers,origin='https://dexscreener.com')

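# Complete the Engine.IO upgrade: probe the websocket with a ping, then confirm the upgrade
ws.send('2probe')
ws.send('5')

# Print every message received from the tunnel
while True:
    message = ws.recv()
    if message == '2':
        # Engine.IO v4 heartbeat: answer the server's ping with a pong to keep the connection open
        ws.send('3')
        continue
    try:
        json_data = json.loads(message.replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,',''))
        print(json_data)
    except json.JSONDecodeError:
        # Skip frames that are not JSON events (for example the '3probe' reply)
        pass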
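
The string replacements in the script above depend on the exact channel path. As a rough alternative, the sketch below splits a polling response into packets on the `\x1e` record separator that Engine.IO v4 uses between packets, then decodes each Socket.IO event into its namespace, name, and arguments. The helper names `split_packets`, `parse_open`, and `parse_event` and the sample payload are made up for illustration and are not taken from dexscreener. Packets carrying ack ids or binary attachments are not handled.

```python
import json

RECORD_SEPARATOR = "\x1e"  # delimits packets in an Engine.IO v4 polling response


def split_packets(payload):
    """Split a polling response body into individual Engine.IO packets."""
    return [packet for packet in payload.split(RECORD_SEPARATOR) if packet]


def parse_open(packet):
    """Return the handshake data (including 'sid') from an open packet, else None."""
    # "0" = Engine.IO open packet, followed by a JSON object
    if not packet.startswith("0"):
        return None
    return json.loads(packet[1:])


def parse_event(packet):
    """Return (namespace, event_name, args) for a Socket.IO event packet, else None."""
    # "4" = Engine.IO message, "2" = Socket.IO EVENT
    if not packet.startswith("42"):
        return None
    body = packet[2:]
    namespace = "/"
    if body.startswith("/"):
        # A non-default namespace precedes the JSON body and ends at the first comma
        namespace, _, body = body.partition(",")
    name, *args = json.loads(body)
    return namespace, name, args


# Hypothetical sample data, shaped like the responses handled in the script above
channel = "/u/ws/screener/consolidated/platform/ethereum/h1/top/1"
sample = RECORD_SEPARATOR.join([
    f'42{channel},["screener",{{"pairs":[{{"baseToken":"AAA"}}]}}]',
    f'42{channel},["latestBlock",{{"blockNumber":1}}]',
])

handshake = parse_open('0{"sid":"abc123","upgrades":["websocket"]}')
events = [parse_event(packet) for packet in split_packets(sample)]

print(handshake["sid"])
for namespace, name, args in events:
    print(name, args)
```

With helpers like these, the `screener` rows could be picked out by event name instead of by searching for `'["screener",'` and `']["latestBlock",'` in the raw text, and the same `parse_event` call would work on messages read from the websocket.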