Scraping a table appearing on click with python

Question

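I would like to scrape information from this page.

Specifically, I want to scrape the table that appears when you click "View all" under "TOP 10 HOLDINGS" (you have to scroll down the page a bit).

I am new to web scraping and tried to do this with BeautifulSoup. However, there seems to be a problem because I need to account for the "onclick" function. In other words, the HTML I scrape directly from the page does not include the table I want.

I am a bit confused about my next step: should I use something like selenium, or can I handle this in an easier/more efficient way?

Thanks.

My current code: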

from bs4 import BeautifulSoup
import requests

# Fetch the static page; the "View all" table is not part of this HTML
my_url = 'http://www.etf.com/SHE'
page = requests.get(my_url)
htmltxt = page.text

soup = BeautifulSoup(htmltxt, "html.parser")
print(soup)

Answer 1

You can get a JSON response from the API: http://www.etf.com/view_all/holdings/SHE. The table you are looking for is under the 'view_all' key.

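import requests
from bs4 import BeautifulSoup as Soup

url = 'http://www.etf.com/SHE'
api = "http://www.etf.com/view_all/holdings/SHE"
# Send the request the way the page's own AJAX call would
headers = {'X-Requested-With': 'XMLHttpRequest', 'Referer': url}
page = requests.get(api, headers=headers)
# The JSON response carries the table as an HTML fragment under 'view_all'
htmltxt = page.json()['view_all']
soup = Soup(htmltxt, "html.parser")
# One list of cell texts per table row
data = [[td.text for td in tr.find_all('td')] for tr in soup.find_all('tr')]

# Print each row with its cells joined by ': '
print('\n'.join(': '.join(row) for row in data))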
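
If you want to check the parsing step without hitting the site, here is a minimal sketch of the same idea wrapped in a function. The name rows_from_payload and the sample payload are made up for illustration; the only thing taken from the real response is the assumption that the HTML fragment sits under 'view_all'. It also strips whitespace from the cells and skips rows that have no td cells (for example a header row built from th), which may or may not match the actual markup.

```python
from bs4 import BeautifulSoup


def rows_from_payload(payload):
    """Return the table rows (lists of cell text) from a decoded JSON payload.

    Hypothetical helper: assumes the HTML fragment is stored under 'view_all'.
    """
    fragment = payload.get("view_all")
    if not fragment:
        raise KeyError("'view_all' is missing or empty in the payload")

    soup = BeautifulSoup(fragment, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text(strip=True) trims surrounding whitespace in each cell
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows without <td> cells, such as <th>-only headers
            rows.append(cells)
    return rows


# Made-up payload for an offline check; not real data from the site
sample = {
    "view_all": (
        "<table>"
        "<tr><th>Name</th><th>Weight</th></tr>"
        "<tr><td> Foo Corp </td><td>1.23%</td></tr>"
        "<tr><td>Bar Inc</td><td>0.98%</td></tr>"
        "</table>"
    )
}
sample_rows = rows_from_payload(sample)
print(sample_rows)
```

With the code above you would call it as rows_from_payload(page.json()). Raising KeyError when the key is absent is just one choice; it makes a changed response format fail loudly instead of silently returning an empty table.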
