Python

Question

I can scrape a static website to CSV with the following code:

import pandas as pd
url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'
for i, df in enumerate(pd.read_html(url)):
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')

However, I found that it does not work for dynamic websites. How can I do the same for a dynamic site?

P.S.: I am using Python 3.6

Answer 1

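You can use selenium's webdriver, which loads a website the same way a regular web browser does, so content rendered by JavaScript ends up in the page source. For your example, the simplest way to apply selenium with minimal changes to your code is as follows: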

import pandas as pd
from selenium import webdriver

url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'

# The following lines are so the browser is headless, i.e. it doesn't open a window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')  # Chrome expects width,height

wd = webdriver.Chrome(options=options)  # Open a browser using the options set (chrome_options= is the deprecated spelling)

wd.get(url)  # Open the desired url in the browser
for i, df in enumerate(pd.read_html(wd.page_source)):  # Use wd.page_source to feed pd.read_html
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')

wd.quit()  # Close the browser and end the WebDriver session
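
One caveat with the code above: on a dynamic page the tables may not be rendered yet at the moment wd.page_source is read, in which case pd.read_html finds nothing to parse. A minimal sketch of one way around this is to poll the page source until a table appears. The helper name wait_for_tables and its timeout and poll parameters are hypothetical, not part of selenium or pandas; Selenium's built-in WebDriverWait can be used for the same purpose.

```python
import time


def wait_for_tables(driver, timeout=10.0, poll=0.5):
    """Poll driver.page_source until it contains a <table> tag, then return the HTML.

    `driver` is any object exposing a `page_source` attribute, such as a
    Selenium WebDriver. This helper is a hypothetical illustration, not a
    selenium API. Raises TimeoutError if no table shows up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Re-read on every pass: the DOM may have changed since the last poll.
        html = driver.page_source
        if '<table' in html.lower():
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError('no <table> found within %.1f seconds' % timeout)
        time.sleep(poll)
```

To use it, pass the result of wait_for_tables(wd) to pd.read_html in place of wd.page_source. Checking for the substring '<table' is a deliberately crude test; if the page always contains an empty placeholder table, a more specific condition would be needed.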

Python - scrape dynamic website to csv in neat table(s) format

export-to-csv