Scraping all entries of a lazy-loading page using Python

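Take a look at the ECB press releases page. The releases go back to 1997, so it would be nice to automatically collect all the links going back in time.

I found the tag that contains the links ('//*[@id="lazyload-container"]'), but it only returns the most recent ones.

How do I get the rest?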

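from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# `url` is the address of the press releases page (not shown in the post)
driver = webdriver.Firefox(service=Service('/usr/local/bin/geckodriver'))
driver.get(url)
# Grab the container that holds the lazily loaded links
element = driver.find_element(By.XPATH, '//*[@id="lazyload-container"]')
element = element.get_attribute('innerHTML')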

The data is loaded via JavaScript from a different URL, one per year. You can use this example to load the releases for each year:

import requests
from bs4 import BeautifulSoup

url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

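for year in range(1997, 2023):
    # Each year has its own include page that holds that year's entries
    soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
    # Reverse the matches so the entries print oldest first
    for a in soup.select(".title a")[::-1]:
        print(a.find_previous(class_="date").text, a.text)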

Prints:

25 April 1997 "EUR" - the new currency code for the euro
1 July 1997 Change of presidency of the European Monetary Institute
2 July 1997 The security features of the euro banknotes
2 July 1997 The EMI's mandate with respect to banknotes

...

17 February 2022 Financial statements of the ECB for 2021
21 February 2022 Survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD) - December 2021
21 February 2022 Results of the December 2021 survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD)
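
The loop above hard-codes 2023 as the end of the range, so it stops picking up new years unless it is edited. A minimal sketch of one way around that, assuming the per-year URL pattern stays the same; `URL_TEMPLATE` and `year_urls` are hypothetical names introduced here for illustration:

```python
from datetime import date

# URL pattern taken from the answer above; {} is replaced by the year.
URL_TEMPLATE = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"


def year_urls(first_year=1997, last_year=None):
    """Return (year, url) pairs from first_year through last_year, inclusive.

    If last_year is omitted, the current calendar year is used.
    """
    if last_year is None:
        last_year = date.today().year
    return [
        (year, URL_TEMPLATE.format(year))
        for year in range(first_year, last_year + 1)
    ]
```

The scraping loop could then iterate with `for year, page_url in year_urls():` and pass `page_url` to `requests.get`.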

EDIT: To print the links as well:


import requests
from bs4 import BeautifulSoup

url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

for year in range(1997, 2023):
    soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
    for a in soup.select(".title a")[::-1]:
        print(
            a.find_previous(class_="date").text,
            a.text,
            "https://www.ecb.europa.eu" + a["href"],
        )

Prints:

...

15 December 1999 Monetary policy decisions https://www.ecb.europa.eu/press/pr/date/1999/html/pr991215.en.html
20 December 1999 Visit by the Finnish Prime Minister https://www.ecb.europa.eu/press/pr/date/1999/html/pr991220.en.html

...
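
If the goal is to keep the links rather than print them, the rows can be collected and written to a CSV file. The sketch below is only an illustration using the standard library; `BASE_URL`, `to_row`, and `write_csv` are hypothetical names, and `urljoin` stands in for the string concatenation used above so that an `href` that is already absolute is left unchanged:

```python
import csv
from urllib.parse import urljoin

# Site root, as used in the answer above to build absolute links.
BASE_URL = "https://www.ecb.europa.eu"


def to_row(date_text, title, href):
    """Build one (date, title, absolute_url) tuple from scraped strings."""
    return (date_text.strip(), title.strip(), urljoin(BASE_URL, href))


def write_csv(rows, fileobj):
    """Write a header line followed by the rows to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["date", "title", "url"])
    writer.writerows(rows)
```

Inside the inner loop, `rows.append(to_row(a.find_previous(class_="date").text, a.text, a["href"]))` would collect each entry, and `write_csv(rows, f)` can be called once at the end with a file opened as `open("releases.csv", "w", newline="", encoding="utf-8")` (the file name is just an example).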