Clicking the 'Load More' button with Selenium does not work properly in Python 3.7

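The page I am scraping is dynamic and has a 'Load More' button, so I used Selenium for it. The first problem is that it only works once, meaning it clicks the 'Load More' button only the first time. The second problem is that it only scrapes the articles that appear before the first 'Load More' click; it does not scrape anything after that. The third problem is that it scrapes every article twice. The fourth problem is that I only want the date, but I get the date together with the author and the location.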

import time
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
base = "https://indianexpress.com"
browser = webdriver.Safari(executable_path='/usr/bin/safaridriver')
wait = WebDriverWait(browser, 10)
browser.get('https://indianexpress.com/?s=cybersecurity')

while True:
    try:
        time.sleep(6)
        show_more = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'Load More')))
        show_more.click()
    except Exception as e:
        print(e)
        break

soup = BeautifulSoup(browser.page_source,'lxml')
search_results = soup.find('div', {'id':'ie-infinite-scroll'})

links = search_results.find_all('a')
for link in links:
    link_url = link['href']
    response = requests.get(link_url)
    sauce = BeautifulSoup(response.text, 'html.parser')
    dateTag = sauce.find('div', {'class':'m-story-meta__credit'})
    titleTag = sauce.find('h1', {'class':'m-story-header__title'})
    contentTag = ' '.join([item.get_text(strip=True) for item in sauce.select("[class^='o-story-content__main a-wysiwyg'] p")])

    date = None
    title = None
    content = None

    if isinstance(dateTag, Tag):
        date = dateTag.get_text().strip()
    if isinstance(titleTag, Tag):
        title = titleTag.get_text().strip()

    print(f'{date}\n {title}\n {contentTag}\n')
    time.sleep(3)

This code runs without errors, but it needs improvement. How can I solve the problems above?

Thanks.

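Because you are not waiting for the new content. You are trying to click the 'Load More' button while the new content is still loading.

Error message: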

Message: Element <a class="m-featured-link m-featured-link--centered ie-load-more" href="#"> is not clickable at point (467,417) because another element <div class="o-listing__load-more m-loading"> obscures it

My solution:

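load_more = (By.XPATH, "//a[contains(@class, 'ie-load-more')]")
while True:
    try:
        # Wait until the button is clickable, then click the element the wait returns.
        wait.until(EC.element_to_be_clickable(load_more)).click()
        # Wait until the wrapper's class is exactly 'o-listing__load-more' again,
        # i.e. the 'm-loading' overlay from the error message is gone.
        wait.until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-listing__load-more']")))
    except Exception as e:
        # Typically a timeout: no clickable button is left, so stop.
        print(e)
        break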
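
The loop above only addresses the clicking. For the third and fourth problems, here is a rough sketch rather than a confirmed fix. It assumes the duplicates come from several <a> tags in the results pointing at the same URL, and that the date inside the credit text looks like "October 5, 2019"; I have not verified either against the live page. The names unique_links, extract_date and DATE_RE are hypothetical helpers, not part of Selenium or BeautifulSoup.

```python
import re

# Assumption: the site writes dates like "October 5, 2019".
# Adjust the pattern if the real credit text uses another format.
DATE_RE = re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}")


def unique_links(hrefs):
    """Return each URL once, keeping first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for href in hrefs:
        # Skip missing hrefs and placeholder anchors such as the Load More link.
        if not href or href == "#":
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


def extract_date(credit_text):
    """Return only the date part of the credit text, or None if no date is found."""
    match = DATE_RE.search(credit_text)
    return match.group(0) if match else None
```

In the question's code this would be used roughly as links = unique_links(a.get('href') for a in search_results.find_all('a')) before the requests loop, and date = extract_date(dateTag.get_text(' ', strip=True)) in place of the plain get_text() call.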