Unable to locate text from web page using Selenium

Question

I am trying to scrape Amazon reviews for a product, but I am unable to locate the rating text using Selenium. The same text is easily scraped using soup.

Link to the page: https://www.amazon.in/BenQ-inch-Bezel-Monitor-Built/product-reviews/B073NTCT4R/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=39

Here is my code using Soup:

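 import requests
 from bs4 import BeautifulSoup as soup

 link='same link as mentioned above'
 url=requests.get(link).content
 bs=soup(url,'html.parser')
 # each rating lives in a <span class="a-icon-alt">; keep only the leading number
 for i in bs.find_all('span',{'class':'a-icon-alt'}):
     print(i.text.split(' ')[0])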

## Output: 4.3 5.0 1.0 5.0 2.0 4.0 1.0 5.0 5.0 5.0 5.0 5.0 5.0

Here is my code using Selenium:

import time
from selenium import webdriver
from bs4 import BeautifulSoup as soup
import requests

link='link to the above mentioned page'
driver=webdriver.Chrome()
driver.get(link)
for i in driver.find_elements_by_css_selector('.a-icon-alt'):
     print(i.text)

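I can't get the same result with Selenium. All I get is blank lines, one for each matching element on the page. I have also tried XPath and class_name, but did not get the desired result.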

Answer 1

To get the review ratings, use WebDriverWait to wait for presence_of_all_elements_located(), and read get_attribute("innerHTML") instead of .text.

Code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

link='https://www.amazon.in/BenQ-inch-Bezel-Monitor-Built/product-reviews/B073NTCT4R/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=39'
driver=webdriver.Chrome()
driver.get(link)
elements=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".a-icon-alt")))
for i in elements:
    print(i.get_attribute("innerHTML").split(' ')[0])

Console output:

4.3
5.0
1.0
5.0
2.0
4.0
1.0
5.0
5.0
5.0
5.0
5.0
5.0
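
A likely reason for the blanks, offered here as an assumption rather than something confirmed above: Selenium's .text returns only text that is rendered as visible, and the a-icon-alt spans appear to be hidden with CSS (they look like screen-reader labels for the star icons). Reading a DOM property such as innerHTML or textContent does not depend on visibility. The sketch below wraps that idea in a small helper that also parses the leading number with a regular expression instead of split(' ')[0]. The names rating_from_element and FakeElement are hypothetical; FakeElement only stands in for a WebElement so the example runs without a browser.

```python
import re

# Matches a leading number such as "4.3" in "4.3 out of 5 stars".
_RATING_RE = re.compile(r"\s*(\d+(?:\.\d+)?)")


def rating_from_element(element):
    """Return the rating as a float, or None if no leading number is found.

    `element` is expected to behave like a Selenium WebElement. `.text` only
    returns rendered text, so fall back to the `textContent` DOM property,
    which is populated even when the node is visually hidden.
    """
    raw = element.text or element.get_attribute("textContent") or ""
    match = _RATING_RE.match(raw)
    return float(match.group(1)) if match else None


class FakeElement:
    """Hypothetical stand-in for a WebElement (not part of Selenium)."""

    def __init__(self, visible_text, text_content):
        self.text = visible_text
        self._text_content = text_content

    def get_attribute(self, name):
        # Only the one property used by rating_from_element is emulated.
        return self._text_content if name == "textContent" else None


if __name__ == "__main__":
    # Hidden span: .text is empty, textContent still holds the label.
    hidden = FakeElement("", "4.3 out of 5 stars")
    print(rating_from_element(hidden))  # 4.3
```

With a real driver, the same helper could be applied to each element returned by the WebDriverWait call in the answer, for example [rating_from_element(e) for e in elements].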

Tags: css, python, selenium, webdriverwait