Web scraping with RSelenium: findElement returning nothing

Question

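I'm new to web scraping and have been trying to use RSelenium (as an alternative to rvest) to collect information, because some of the sites I'm interested in use JavaScript. However, when I run the code below, findElement() returns nothing.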

library(RSelenium)

driver <- rsDriver(browser=c("chrome"), chromever="81.0.4044.138")

remote_driver <- driver$client

remote_driver$navigate("https://www.gucci.com/uk/en_gb/ca/decor-c-decor")

p <- remote_driver$findElement(using = "xpath", "//span[@class = 'sale']")
product <- p$getElementText()
product

The XPath looks correct to me. Any ideas?

Answer 1

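I'm not sure this is the best approach, but you can use RSelenium to get the page source (including the JavaScript-rendered elements) and then use rvest to extract those elements.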

library(dplyr)
library(rvest)

elemrvest <- remote_driver$getPageSource()[[1]]

df <- tibble(Products = elemrvest %>% 
               read_html() %>% 
               html_nodes(xpath = "//div[@class = 'product-tiles-grid-item-info']/h2") %>% 
               html_text(),
             Prices = elemrvest %>% 
               read_html() %>% 
               html_nodes(xpath = "//span[@class = 'sale']") %>% 
               html_text())
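
One thing to watch with the tibble above: if some products have no sale price, the two columns end up with different lengths and tibble() will fail. A possible variation, sketched below, parses the source once and looks up the name and price per product tile. The HTML string here is a made-up stand-in for remote_driver$getPageSource()[[1]], and it assumes the price span sits inside the same div as the h2, which may not be true on the real page.

```r
library(rvest)

# Hypothetical stand-in for remote_driver$getPageSource()[[1]];
# the real markup may be structured differently.
page_source <- '
<html><body>
  <div class="product-tiles-grid-item-info"><h2>Vase</h2><span class="sale">100</span></div>
  <div class="product-tiles-grid-item-info"><h2>Cushion</h2></div>
</body></html>'

# Parse the page source once and reuse the parsed document
page <- read_html(page_source)

# One node per product tile
tiles <- html_nodes(page, xpath = "//div[@class = 'product-tiles-grid-item-info']")

# html_node() returns exactly one result per tile (missing where nothing
# matches), so both columns always have the same length
products <- data.frame(
  Products = html_text(html_node(tiles, xpath = "./h2")),
  Prices   = html_text(html_node(tiles, xpath = ".//span[@class = 'sale']")),
  stringsAsFactors = FALSE
)

print(products)
```

Tiles without a matching price come back as NA instead of shifting the later prices up a row.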
