Click Button RSelenium Amazon Page Turn

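I can't get RSelenium to turn the page in the Amazon reviews section I'm trying to scrape. My code is below. I've tried nearly every combination of CSS and XPath selectors I can think of. Any ideas?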

    replicate(100, {
      remDr$navigate("https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews")
      webElem <- remDr$findElement("css", "body")
      webElem$sendKeysToElement(list(key = "end"))
      morereviews <- remDr$findElement(using = 'css selector', ".a-last a")
      morereviews$clickElement()
      Sys.sleep(4)

      reviews <- xml2::read_html(remDr$getPageSource()[[1]]) %>%
        rvest::html_nodes(".review-text") %>%
        dplyr::data_frame(reviews = .)
    })

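In this case you don't need RSelenium; rvest alone is enough. First, you can scrape the reviews from a single page by reading its HTML directly. Second, note that the URL changes every time you turn the page in the reviews section (in fact, it contains the number of the page you are viewing). So you can loop over the page numbers, build each URL, and scrape all the reviews: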

    library(magrittr)  # provides the %>% pipe

    base_url <- "https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/"

    reviews <- lapply(1:100, function(i) {
      # The page number appears in the ref tag and in the pageNumber parameter
      url <- paste0(base_url, "ref=cm_cr_getr_d_paging_btm_next_", i,
                    "?ie=UTF8&reviewerType=all_reviews&pageNumber=", i)
      xml2::read_html(url) %>%
        rvest::html_nodes(".review-text") %>%
        rvest::html_text() %>%
        tibble::tibble(reviews = .)
    })
    # Stack the per-page tibbles into one table and print it
    (reviews <- dplyr::bind_rows(reviews))
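
If the loop should keep going when a page fails to load, the same idea can be split into small helpers. This is only a minimal sketch under stated assumptions: `review_url()`, `parse_reviews()` and `scrape_reviews()` are hypothetical names made up for illustration, the URL pattern and the `.review-text` selector are copied from the code above and may not match Amazon's current markup, and the one-second pause between requests is an arbitrary choice.

```r
# Hypothetical helper: build the review-page URL for page i,
# using the URL pattern from the answer above.
review_url <- function(i) {
  paste0(
    "https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/",
    "product-reviews/B00K30GXK8/ref=cm_cr_getr_d_paging_btm_next_", i,
    "?ie=UTF8&reviewerType=all_reviews&pageNumber=", i
  )
}

# Hypothetical helper: pull the review text out of an already parsed page.
# The ".review-text" selector is assumed from the answer above.
parse_reviews <- function(page) {
  nodes <- rvest::html_nodes(page, ".review-text")
  trimws(rvest::html_text(nodes))
}

# Hypothetical helper: fetch the pages one by one and skip any page
# that cannot be read instead of aborting the whole run.
scrape_reviews <- function(pages, pause = 1) {
  out <- lapply(pages, function(i) {
    page <- tryCatch(xml2::read_html(review_url(i)), error = function(e) NULL)
    Sys.sleep(pause)  # arbitrary pause between requests
    if (is.null(page)) return(NULL)
    texts <- parse_reviews(page)
    data.frame(page = rep(i, length(texts)), review = texts,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)  # NULL entries (failed pages) are dropped by rbind
}
```

Calling `scrape_reviews(1:100)` would then return one data frame with a `page` column and a `review` column, or `NULL` if no page could be read. Keeping the URL builder and the parser separate makes it easy to check each piece on a single page before running the full loop.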