How to scrape text from a webpage that requires interaction in R
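I am trying to scrape reviews from a webpage to determine word frequency. However, when a review is long, only part of it is shown. You have to click "More" for the page to display the full review. Here is the code I use to extract the review text. How can I "click" More to get the full reviews?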
library(rvest)

tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"
webpage <- read_html(tripAdvisorURL)
# Select every node whose class attribute contains "partial_entry"
reviewData <- html_nodes(webpage, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "partial_entry", " " ))]')
head(reviewData)
# Text of the first review; long reviews are cut off at "...More"
html_text(reviewData[[1]])
[1] "The rooms were clean and we slept so good we had room 10 and 12 we didn’t use 12 but it joins 10 .kind of strange but loved the hotel ..me personally I would take the hot tub out it was kinda old..the lady that...More"
As mentioned in the comments, you can use RSelenium together with rvest for more interactivity:
library(RSelenium)
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# Start a Selenium server and open a Chrome session
rmDr <- rsDriver(browser = "chrome")
myclient <- rmDr$client
tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"
myclient$navigate(tripAdvisorURL)

# Select all "More" links and click each one to expand its review
webEles <- myclient$findElements(using = "css", value = ".ulBlueLinks")
for (webEle in webEles) {
  webEle$clickElement()
}

# Hand the expanded page source to rvest and extract the review text
mypagesource <- myclient$getPageSource()
read_html(mypagesource[[1]]) %>%
  html_nodes(".partial_entry") %>%
  html_text()
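
Since the stated goal is word frequency, here is a minimal base R sketch of that last step. It is only an illustration: the `reviews` vector holds made-up strings standing in for the `html_text()` result, and `word_frequency` is a hypothetical helper name, not something from rvest or RSelenium.

```r
# Hypothetical review strings standing in for the html_text() output above.
reviews <- c(
  "The rooms were clean and the hotel was quiet.",
  "Clean rooms, friendly staff, loved the hotel!"
)

# Count how often each word occurs across a character vector of reviews.
word_frequency <- function(texts) {
  # Lower-case, then split on runs of anything that is not a letter or apostrophe.
  words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  # Drop empty strings left over from leading separators.
  words <- words[nzchar(words)]
  # table() tallies the words; sort the counts in decreasing order.
  sort(table(words), decreasing = TRUE)
}

freq <- word_frequency(reviews)
head(freq, 5)
```

For real use you would probably also drop common stop words ("the", "and", ...) before counting, and note that this simple split treats a typographic apostrophe (’) as a separator.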