Scraping frames in R without RSelenium?

Question

I need to scrape the "manuscript received" date that becomes visible in the right-hand frame once you click "Information" on this page: https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717. I tried the rvest script listed below, which worked fine in similar situations. However, it does not work in this case, perhaps because of the click required to get to the publication history. I tried solving this by adding #pane-pcw-details to the URL (https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717#pane-pcw-details), but to no avail. Another option would be to use RSelenium, but perhaps there is a simpler workaround?

library(rvest)

link <- "https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717#pane-pcw-details"
wiley_output <- data.frame()

page <- read_html(link)
revhist <- page %>% html_node(".publication-history li:nth-child(5)") %>% html_text()
wiley_output <- rbind(wiley_output, data.frame(link, revhist, stringsAsFactors = FALSE))

Answer 1

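The data comes from an AJAX call that you can find in the Network tab of your browser's developer tools. The request has many query string parameters, but you only need two: doi, the identifier of the document, and ajax=true, which ensures the response contains the data associated with the requested AJAX action: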

https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=10.1002/jcc.26717

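library(rvest)
library(magrittr)

# Request the AJAX endpoint directly instead of the article page
link <- 'https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=10.1002/jcc.26717'
page <- read_html(link)
# Fifth item of the publication history, the entry the question targets
page %>% html_node(".publication-history li:nth-child(5)") %>% html_text()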
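
If you need this for several articles, the same request can be wrapped in a small function and the results collected in a data frame, as in the question. The following is only a sketch: pub_info_url, extract_history_item and scrape_received are hypothetical helper names, not part of rvest, and matching on the word "received" is an assumption about how the entry is labelled. It is used here because a fixed position such as nth-child(5) may not hold for every article.

```r
library(rvest)

# Hypothetical helper: build the ajaxShowPubInfo URL for a given DOI.
pub_info_url <- function(doi) {
  paste0("https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=", doi)
}

# Hypothetical helper: return the first publication-history entry whose text
# contains `label` (assumed here to be "received"), or NA if none matches.
extract_history_item <- function(page, label = "received") {
  items <- page %>%
    html_nodes(".publication-history li") %>%
    html_text(trim = TRUE)
  hit <- items[grepl(label, items, ignore.case = TRUE)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Hypothetical helper: one row per DOI; a failed request yields NA instead of
# stopping the whole loop.
scrape_received <- function(dois) {
  rows <- lapply(dois, function(doi) {
    link <- pub_info_url(doi)
    revhist <- tryCatch(
      extract_history_item(read_html(link)),
      error = function(e) NA_character_
    )
    data.frame(doi = doi, link = link, revhist = revhist, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Example call (performs a network request):
# wiley_output <- scrape_received(c("10.1002/jcc.26717"))
```

Returning NA on errors keeps a long run going when a single request fails; you can inspect the rows with a missing revhist afterwards.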

Tags: r, web-scraping, rvest