R web scraping by class

Question

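I have a finance assignment that requires collecting beta values for a calculation. I am new to R, and I want to scrape the beta value from the web using the rvest or httr package. However, the output is character(0).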

xpath:
//*[@id="StkList"]/ul/li[48]

library(rvest)
library(dplyr)

sym <- "1212"
url.3 <- paste("http://www.etnet.com.hk/www/eng/stocks/realtime/quote.php?code=", sym, sep = "")

beta.value <- url.3 %>% read_html() %>% html_nodes(xpath = "//*[@id='StkList']/ul/li[48]") %>% html_text()

output:
character(0)

desired output:
0.270

I also tried html_nodes("div.value.highlight") instead of the XPath, but that did not work either. Can anyone help or offer suggestions? Thanks.

Answer 1

The site checks the referer before serving the page, so you have to add some headers:

library(magrittr)
library(httr)
library(rvest)

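# Request the quote page, sending the Host and Referer headers along with it
res <- httr::GET(
  url = "http://www.etnet.com.hk/www/eng/stocks/realtime/quote.php?code=1212",
  httr::add_headers(
    Host = "www.etnet.com.hk",
    Referer = "http://www.etnet.com.hk/www/eng/stocks/realtime/quote.php?code=1212"
  )
)

# Parse the response body as HTML
res <- content(res, encoding = "UTF-8")

# Select the first <li> that follows the <li> containing the text 'Beta'
html_node(res, xpath = ".//li[contains(., 'Beta')]/following-sibling::li[1]") %>%
  html_text()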
## [1] "+0.270"
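
To see why the label-based XPath is sturdier than a positional one such as li[48], here is a minimal offline sketch. The HTML snippet is hypothetical and only assumed to resemble the page (label and value in adjacent li elements); the real markup may differ. It also shows one way to turn the scraped text into a number for calculations.

```r
library(rvest)

# Hypothetical markup, assumed for illustration only; not copied from the site
snippet <- '
<div id="StkList">
  <ul>
    <li>P/E Ratio</li><li>12.500</li>
    <li>Beta</li><li>+0.270</li>
  </ul>
</div>'

page <- read_html(snippet)

# Find the <li> whose text contains 'Beta', then take its next <li> sibling
beta_text <- page %>%
  html_node(xpath = ".//li[contains(., 'Beta')]/following-sibling::li[1]") %>%
  html_text()

# as.numeric() accepts a leading sign, so "+0.270" converts directly
beta_value <- as.numeric(beta_text)
```

Because the selector keys on the 'Beta' label rather than on a list position, it should keep working if the site adds or removes other rows, as long as the value stays in the li right after the label.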
