Using rvest to get the number of confirmed coronavirus cases

Question

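I am trying to build a scraper that fetches updates on the number of coronavirus cases, using the page below.

When I pass the XPath of the confirmed-case count, it returns "0" rather than the real number that appears on the page when I inspect it with Google Chrome's developer tools. Does anyone know what is going wrong here?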

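library(rvest)

# Download the page and parse its HTML, decoding it as GBK
response <- read_html('https://news.qq.com/zt2020/page/feiyan.htm', encoding = 'GBK')

# Select the node that should hold the confirmed-case count
response %>%
  html_node(xpath = '//*[@id="charts"]/div[3]/div[1]')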

{html_node}
<div class="icbar confirm">
[1] <div class="number">0</div>
[2] <div class="text">全国确诊</div>

Answer 1

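As @Marius noted, you can get the data from the JSON file that the page loads. The page fills in the number with JavaScript after it loads, and read_html does not run JavaScript, so the static HTML still contains the placeholder 0. I took the URL from the developer tools, read the response as text, and extracted the number that follows "confirm".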

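library(rvest)

# URL of the data request, copied from the browser's developer tools
url <- 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5&callback=jQuery34108850961227842673_1580448523488&_=1580448523489'

url %>%
  read_html() %>%
  html_text() %>%
  # Capture the first run of digits after "confirm"; the backslash must be doubled in an R string
  stringr::str_match('confirm.*?(\\d+)') %>% .[,2] %>% as.integer()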

#[1] 9731
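
As an alternative to the regular expression, the response could be parsed as JSON once the JSONP callback wrapper is removed. The following is only a minimal sketch, assuming the jsonlite package is installed. The sample string and its field names (data, chinaTotal, confirm) are hypothetical stand-ins for illustration and may not match the layout the endpoint actually returns.

```r
library(jsonlite)

# Hypothetical JSONP response; the field layout is an assumption, not the real payload
jsonp <- 'jQuery123_456({"ret":0,"data":{"chinaTotal":{"confirm":9731}}})'

# Strip the callback wrapper: everything up to the first "(" and the trailing ")"
json_text <- sub('^[^(]*\\(', '', jsonp)
json_text <- sub('\\)[[:space:]]*;?[[:space:]]*$', '', json_text)

# Parse the remaining text as JSON and pull out the (assumed) field
parsed <- fromJSON(json_text)
confirmed <- as.integer(parsed$data$chinaTotal$confirm)
confirmed
```

Parsing the structure this way avoids depending on which "confirm" key happens to appear first in the text, at the cost of needing to know the real field names.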

r

rvest