Read HTML table using R directly from a website

Question

I want to read COVID data directly from a government website: https://pikobar.jabarprov.go.id/distribution-case#

I am using the rvest library:

library(rvest)

url <- "https://pikobar.jabarprov.go.id/distribution-case#"
df <- url %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

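I have seen people use lapply to turn the result into a tidy table, but when I tried it the output was a mess, since I am a beginner. Can anyone help? I am really frustrated.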

Answer 1

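You cannot scrape the data in the table with rvest, because the table is not part of the page's HTML. It is loaded by a request to this link: https://dashboard-pikobar-api.digitalservice.id/v2/sebaran/pertumbuhan?wilayah=kota&=32 which is sent with an api-key header.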

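# Call the API directly, passing the api-key as a request header
pg <- httr::GET(
  "https://dashboard-pikobar-api.digitalservice.id/v2/sebaran/pertumbuhan?wilayah=kota&=32",
  config = httr::add_headers(`api-key` = "480d0aeb78bd0064d45ef6b2254be9b3")
)
# Parse the response body and keep its "data" element
data <- httr::content(pg)$data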

I don't know whether the api-key will stay valid in the future, but as far as I can tell it works for now.
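
Since the question asks for a tidy table, here is a minimal sketch of one way to turn a parsed list like `data` into a data frame with lapply. It assumes each element is a flat record (a named list of single values); the real response may be nested, in which case this will need adapting. The helper `records_to_df` and the sample field names `region` and `cases` are made up for illustration and do not come from the API.

```r
# Hypothetical helper: convert a list of flat records (named lists of
# single values, as a parsed JSON array of objects often looks) into
# one data frame with a row per record.
records_to_df <- function(records) {
  rows <- lapply(records, function(rec) {
    # JSON null is parsed as NULL; replace it with NA so that every
    # field has length 1 and the record becomes exactly one row
    rec[vapply(rec, is.null, logical(1))] <- NA
    as.data.frame(rec, stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames on top of each other
  do.call(rbind, rows)
}

# Invented sample standing in for `data`; field names are assumptions
sample_records <- list(
  list(region = "A", cases = 10L),
  list(region = "B", cases = NULL)
)

tidy <- records_to_df(sample_records)
print(tidy)
```

With the real response this would be called as `records_to_df(data)`, provided the records are flat. If they are nested, parsing the response text with `jsonlite::fromJSON(httr::content(pg, as = "text", encoding = "UTF-8"), flatten = TRUE)` may be a simpler starting point.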


r

web-scraping

rvest