Rvest only scrapes part of the table

Question

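I'm new to rvest. I want to scrape cryptocurrency information from this site: https://coinmarketcap.com/

I can scrape all the information for the first 10 currencies listed in the table, but for the remaining currencies I only get the name and the price. Why does this happen, and how can I scrape all the information for every currency?

My code: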

library(rvest)
market <- as.data.frame(read_html('https://coinmarketcap.com/')  %>%
  html_table(fill = TRUE))

Answer 1

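The page loads its content dynamically rather than all at once, so rvest on its own only sees part of the table. You need RSelenium to render the page in a real browser first.

Does the following work?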

library(rvest)  # provides html_table() and the %>% pipe

url <- "https://coinmarketcap.com/"

# Start RSelenium with Firefox
rD <- RSelenium::rsDriver(browser = "firefox", port = 4546L, verbose = FALSE)
remDr <- rD[["client"]]
remDr$navigate(url)
Sys.sleep(4)  # give the page time to render

# Get the rendered page source and parse it
web <- remDr$getPageSource()
web <- xml2::read_html(web[[1]])

table <- html_table(web, fill = TRUE) %>%
  as.data.frame()

# Close RSelenium
remDr$close()
gc()
rD$server$stop()
# Windows only: kill any leftover Java process started by the Selenium server
system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)
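
If rows further down the table still come back incomplete, one possible cause is that the page only fills in a row once it scrolls into view. The sketch below assumes that lazy-rendering behaviour; `scroll_page` is a hypothetical helper name, and the number of key presses and the pause are guesses that would need tuning.

```r
# Hypothetical helper (not part of RSelenium): press Page Down repeatedly
# so that lazily rendered rows get a chance to load.
scroll_page <- function(remDr, times = 30, pause = 0.5) {
  body <- remDr$findElement(using = "css selector", value = "body")
  for (i in seq_len(times)) {
    body$sendKeysToElement(list(key = "page_down"))
    Sys.sleep(pause)  # assumed delay; adjust to the connection speed
  }
  invisible(times)
}

# Usage, between remDr$navigate(url) and remDr$getPageSource():
# scroll_page(remDr)
```

Paging down in steps, rather than jumping straight to the end, is meant to bring every row into the viewport at least once.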

Incidentally, the site appears to have an API. You could probably get the same data more efficiently through it.
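
A minimal sketch of the API route follows. The endpoint URL, the `X-CMC_PRO_API_KEY` header, the `limit` parameter and the top-level `data` field are assumptions to be checked against CoinMarketCap's current API documentation. The environment variable `CMC_API_KEY` and the helper `parse_listings` are hypothetical names. An API key is assumed to be required, so the request only runs when one is set.

```r
library(httr)
library(jsonlite)

# Assumed endpoint; verify against the current API documentation.
cmc_url <- "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"

# Hypothetical helper: turn the JSON text of a response into a flat data frame.
# Assumes the listings sit under a top-level "data" field.
parse_listings <- function(json_text) {
  parsed <- jsonlite::fromJSON(json_text)
  jsonlite::flatten(as.data.frame(parsed$data))
}

api_key <- Sys.getenv("CMC_API_KEY")  # hypothetical variable name
if (nzchar(api_key)) {
  resp <- GET(
    cmc_url,
    add_headers("X-CMC_PRO_API_KEY" = api_key),  # assumed header name
    query = list(limit = 100)                    # assumed parameter
  )
  stop_for_status(resp)
  market <- parse_listings(content(resp, as = "text", encoding = "UTF-8"))
}
```

Keeping the parsing in its own function means it can be tried on a saved response without calling the API again.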
