rvest returning empty list

Question

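I'm trying to import a table from a website by copying the XPath of the HTML element and using the rvest package. I've done this successfully many times before, but when I try it now I just get an empty list. To diagnose the problem, I ran the following code (taken from https://www.r-bloggers.com/using-rvest-to-scrape-an-html-table/). However, this code also produces an empty list for me.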

Thanks in advance for your help!

library(rvest)
url <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
population <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
  html_table()

Answer 1

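Your XPath query is wrong. The table is not a direct child of the node with id mw-content-text, although it is a descendant of it. In XPath, a single / selects only direct children, while // selects descendants at any depth. Try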

html_nodes(xpath='//*[@id="mw-content-text"]//table[1]')
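To see the difference between / and // without depending on the live page, here is a minimal sketch that parses a small literal HTML string. The markup is a hypothetical stand-in, assumed to mimic a page where the table is wrapped in an extra div inside #mw-content-text; the cell values are toy data.

```r
library(rvest)

# Hypothetical toy page: the table sits inside a wrapper <div>, so it is a
# descendant of #mw-content-text but NOT a direct child of it
page <- read_html('
<div id="mw-content-text">
  <div class="mw-parser-output">
    <table>
      <tr><th>Name</th><th>Value</th></tr>
      <tr><td>Example</td><td>1</td></tr>
    </table>
  </div>
</div>')

# "/" only looks at direct children, so nothing matches
child_only <- html_nodes(page, xpath = '//*[@id="mw-content-text"]/table[1]')
length(child_only)  # 0

# "//" looks at descendants at any depth, so the table is found
descendant <- html_nodes(page, xpath = '//*[@id="mw-content-text"]//table[1]')
length(descendant)  # 1

# Convert the matched node(s) to a list of data frames
html_table(descendant)
```

Note that after //, the predicate table[1] means "every table that is the first table child of its parent", so on a real page the query can match several tables; take the one you want from the resulting list, e.g. population[[1]].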

Web scraping is inherently fragile: a scraper can easily break when the site changes its HTML structure.
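Because a query that no longer matches anything silently yields an empty list, one option is to fail loudly instead. The sketch below uses a hypothetical helper, scrape_tables() (not part of rvest), and a toy document in place of a real page.

```r
library(rvest)

# Hypothetical helper (not part of rvest): run an XPath query and stop with
# an informative error if it matched nothing, instead of silently
# returning an empty list
scrape_tables <- function(doc, xpath) {
  nodes <- html_nodes(doc, xpath = xpath)
  if (length(nodes) == 0) {
    stop("XPath matched no nodes: ", xpath, call. = FALSE)
  }
  html_table(nodes)
}

# Toy document standing in for a real page
doc <- read_html('<div id="content"><div><table>
  <tr><th>x</th></tr><tr><td>1</td></tr>
</table></div></div>')

# Descendant query: returns a list holding one table
tables <- scrape_tables(doc, '//*[@id="content"]//table')

# Direct-child query: matches nothing, so this would raise an error
# scrape_tables(doc, '//*[@id="content"]/table')
```

This way a change in the page layout shows up as an explicit error at the scraping step rather than as an empty result further downstream.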

Tags: html, r, web-scraping, rvest