rvest scrape in Rmarkdown knit fails even though code works without knit

Question

When I run the code interactively, it works fine. When I try to knit, I get this error:

Error: Can't rename columns that don't exist. The column Tests<U+2009>/millionpeople doesn't exist.

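I have tried clearing the cache, loading the image at the start, creating a new object for the rename and mutate steps, and so on. The error may occur because the scraped object is not loaded (or not found) during knitting, but I don't know why or how to fix it.

Any ideas? Thanks!

My code: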

library(utils)
library(httr)
library(tidyverse)
library(rvest)
library(ggpubr)

# scrapes from wikipedia, xpath is correct
url <- "https://en.wikipedia.org/wiki/COVID-19_testing"

tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  extract2(1) %>% # extracts data table from html list
  rename(country = "Country or region", tests = "Tests", positive = "Positive", asof = "As of",
         tests_per_million = "Tests /millionpeople",
         positive_per_thousand_tests = "Positive /thousandtests", ref = "Ref.") %>%
  mutate(tests = as.numeric(gsub(",", "", tests)), positive = as.numeric(gsub(",", "", positive)),
         tests_per_million = as.numeric(gsub(",", "", tests_per_million)),
         positive_per_thousand_tests = round(positive_per_thousand_tests, 0)) # removes commas and converts to numeric

Answer 1

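Some of the column names in the table contain special characters, which is probably what causes this issue. Since you are renaming all of the columns anyway, use rename_all, which assigns the new names by position and never has to match the original names.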

library(rvest)
library(dplyr)
library(readr)

url <- "https://en.wikipedia.org/wiki/COVID-19_testing"

tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  .[[1]] %>% # take the data frame out of the one-element list
  # rename by position, so the special characters in the original names no longer matter
  rename_all(~c("country", "tests", "positive", "asof",
                "tests_per_million", "positive_per_thousand_tests", "ref")) %>%
  # parse_number() drops the thousands separators and converts to numeric
  mutate(tests = parse_number(tests), positive = parse_number(positive),
         tests_per_million = parse_number(tests_per_million),
         positive_per_thousand_tests = round(positive_per_thousand_tests))
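
The <U+2009> in the error message is the Unicode thin space, which suggests that the scraped header contains a thin space where the rename() call has an ordinary one. If you would rather keep renaming by name, one possible alternative is to normalise the headers first. The following is only a minimal sketch under that assumption: the `scraped` data frame is a made-up stand-in for the Wikipedia table so the example runs offline, and the thin space is written as the escape \u2009 so the script itself stays ASCII-only.

```r
library(dplyr)

# Hypothetical stand-in for the scraped table (not the real Wikipedia data).
scraped <- data.frame(
  Tests = c("1,000", "2,500"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
# Header with a thin space (U+2009) between "Tests" and "/millionpeople".
scraped[["Tests\u2009/millionpeople"]] <- c("10", "25")

# Replace the thin space in every column name with an ordinary space.
names(scraped) <- gsub("\u2009", " ", names(scraped), fixed = TRUE)

# Renaming by name now matches, because the header is plain ASCII.
tests <- scraped %>%
  rename(tests = "Tests", tests_per_million = "Tests /millionpeople") %>%
  mutate(tests = as.numeric(gsub(",", "", tests)),
         tests_per_million = as.numeric(tests_per_million))
```

Renaming by position with rename_all, as above, is still the simpler route when every column gets a new name; normalising the headers is mainly useful when only a few columns are renamed.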

