rvest scrape in Rmarkdown knit fails even though code works without knit
When I run the code interactively, it works fine. When I try to knit, I get this error:
Error: Can't rename columns that don't exist.
The column Tests<U+2009>/millionpeople
doesn't exist.
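I have tried clearing the cache, loading the image at the start, creating a new object for the rename and mutate steps, and so on. The error may occur because the scraped object is not loaded (or found) during knitting, but I don't know why or how to fix it.
Any ideas?
Thanks!
My code: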
library(utils)
library(httr)
library(tidyverse)
library(rvest)
library(ggpubr)
library(magrittr) # needed for extract2(); tidyverse attaches only the pipe

# scrapes from Wikipedia; the XPath is correct
url <- "https://en.wikipedia.org/wiki/COVID-19_testing"
tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  extract2(1) %>% # extracts the data table from the html list
  rename(country = "Country or region", tests = "Tests", positive = "Positive",
         asof = "As of",
         tests_per_million = "Tests /millionpeople",
         positive_per_thousand_tests = "Positive /thousandtests", ref = "Ref.") %>%
  # removes commas and converts to numeric
  mutate(tests = as.numeric(gsub(",", "", tests)),
         positive = as.numeric(gsub(",", "", positive)),
         tests_per_million = as.numeric(gsub(",", "", tests_per_million)),
         positive_per_thousand_tests = round(positive_per_thousand_tests, 0))
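Some of the column names in the table contain special characters, which may be causing the problem. Since you are renaming all of the columns anyway, use rename_all, which assigns the new names by position instead of matching the old ones.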
library(rvest)
library(dplyr)
library(readr)

url <- "https://en.wikipedia.org/wiki/COVID-19_testing"
tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  .[[1]] %>%
  rename_all(~c("country", "tests", "positive", "asof",
                "tests_per_million", "positive_per_thousand_tests", "ref")) %>%
  mutate(tests = parse_number(tests), positive = parse_number(positive),
         tests_per_million = parse_number(tests_per_million),
         positive_per_thousand_tests = round(positive_per_thousand_tests))
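To see why a header that looks right can still fail to match, here is a minimal offline sketch. It assumes the scraped header contains a thin space (U+2009), as the error message suggests. The data frame `scraped` and the variable `typed_name` are hypothetical stand-ins, not the real Wikipedia table, and whether this is what actually happens during your knit is a guess on my part.

```r
# Hypothetical stand-in for the scraped table: the second header
# contains a thin space (U+2009) rather than an ordinary space.
scraped <- data.frame(a = "1,234", b = "5,678", stringsAsFactors = FALSE)
names(scraped) <- c("Tests", "Tests\u2009/millionpeople")

# The same name typed with an ordinary space looks identical on screen,
# but it is a different string and does not match.
typed_name <- "Tests /millionpeople"
typed_name %in% names(scraped) # FALSE

# The code point 8201 (hex 2009) shows up when the name is inspected.
8201L %in% utf8ToInt(names(scraped)[2]) # TRUE

# Replacing the thin space with an ordinary space makes the names match.
names(scraped) <- gsub("\u2009", " ", names(scraped), fixed = TRUE)
typed_name %in% names(scraped) # TRUE
```

Renaming by position, as in the answer above, avoids this kind of string matching altogether, at the cost of breaking silently if the table's column order changes.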