Webscraping in R From Dataframe
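Starting from the data frame below, I am trying to use the rvest package to scrape each word's part of speech and synonyms from https://www.thesaurus.com/browse/research?s=t into a csv.
I am not sure how to get R to look up each word in the data frame and extract its part of speech and synonyms.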
install.packages("rvest")
install.packages("xml2")
library(xml2)
library(rvest)
library(dplyr)
words <- data.frame("keywords" = c("research", "survey", "staff", "outpatient", "consent"))
html <- read_html("https://www.merriam-webster.com/thesaurus/research")
html %>% html_nodes(".mw-list") %>% html_text() %>%
  head(n = 1) # keep only the first match
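If you search the thesaurus for [your term], you end up on the page "https://www.thesaurus.com/browse/[your term]". Knowing this, you can build the URLs of all the pages you are interested in. After that, you should be able to iterate over them with the map() function from the purrr package to get the information you want: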
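library(rvest)
library(purrr)
# It makes more sense to just keep "words" as a vector for now
words <- c("research", "survey", "staff", "outpatient", "consent")
htmls <- paste0("https://www.thesaurus.com/browse/", words)
# The ~ makes the pipeline an anonymous function; .x is the current URL
info_list <- map(htmls, ~ .x %>%
  read_html() %>%
  html_node(".mw-list") %>% # selector copied from the question; adjust it to the target site's markup
  html_text())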
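Since the goal is a csv, here is a rough sketch of how the scraped text could be put back next to the keywords and written out. The helper name scrape_first and the file name synonyms.csv are made up for illustration, and the ".mw-list" selector is simply the one from the question, so it may well match nothing on thesaurus.com (in which case you get NA and need to pick a selector from the page source).

```r
library(rvest)
library(purrr)

# Hypothetical helper: text of the first node matching `selector`,
# or NA if the page cannot be read or nothing matches.
scrape_first <- function(url, selector) {
  tryCatch({
    node <- html_node(read_html(url), selector)
    html_text(node, trim = TRUE)
  }, error = function(e) NA_character_)
}

words <- data.frame(
  keywords = c("research", "survey", "staff", "outpatient", "consent"),
  stringsAsFactors = FALSE
)

# One URL per keyword, following the pattern described above
urls <- paste0("https://www.thesaurus.com/browse/", words$keywords)

# map_chr() returns a character vector, so it fits directly into a column.
# ".mw-list" is an assumption carried over from the question's code.
words$synonyms <- map_chr(urls, scrape_first, selector = ".mw-list")

# Hypothetical output file name
out_file <- "synonyms.csv"
write.csv(words, out_file, row.names = FALSE)
```

Wrapping the request in tryCatch() means one failing page gives an NA row instead of stopping the whole loop. The part of speech would need its own selector and a second column built the same way.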