div class scraping
I'm trying to scrape the table from the following website with this code:
library(rvest)
library(tidyverse)
library(dplyr)
base <- '******************'
# returns an empty node set: the table is not in the HTML the server sends
links <- read_html(base) %>% html_nodes(".v-data-table__wrapper")
But no luck so far. Can anyone help me with this?
It turns out the table is not in the page source at all. The page builds the table with JavaScript after it loads.
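To see why the .v-data-table__wrapper selector comes back empty, here is a minimal offline sketch. The HTML string is a hypothetical, simplified stand-in for a JS-rendered page (an empty mount point plus a script that carries the data); it is not the real page's markup, and the gon.regions field is an assumption made for illustration.

```r
library(rvest)

# Hypothetical stand-in for a JS-rendered page: the served HTML has only an
# empty mount point and a <script> holding the data (not the real markup).
html <- '<html><body>
  <div id="app"></div>
  <script>window.gon={};gon.data=true;gon.regions=[{"region":"Boston, MA","change":-0.63}];</script>
</body></html>'

page <- read_html(html)

# The table wrapper would only be created by the browser running the JS,
# so the CSS selector matches nothing in the raw HTML.
wrappers <- page %>% html_nodes(".v-data-table__wrapper")
length(wrappers)

# The data is still there, as text inside the <script> element.
script_txt <- page %>%
  html_node(xpath = "//script[contains(., 'gon.data')]") %>%
  html_text()
grepl("gon.regions", script_txt, fixed = TRUE)
```

In other words, rvest only sees what the server sends; anything a script renders later has to be recovered from the script itself (or from a headless browser).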
The idea is to run the page's JS code yourself to get the data (this requires the V8 package):
library(V8)
library(rvest)

# grab the text of the <script> element that holds the data
js <- read_html("https://www.locate.ai/retail-tracker.html") %>%
  html_node(xpath = "//script[contains(., 'gon.data')]") %>%
  html_text()

ct <- V8::new_context()
ct$eval("var window = {}, gon = {};") # need to initialize variables first
ct$eval(js)           # run the script so it fills in `gon`
data <- ct$get("gon") # copy `gon` back into R

# mining the data
cities <- data$regions
retailbrands <- data$brands
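The same V8 steps can be tried offline. In this sketch the js string is a hypothetical stand-in for what the xpath query returns: the assignments to gon.regions and gon.brands are assumed from the field names used above, and the numbers are copied from the output below. It uses v8() to create the JavaScript context.

```r
library(V8)

# Hypothetical script text standing in for the page's <script>;
# the real script's structure may differ.
js <- 'window.gon={};gon.data=true;
gon.regions=[{"region":"Boston, MA","change":-0.6337319},
             {"region":"Chicago, IL","change":-0.5101856}];
gon.brands=[{"brand":"LA Fitness","change":-0.6168534}];'

ct <- v8()                              # fresh JavaScript context
ct$eval("var window = {}, gon = {};")   # stub the globals the script touches
ct$eval(js)                             # run the assignments
data <- ct$get("gon")                   # JSON round-trip into R

# An array of objects comes back as a data frame.
cities <- data$regions
str(cities)
```

ct$get() serialises the object to JSON and parses it with jsonlite, which is why an array of objects arrives as a data frame rather than a nested list.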
Result:
> head(cities)
           region     change
1 Minneapolis, MN -0.7164120
2      Boston, MA -0.6337319
3  Washington, DC -0.6191386
4     Detroit, MI -0.5693641
5     Chicago, IL -0.5101856
6   Charlotte, NC -0.4810490
> head(retailbrands)
            brand     change
1      LA Fitness -0.6168534
2     Wells Fargo -0.5355715
3     Foot Locker -0.5211365
4     Ethan Allen -0.5096331
5     Clean Juice -0.5079978
6 Texas Roadhouse -0.4770344
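Once the data is in a data frame, the usual dplyr verbs apply. This sketch rebuilds the six cities rows printed above so it runs offline; in practice cities comes straight from ct$get("gon")$regions. Treating change as a fractional change (so that multiplying by 100 gives a percentage) is an assumption, not something the page confirms.

```r
library(dplyr)

# The six rows printed above, rebuilt so the snippet runs without the site.
cities <- data.frame(
  region = c("Minneapolis, MN", "Boston, MA", "Washington, DC",
             "Detroit, MI", "Chicago, IL", "Charlotte, NC"),
  change = c(-0.7164120, -0.6337319, -0.6191386,
             -0.5693641, -0.5101856, -0.4810490)
)

top_declines <- cities %>%
  mutate(
    state = sub("^.*, ", "", region),     # "Minneapolis, MN" -> "MN"
    change_pct = round(change * 100, 1)   # assumes change is a fraction
  ) %>%
  arrange(change) %>%                     # most negative first
  slice_head(n = 3)

top_declines
```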