Simple web scraping issue - rvest
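I am trying to scrape the data table at "https://stats.premierlacrosseleague.com/pll-team-table". I have tried several different approaches, but the result is always the same: my table comes back empty. Does anyone have a solution? My code is posted below. Thanks in advance!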
library(rvest)
pll <- read_html("https://stats.premierlacrosseleague.com/pll-team-table")
table <- pll %>% html_nodes(".jss820") %>% html_text()
data_table <- data.frame(table)
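Unfortunately, scraping this way won't work, because the data is loaded dynamically, after the page itself has loaded. If you right-click the page, click 'inspect element', go to the 'network' tab, and then refresh the page, you can see the XHR requests being made.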
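To illustrate why the selector comes back empty, here is a minimal sketch that does not touch the network. The HTML string is a made-up stand-in for the kind of shell a JavaScript-rendered page typically serves (an empty container plus a script tag); it is an assumption, not the real page source.

```r
library(rvest)

# Hypothetical static HTML: an empty container that JavaScript would fill in later.
page <- minimal_html('<div id="root"></div><script src="app.js"></script>')

# The class from the question is not in the static markup, so nothing matches.
nodes <- html_nodes(page, ".jss820")
n_matches <- length(nodes)

# html_text() on an empty node set yields an empty character vector,
# which is why the resulting data frame has no rows.
texts <- html_text(nodes)
empty_table <- data.frame(table = texts)
```

Since read_html() only sees the markup the server sends and does not run JavaScript, any element that is created in the browser afterwards will be missing in the same way.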
One of those requests goes to https://api.stats.premierlacrosseleague.com/v1.00/teams-stats/all/2020 and its response contains the table you want, in JSON form. The code below reads that table with jsonlite (which gives a nested list in R) and turns it into a data frame using unnest_wider:
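library(tidyverse)
library(jsonlite)
# JSON endpoint that the page requests via XHR
url <- "https://api.stats.premierlacrosseleague.com/v1.00/teams-stats/all/2020"
# read_json() returns a nested list, one element per table row
data_list <- jsonlite::read_json(url)
# Hold the list in a list-column, then spread each element's fields into columns
data_table <- tibble(data = data_list) %>%
  unnest_wider(data)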
This gives:
# A tibble: 7 x 55
scores faceoffPct shotPct twoPointShotPct twoPointShotsOn… clearPct ridesPct savePct shortHandedPct
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 20 0.488 0.339 0.5 3.83 0.9 0 0.644 0
2 21 0.490 0.230 0.6 1.93 0.961 0.12 0.588 0
3 16 0.452 0.238 0.5 1.75 0.98 0.0769 0.623 0
4 25 0.667 0.293 0.545 2.73 0.932 0.0196 0.591 0
5 28 0.333 0.184 0.6 1.52 0.940 0.0263 0.559 0
6 17 0.523 0.239 0.8 4.2 0.935 0.0755 0.545 0
7 13 0.696 0.351 0.571 2.43 1 0.0870 0.682 0
# … with 46 more variables
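As a possible alternative to the list-column step, jsonlite can simplify an array of flat JSON objects directly into a data frame. The sketch below uses a small inline sample rather than the live endpoint; the field names and values in it are made up for illustration, and it assumes each record is a flat object with no nested fields, which may not hold for the real API response.

```r
library(jsonlite)

# Hypothetical two-record sample shaped like an array of flat JSON objects;
# the names and values are illustrative, not real API output.
sample_json <- '[
  {"teamId": "AAA", "scores": 20, "shotPct": 0.339},
  {"teamId": "BBB", "scores": 21, "shotPct": 0.230}
]'

# parse_json() keeps the nested-list shape, like read_json() does for a URL.
data_list <- jsonlite::parse_json(sample_json)

# fromJSON() simplifies an array of flat objects straight into a data.frame.
data_table <- jsonlite::fromJSON(sample_json)
```

If the real response does contain nested fields, the unnest_wider approach above is likely the safer choice, since it keeps nested values in list-columns instead of relying on automatic simplification.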