Creating a function that loops through page numbers
I have a script that imports data, shown below:
library(tidyverse)
library(rvest)
library(magrittr)
page_number <- 1:20
base_url <- read_html("https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page=1")
rankings <- base_url %>%
  html_nodes(".meta , .score , .position , .rankings-page__name-link") %>%
  html_text() %>%
  str_trim() %>%
  str_split(" ") %>%
  unlist() %>%
  matrix(ncol = 4, byrow = TRUE) %>%
  as.data.frame()
You'll notice that the URL in base_url ends with &Page=1. I want to do this for 20 pages, hence:
page_number <- 1:20
What is the most efficient way to loop these numbers into the URL without writing 20 separate sets of code?
You can use paste0 or sprintf to construct all of the URLs:
all_urls <- paste0("https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page=", 1:20)
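The sprintf route needs a little care with this particular URL: it already contains percent-encoded characters (%2F), and sprintf would try to read those as format directives if the URL itself were used as the format string. A minimal sketch that sidesteps this by passing the URL in as an argument to %s instead; the name url_prefix is my own and not from the question.

```r
# url_prefix is a hypothetical name for everything before the page number.
url_prefix <- "https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page="

# sprintf is vectorised over its arguments, so 1:20 yields 20 URLs.
# The prefix is supplied through %s, so its own % characters are left alone.
all_urls <- sprintf("%s%d", url_prefix, 1:20)
```

This should give the same character vector as the paste0 call above.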
You can then iterate over each URL and extract the data you need.
library(tidyverse)
library(rvest)
rankings <- map(all_urls, ~ .x %>%
  read_html() %>%
  html_nodes(".meta , .score , .position , .rankings-page__name-link") %>%
  html_text() %>%
  str_trim() %>%
  str_split(" ") %>%
  unlist() %>%
  matrix(ncol = 4, byrow = TRUE) %>%
  as.data.frame())
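Note that map returns a list with one data frame per page rather than a single table. If one combined data frame is wanted, dplyr's bind_rows can stack the list. The sketch below uses two small made-up data frames in place of the scraped pages so that it runs without network access; the values and the "page" column name are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-ins for the per-page data frames that map() would return.
pages <- list(
  data.frame(V1 = c("a", "b"), V2 = c("1", "2")),
  data.frame(V1 = "c", V2 = "3")
)

# Stack all pages; .id adds a column recording which list element
# (here, which page) each row came from.
combined <- bind_rows(pages, .id = "page")
```

With the real result, the equivalent call would be bind_rows(rankings, .id = "page").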