Creating a function that loops through page numbers

I have a script that imports data, shown below:

library(tidyverse)
library(rvest)
library(magrittr)

page_number <- 1:20

base_url <- read_html("https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page=1")

rankings <- base_url %>% html_nodes(".meta , .score , .position , .rankings-page__name-link") %>%
  html_text() %>% 
  str_trim %>% 
  str_split("   ") %>% 
  unlist %>%
  matrix(ncol = 4, byrow = T) %>% 
  as.data.frame

You'll notice that the URL in base_url ends with &Page=1. I'm trying to do this for 20 pages, hence:

page_number <- 1:20

What is the most efficient way to loop these numbers into the URL without writing 20 separate copies of the code?

You can construct all of the URLs with paste0 or sprintf:

all_urls <- paste0("https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page=", 1:20)
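For comparison, here is a minimal sketch of the sprintf route. The names base and all_urls_sprintf are only for illustration. One thing to watch for: the URL contains literal %2F sequences, so if the URL itself were used as the format string, each % would need to be written as %%. Passing the URL as an argument sidesteps that.

```r
# Base URL copied from the question; the page number is appended at the end.
base <- "https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?ViewPath=~%2FViews%2FSkyNet%2FPlayerSportRanking%2F_SimpleSetForSeason.ascx&Page="

# The URL is passed as an argument (%s), not as the format string, so its
# literal "%2F" sequences do not need to be escaped as "%%2F".
# sprintf is vectorised, so 1:20 yields 20 URLs.
all_urls_sprintf <- sprintf("%s%d", base, 1:20)

length(all_urls_sprintf)
```

This should produce the same vector as the paste0 call, so either form can be fed to the scraping step.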

You can then iterate over each URL and extract the data you need.

library(tidyverse)
library(rvest)

# Scrape every page; `rankings` is a list with one data frame per URL
rankings <- map(all_urls, ~ .x %>%
              read_html() %>%
              html_nodes(".meta , .score , .position , .rankings-page__name-link") %>%
              html_text() %>%
              str_trim() %>%
              str_split("   ") %>%
              unlist() %>%
              matrix(ncol = 4, byrow = TRUE) %>%
              as.data.frame())
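Since map returns a list, you will probably want a single data frame at the end. Below is a rough base-R sketch of that step. It uses a small placeholder list (rankings_demo, with made-up values) in place of the real scraped result so that it runs without hitting the site. The names pages and all_rankings are illustrative as well.

```r
# Placeholder stand-in for the scraped list (NOT real data): two tiny data
# frames with the default V1..V4 names that as.data.frame() gives a
# 4-column matrix.
rankings_demo <- list(
  data.frame(V1 = "a", V2 = "b", V3 = "c", V4 = "d"),
  data.frame(V1 = "e", V2 = "f", V3 = "g", V4 = "h")
)

# One page index per row, so each row remembers which page it came from.
pages <- rep(seq_along(rankings_demo),
             times = vapply(rankings_demo, nrow, integer(1)))

# Stack the per-page data frames and prepend the page column.
all_rankings <- cbind(page = pages, do.call(rbind, rankings_demo))

all_rankings
```

With the tidyverse already loaded, dplyr::bind_rows(rankings, .id = "page") should give a similar result in one call.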