将嵌套 for 循环转换为函数
Convert nested for loop to a function
我写了一个嵌套的 for 循环,想把它转换成一个函数。这是我目前拥有的代码:
## Sleeper function
testit <- function(x)
{
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
## Set up null objects to fill in loop
episodes = NULL
l = NULL
## Scrape pages and append together
for (season in c(27:38)){
for (episode in c(1:13)){
tryCatch({
if (season == 34 & episode == 3){
l = 'l'
}
new_episode <- read_html(paste0('http://www.chakoteya.net/DoctorWho/', season, '-', episode, '.htm', l)) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
episodes <- episodes %>% bind_rows(new_episode)
cat(paste('\rseason = ', season, '; episode = ', episode))
testit(2)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
如您所见,它在第 27-38 季和每个季的第 1-13 集中循环播放。有时一季只有 12 集,因此 tryCatch()
。而且,由于我不明白的原因,有时 URL 需要一个 .html
后缀(如果季节 >= 34 和剧集 >= 3),有时它需要一个 .htm
后缀,因此if (season == 34 & episode == 3)
语句。
我想将它转换成一个函数,可能使用 apply()
或 map()
,但我的函数技能仍然很初级,我正在努力。
作为最终输出,调用一个名为 doctor_who()
的函数会很棒:
episodes <- doctor_who(season = c(27:38), episode = c(1:13))
是这样的吗?
library(dplyr)
library(xml2)
testit <- function(x) {
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
doctor_who <- function(season = 27:38, episode = 1:13){
## Set up null objects to fill in loop
episodes = NULL
l = NULL
res <- vector("list", length = length(season)* length(episodes))
inx <- 0L
## Scrape pages and append together
for (s in season){
for (e in episode){
tryCatch({
if (s == 34 && e == 3){
l = 'l'
}
URL <- paste0('http://www.chakoteya.net/DoctorWho/', s, '-', e, '.htm', l)
new_episode <- read_html(URL, encoding = "shift_jis") %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
episodes <- episodes %>% bind_rows(new_episode)
message(paste('season = ', s, '; episode = ', e))
testit(2)
inx <- inx + 1L
res[[inx]] <- episodes
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
res
}
episodes <- doctor_who(season = 27:38, episode = 1:13)
# how many were read
length(episodes)
#[1] 137
# how many were expected
length(27:38)*length(1:13)
#[1] 156
我会把它分成两个函数-
第一个函数从一个文件中读取数据,第二个函数创建每个季节和剧集的所有 url 并使用 map_df
一个一个地传递它们。
在第一个函数中,我们首先尝试从 .htm
读取数据,如果失败则从 .html
.
读取数据
read_text <- function(url) {
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
if(NROW(val) == 1) {
url <- paste0(url, 'l')
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
}
if(NROW(val) > 1) val else NULL
}
doctor_who <- function(season, episode) {
all_urls <- sprintf('http://www.chakoteya.net/DoctorWho/%s.htm',
c(t(outer(season, episode, paste, sep = '-'))))
purrr::map_df(all_urls, read_text, .id = 'id')
}
res <- doctor_who(season = 1:5, episode = 1:5)
我添加了一个 id
列来区分每一集。
# id value
# <chr> <chr>
# 1 1 " \r\nOriginal Airdate: 23 Nov, 1963"
# 2 1 " [Coal Hill School corridor] "
# 3 1 " (The bell is ringing for end of classes.)\r\nGIRL: Night, Miss Wright. \…
# 4 1 " [Laboratory] "
# 5 1 " (A man is tidying up after the class) \r\nIAN: Oh? Not gone yet? \r\nBAR…
# 6 1 " [Classroom] "
# 7 1 " (Susan is listening to guitar rock music on her\r\ntransistor radio. I'm…
# 8 1 " [Totter's Lane] "
# 9 1 " (Ian and Barbara are parked up.) \r\nBARBARA: Over there. \r\nIAN: We're…
#10 1 " [Memory - classroom] "
# … with 4,446 more rows
我写了一个嵌套的 for 循环,想把它转换成一个函数。这是我目前拥有的代码:
## Sleeper function
testit <- function(x)
{
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
## Set up null objects to fill in loop
episodes = NULL
l = NULL
## Scrape pages and append together
for (season in c(27:38)){
for (episode in c(1:13)){
tryCatch({
if (season == 34 & episode == 3){
l = 'l'
}
new_episode <- read_html(paste0('http://www.chakoteya.net/DoctorWho/', season, '-', episode, '.htm', l)) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
episodes <- episodes %>% bind_rows(new_episode)
cat(paste('\rseason = ', season, '; episode = ', episode))
testit(2)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
如您所见,它在第 27-38 季和每个季的第 1-13 集中循环播放。有时一季只有 12 集,因此 tryCatch()
。而且,由于我不明白的原因,有时 URL 需要一个 .html
后缀(如果季节 >= 34 和剧集 >= 3),有时它需要一个 .htm
后缀,因此if (season == 34 & episode == 3)
语句。
我想将它转换成一个函数,可能使用 apply()
或 map()
,但我的函数技能仍然很初级,我正在努力。
作为最终输出,调用一个名为 doctor_who()
的函数会很棒:
episodes <- doctor_who(season = c(27:38), episode = c(1:13))
是这样的吗?
library(dplyr)
library(xml2)
testit <- function(x) {
p1 <- proc.time()
Sys.sleep(x)
proc.time() - p1 # The cpu usage should be negligible
}
doctor_who <- function(season = 27:38, episode = 1:13){
## Set up null objects to fill in loop
episodes = NULL
l = NULL
res <- vector("list", length = length(season)* length(episodes))
inx <- 0L
## Scrape pages and append together
for (s in season){
for (e in episode){
tryCatch({
if (s == 34 && e == 3){
l = 'l'
}
URL <- paste0('http://www.chakoteya.net/DoctorWho/', s, '-', e, '.htm', l)
new_episode <- read_html(URL, encoding = "shift_jis") %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
episodes <- episodes %>% bind_rows(new_episode)
message(paste('season = ', s, '; episode = ', e))
testit(2)
inx <- inx + 1L
res[[inx]] <- episodes
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
}
res
}
episodes <- doctor_who(season = 27:38, episode = 1:13)
# how many were read
length(episodes)
#[1] 137
# how many were expected
length(27:38)*length(1:13)
#[1] 156
我会把它分成两个函数-
第一个函数从一个文件中读取数据,第二个函数创建每个季节和剧集的所有 url 并使用 map_df
一个一个地传递它们。
在第一个函数中,我们首先尝试从 .htm
读取数据,如果失败则从 .html
.
read_text <- function(url) {
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
if(NROW(val) == 1) {
url <- paste0(url, 'l')
val <- tryCatch({
read_html(url) %>%
html_nodes("p") %>%
html_text() %>%
tibble(value = .)
}, error = function(e) return(NA))
}
if(NROW(val) > 1) val else NULL
}
doctor_who <- function(season, episode) {
all_urls <- sprintf('http://www.chakoteya.net/DoctorWho/%s.htm',
c(t(outer(season, episode, paste, sep = '-'))))
purrr::map_df(all_urls, read_text, .id = 'id')
}
res <- doctor_who(season = 1:5, episode = 1:5)
我添加了一个 id
列来区分每一集。
# id value
# <chr> <chr>
# 1 1 " \r\nOriginal Airdate: 23 Nov, 1963"
# 2 1 " [Coal Hill School corridor] "
# 3 1 " (The bell is ringing for end of classes.)\r\nGIRL: Night, Miss Wright. \…
# 4 1 " [Laboratory] "
# 5 1 " (A man is tidying up after the class) \r\nIAN: Oh? Not gone yet? \r\nBAR…
# 6 1 " [Classroom] "
# 7 1 " (Susan is listening to guitar rock music on her\r\ntransistor radio. I'm…
# 8 1 " [Totter's Lane] "
# 9 1 " (Ian and Barbara are parked up.) \r\nBARBARA: Over there. \r\nIAN: We're…
#10 1 " [Memory - classroom] "
# … with 4,446 more rows