R: tryCatch for running a web crawler in a forever loop

Question

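I have recently been learning web crawling.

I use while (TRUE) to keep my web-scraping function running forever.

However, my network connection is unstable, so the function sometimes throws an error when the connection drops.

I tried to work around this with tryCatch.

rf() is my web-crawling function: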

if(tryCatch(rf(), error=function(cond) FALSE)==FALSE){
  Sys.sleep(60+sample(1:5,1))
  rf()
}

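I make R wait a while before running the function again, because the connection usually recovers after a short time.

However, this code is not good enough: the second call to rf() is not protected, so if the connection fails a second time, R stops.

I think it might be better to use repeat, as shown below.

Am I right?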

repeat {
  if (tryCatch(rf(), error = function(cond) FALSE) == FALSE) {
    Sys.sleep(60 + sample(1:5, 1))
    rf()
  }
  Sys.sleep(900 + sample(1:30, 1))  # seconds
}

Answer 1

A simple option may be to use try inside while (TRUE), like this:

e <- simpleError("test error")
while (TRUE) {
  try(stop(e))  # the error is reported, but the loop keeps running
  Sys.sleep(2)
  print(1)
}

but with your own function in place of stop(e).
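
Applied to the question, this might look like the minimal sketch below. The `rf()` defined here is only a hypothetical stand-in that fails on its first two calls, and the loop is capped at three iterations so the example terminates. On failure, `try()` returns an object of class "try-error", which can be checked with `inherits()`.

```r
# Hypothetical stand-in for the asker's crawler: fails on the first two calls.
calls <- 0
rf <- function() {
  calls <<- calls + 1
  if (calls <= 2) stop("network down")
  "page content"
}

# Capped at three iterations for the demo; a real crawler would use while (TRUE).
for (i in 1:3) {
  result <- try(rf(), silent = TRUE)
  if (inherits(result, "try-error")) {
    # The original condition object is stored in the "condition" attribute.
    message("attempt ", i, " failed: ",
            conditionMessage(attr(result, "condition")))
  } else {
    message("attempt ", i, " succeeded")
  }
  Sys.sleep(0.1)  # short pause for the demo; the question waits about 60 s
}
```

Because the error never escapes `try()`, a second or third failure does not stop the loop.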

Answer 2

If you want to wait longer after a failed request, try this code:

while (TRUE) {
  tryCatch({
    rf()
    # Reached only if rf() succeeded.
    print("wait normal")
    Sys.sleep(60 + sample(1:5, 1))
  }, error = function(e) {
    # Reached if rf() raised an error.
    print("wait longer")
    Sys.sleep(900 + sample(1:30, 1))
  })
}

Note that the call to rf() must be inside the tryCatch expression, so that the error is caught rather than propagated further.
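
One further point, offered as a sketch rather than a definitive fix: comparing the result of tryCatch with `== FALSE`, as in the question, is fragile. If rf() returns NULL, the comparison yields a zero-length logical and `if` itself raises an error. The hypothetical helper below (`retry_until_success`, `flaky_rf`, and the `wait` and `max_tries` parameters are all illustrative names, not part of the original code) wraps the outcome in a list so that success and failure are distinguished regardless of what rf() returns.

```r
# Hypothetical helper: call `f` until it succeeds, sleeping `wait` seconds
# after each failure. `max_tries` optionally bounds the number of attempts.
retry_until_success <- function(f, wait = 60, max_tries = Inf) {
  tries <- 0
  repeat {
    tries <- tries + 1
    outcome <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, value = e)
    )
    if (outcome$ok) return(outcome$value)
    if (tries >= max_tries) stop(outcome$value)  # re-signal the last error
    message("attempt ", tries, " failed: ", conditionMessage(outcome$value))
    Sys.sleep(wait)
  }
}

# Demo with a stand-in crawler that fails twice and then returns NULL.
attempts <- 0
flaky_rf <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("network down")
  invisible(NULL)
}
retry_until_success(flaky_rf, wait = 0.1)
```

In the forever loop, the body could then become `retry_until_success(rf)` followed by the normal `Sys.sleep()` between crawls.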


r

web-crawler