purrr: safely run a function and save the links that give errors

Question

I have some links:

myLinks = c("https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l", 
"https://www.fotocasa.es/es/comprar/viviendas/vilobi-del-penedes/todas-las-zonas/l"
)

One of the links returns an error, but I do not want to drop it; I just want to store the link so that I can inspect it further.

Data/Code:

library(RSelenium)
library(rvest)
library(tidyverse)

rD <- rsDriver(browser="firefox", port=4536L)
remDr <- rD[["client"]]

collectZonaLinkData <- function(zona_url_to_get){
  
  remDr$navigate(zona_url_to_get)
  #click on Distrito
  remDr$findElement(using = "xpath", '/html/body/div[1]/div[2]/div[1]/div[3]/div/div[1]/div')$clickElement()
  html_zona_full_page = remDr$getPageSource()[[1]] %>% 
    read_html()
  
  Zonas_Names = html_zona_full_page %>% 
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>% # only interested in the checked name boxes
    html_nodes('.re-GeographicSearchNext-checkboxItem-literal') %>% 
    html_text()
  
  Zonas_Link  = html_zona_full_page %>% 
    html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>% 
    html_attr('href') %>% 
    paste("https://www.fotocasa.es", ., sep = "")
  
  zonas = cbind.data.frame(Zonas_Names, Zonas_Link)
  return(zonas)
}

I can run the following:

out = map(myLinks, ~ collectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")

which gives the following error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1

The problematic URL is:

collectZonaLinkData(myLinks[3])

How can I wrap collectZonaLinkData in safely and make sure Zonas_Link contains NA in the data frame?

That is, running the following:

myLinks = myLinks[1:2]
out = map(myLinks, ~ collectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")

gives me output for the first 2 links, and it works:

                                                                                ID       Zonas_Names
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l        Torrelavit
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
                                                                        Zonas_Link
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l

The third link does not work, so the ID can still be collected, but for Zonas_Names and Zonas_Link I want NA.

I am not sure whether I should wrap safely() around Zonas_Names and Zonas_Link inside collectZonaLinkData?

Expected output:

                                                                                     ID       Zonas_Names
1             https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l        Torrelavit
2      https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l Torrelles de Foix
3 https://www.fotocasa.es/es/comprar/viviendas/vilafranca-del-penedes/todas-las-zonas/l                NA
                                                                        Zonas_Link
1        https://www.fotocasa.es/es/comprar/viviendas/torrelavit/todas-las-zonas/l
2 https://www.fotocasa.es/es/comprar/viviendas/torrelles-de-foix/todas-las-zonas/l
3                                                                               NA

Edit:

Output:

# A tibble: 1 × 4
  `https://www.fotocasa.… `https://www.fotocasa.es… $Zonas_Link         `https://www.fotocasa.e… `https://www.fotocasa.e… $Zonas_Link       
  <lgl>                   <fct>                     <fct>               <lgl>                    <fct>                    <fct>             
1 NA                      Torrelles de Foix         https://www.fotoca… NA                       Vilobí del Penedès       https://www.fotoc…

Answer 1

We can pass the function as input to possibly or safely:

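# Fallback returned when collectZonaLinkData() errors; the column names must match
# the function's output, and ID is left out because bind_rows(.id = "ID") adds it
pcollectZonaLinkData <- possibly(collectZonaLinkData, 
   otherwise = tibble(Zonas_Names = NA_character_, 
    Zonas_Link = NA_character_))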
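As a rough illustration of why the original call fails and what possibly does about it, here is a minimal sketch that needs no browser. The function toy_collect and the object fallback are hypothetical stand-ins, not part of the question's code; the assumption is that the failing page simply yields no checked boxes, so the names vector is empty while paste() still returns one string.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for collectZonaLinkData(): no Selenium involved.
# Empty inputs mimic a page where no checked boxes were found.
toy_collect <- function(zone_names, hrefs) {
  links <- paste("https://www.fotocasa.es", hrefs, sep = "")
  cbind.data.frame(Zonas_Names = zone_names, Zonas_Link = links)
}

# paste() treats a zero-length argument as "", so the result has length 1
# while zone_names has length 0; cbind.data.frame() then raises the
# "differing number of rows" error.
n_links <- length(paste("https://www.fotocasa.es", character(0), sep = ""))

# Fallback row whose column names mirror the function's normal output
fallback <- tibble(Zonas_Names = NA_character_, Zonas_Link = NA_character_)
ptoy_collect <- possibly(toy_collect, otherwise = fallback)

ok  <- ptoy_collect("Torrelavit", "/es/comprar/viviendas/torrelavit/todas-las-zonas/l")
bad <- ptoy_collect(character(0), character(0))  # errors internally, returns fallback
```

Here bad is the fallback row rather than an error, so a later bind_rows() still gets one row per link.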

Then use this function in map:

library(purrr)
library(dplyr)
out <- map(myLinks, ~ pcollectZonaLinkData(.x)) %>% 
  set_names(myLinks) %>% 
  bind_rows(.id = "ID")
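If the failing links should also be stored separately for later inspection, safely may be the better fit, since it keeps the error next to each result. The following is only a sketch under stated assumptions: toy_collect, urls, failed_links and na_row are made-up names, the example.com URLs are placeholders, and toy_collect fails on purpose for one URL instead of driving a real browser.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for collectZonaLinkData(); fails for one URL on purpose
toy_collect <- function(url) {
  if (grepl("vilafranca", url)) stop("no checked zones found")
  tibble(Zonas_Names = "toy zone", Zonas_Link = url)
}

urls <- c("https://example.com/torrelavit/todas-las-zonas/l",
          "https://example.com/vilafranca-del-penedes/todas-las-zonas/l")

# safely() makes every call return list(result = , error = )
res <- map(set_names(urls), safely(toy_collect))

# Links whose call raised an error, kept for further checking
failed_links <- names(keep(res, ~ !is.null(.x$error)))

# Swap failed results for an NA row, then bind with the URL as ID
na_row <- tibble(Zonas_Names = NA_character_, Zonas_Link = NA_character_)
out <- res %>%
  map(~ if (is.null(.x$error)) .x$result else na_row) %>%
  bind_rows(.id = "ID")
```

With the real function, safely(collectZonaLinkData) and myLinks would take the place of the toy pieces; failed_links then holds the URLs to revisit, and out keeps a row with NA for each of them.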
