Error despite purrr's 'otherwise' - Why is purrr/possibly's 'otherwise' not triggered?

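I am scraping content from a website. To do so, I iterate over links. If an error occurs, purrr's possibly adverb should keep the process going and insert "missing" (or NA_character_) as the result.

The code below works as intended when the linked site does not exist, i.e. it outputs "missing". However, when the linked site exists but the element I am trying to extract from it does not, the function still throws an error, even though a value has been defined for 'otherwise'.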

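This surprises me, because the documentation states:

'possibly: wrapped function uses a default value (otherwise) whenever an error occurs.'

Any idea why this happens? I know I could modify the function accordingly (e.g. check the length of the returned object), but I do not understand why the 'otherwise' value is not used.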

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#> 
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#> 
#>     guess_encoding

# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409

sample_data <- tibble::tibble(
  link = c(
    #link ok, selected item exists
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
    #link not ok
    "https://www.wrong-url.foobar",
    #link ok, selected item does not exist on site
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
    
           )
)


fn_get_link_to_records <- function(link_to_overview_sessions) {

  # print the link currently being processed
  print(link_to_overview_sessions)

  link_to_overview_sessions %>% 
    rvest::read_html() %>% 
    rvest::html_elements("a") %>% 
    rvest::html_attr("href") %>% 
    enframe(name = NULL,
            value = "link_to_text") %>% 
    # backslashes must be doubled inside an R string literal
    filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>% 
    mutate(link_to_text=glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>% 
    pull()
}


sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.

sample_data %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.

Created on 2021-03-28 by the reprex package (v1.0.0)

Update: I have added the output below to make the unexpected result (last chunk) clearer.

sample_data[1:2,] %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#>   link                                  link_to_text                            
#>   <chr>                                 <chr>                                   
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar          missing
sample_data[3, ] %>% 
  mutate(link_to_text=map_chr(link, 
                              possibly(fn_get_link_to_records,
                                       otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.

Created on 2021-03-29 by the reprex package (v1.0.0)

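The error comes from map_chr, whereas possibly is wrapped around the fn_get_link_to_records function. If you run fn_get_link_to_records(sample_data$link[3]), you will see that the URL is printed, an empty (length-0) result is returned, and no error is raised. Since no error occurs inside the wrapped function, possibly has nothing to catch. map_chr, however, cannot turn this empty output into a single character value, so it raises the error itself. If you use map instead of map_chr, you will see that it works.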

sample_data[3,] %>% 
  mutate(link_to_text= map(link, fn_get_link_to_records))

#[1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
#  link                                                                                     link_to_text
#  <chr>                                                                                    <list>      
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]> 
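The mechanism can be reproduced without network access. The following is a minimal sketch, not the original scraper: returns_nothing and always_fails are hypothetical stand-ins for a page without the wanted element and for an unreachable URL, respectively.

```r
library(purrr)

# Hypothetical stand-in: succeeds, but returns a zero-length character vector
returns_nothing <- function(x) character(0)
# Hypothetical stand-in: signals an error, like read_html() on a bad URL
always_fails <- function(x) stop("cannot open connection")

safe_nothing <- possibly(returns_nothing, otherwise = "missing")
safe_fails   <- possibly(always_fails, otherwise = "missing")

# An error inside the wrapped function is caught and replaced by 'otherwise'
res_fails <- safe_fails("a")

# No error inside the wrapped function: the empty result is passed through as-is
res_nothing <- safe_nothing("a")

# map() accepts results of any length, so the empty result is stored in a list
res_map <- map("a", safe_nothing)

# map_chr() needs length-1 results; the error is raised by map_chr() itself,
# i.e. outside of possibly(), so 'otherwise' never comes into play
res_map_chr <- tryCatch(
  map_chr("a", safe_nothing),
  error = function(e) "map_chr error"
)
```

Here res_fails holds the 'otherwise' value, while res_nothing is still empty, which is why map_chr complains.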

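In the map result for row 3, however, link_to_text is empty (a zero-length glue vector). The solution, as you already know, is to check the length of the output inside fn_get_link_to_records and either return NA or raise an error there; an error raised inside the function is then handled by possibly.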
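One way to apply this might look like the sketch below. It assumes a hypothetical helper stop_if_empty and a hypothetical fake_scraper that imitates fn_get_link_to_records without network access; the example.org URL is a placeholder.

```r
library(purrr)

# Hypothetical helper: turn an empty result into an error that possibly() can catch
stop_if_empty <- function(x) {
  if (length(x) == 0) stop("no matching element found")
  x
}

# Hypothetical stand-in for fn_get_link_to_records(); no network access needed
fake_scraper <- function(link) {
  found <- if (link == "has-record") {
    "https://example.org/record.html"  # placeholder URL
  } else {
    character(0)                       # page exists, element does not
  }
  stop_if_empty(found)
}

# The empty case now errors inside the wrapped function, so 'otherwise' is used
out <- map_chr(
  c("has-record", "no-record"),
  possibly(fake_scraper, otherwise = NA_character_)
)
```

In the real function the check would presumably go after pull(). A page with more than one matching link would still break map_chr, so that case may need map or an explicit choice of one element.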