Trying to download a cached picture in RSelenium
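I am using RSelenium to download a series of newspaper articles from an online repository. So far I have done this with the remDr$screenshot() function, but because of problems with resolution, zoom, and framing, I would like to know whether I can download just the displayed picture instead. Sample code to reach the page: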
library(RSelenium)
rD1 <- rsDriver(browser = "firefox",port=4567L)
remDr <- rD1[["client"]]
url1<-"http://memoria.bn.br/DocReader/DocReader.aspx?"
url2<-"bib=090972_07&pesq=cangaceiro&pasta=ano%20192"
remDr$navigate(paste0(url1,url2))
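Looking at the page source, I noticed that the image (the element with the ID DocumentoImg) is hosted at the cache URL cache/2286106490137/I0000051-20Alt=000869Lar=000615LargOri=005060AltOri=007149.JPG. Is there a way to download it directly from this address, without relying on screenshots?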
Yes, you can download the image directly in R like this:
# I have split the url just to make it legible on screen here
url_pt1 <- "http://memoria.bn.br/DocReader/cache/2627304510157"
url_pt2 <- "/I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG"
big_url <- paste0(url_pt1, url_pt2)
# Choose the local file location for the download
file_to <- "download.jpg"
# mode = "wb" writes the file as binary, which matters for images on Windows
download.file(big_url, file_to, mode = "wb")
#> trying URL 'http://memoria.bn.br/DocReader/cache/2627304510157
#> /I0000051-20Alt=001984Lar=001404LargOri=005060AltOri=007149.JPG'
#> Content type 'text/html; charset=utf-8' length 8457 bytes
#> downloaded 8457 bytes
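Two details are worth checking before relying on a hard-coded address. The cache id in the question (2286106490137) differs from the one used here (2627304510157), which suggests the cache path may change between sessions, and the log above reports a content type of text/html rather than an image. The following is a minimal sketch, not a tested solution for this site: it assumes the remDr session from the question is still open on the DocReader page and that the image element has the ID DocumentoImg, as stated in the question. The helper names resolve_src, is_jpeg and download_displayed_image are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: turn a relative src such as "cache/.../x.JPG" into an
# absolute URL based on the page's directory. Root-relative paths ("/...")
# are not handled in this sketch.
resolve_src <- function(src, page_url) {
  if (grepl("^https?://", src)) return(src)
  no_query <- sub("\\?.*$", "", page_url)   # drop the query string
  base_dir <- sub("[^/]*$", "", no_query)   # drop the file name, keep the trailing "/"
  paste0(base_dir, src)
}

# Hypothetical helper: TRUE if the file starts with the JPEG signature FF D8 FF.
is_jpeg <- function(path) {
  sig <- readBin(path, what = "raw", n = 3L)
  length(sig) == 3L && identical(sig, as.raw(c(0xFF, 0xD8, 0xFF)))
}

# Hypothetical helper: download the image the viewer is currently showing.
# Assumes `remDr` is an open RSelenium client already on the DocReader page.
download_displayed_image <- function(remDr, file_to = "download.jpg") {
  img <- remDr$findElement(using = "id", value = "DocumentoImg")
  src <- img$getElementAttribute("src")[[1]]
  page_url <- remDr$getCurrentUrl()[[1]]
  download.file(resolve_src(src, page_url), file_to, mode = "wb")
  if (!is_jpeg(file_to)) {
    warning("Downloaded file is not a JPEG; the server may have returned an HTML page.")
  }
  invisible(file_to)
}
```

With the session from the question open, a call such as download_displayed_image(remDr, "page_0051.jpg") would read the current src at run time instead of copying it by hand. If the warning fires, the request probably needs something from the browser session that a plain download.file() call does not send. That is a guess, and this sketch does not address it.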