Web Scraping image in R from www.zoobashop.com

Question

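I am collecting wax print images for my classification algorithm.

First I retrieved the links to all the image blocks. Each block contains 1 or 2 images, and I want to retrieve the links to those images.

For example, for this block link: https://www.zoobashop.com/woodin-fusion-de-woodin-wo29gha-29017-6-yards.html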

library(rvest)
html <- read_html("https://www.zoobashop.com/woodin-fusion-de-woodin-wo29gha-29017-6-yards.html")

get_block_img <- function(html){
  html %>% 
    html_nodes('.fotorama__thumb  img#fotorama__img')%>% 
    html_attr("src")
}

get_block_img(html)

The result I get is character(0).

Can anyone help me?

Answer 1

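The image is loaded dynamically: when JavaScript runs in the browser, it reads the image data from a script tag, so the img nodes you are selecting do not exist in the HTML that rvest downloads. You can apply a regular expression to the response text instead.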

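library(rvest)
library(stringr)

url <- 'https://www.zoobashop.com/woodin-fusion-de-woodin-wo29gha-29017-6-yards.html'

# The image URL sits in the text of a script tag, so match it in the page text.
# [1, 2] picks the first capture group, i.e. the value of the first "img" key.
link <- str_match(read_html(url) %>% html_text(),
                  '"data": .*?"img":"(.*?)"')[1, 2]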
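
Since each block can hold 1 or 2 images, you may want every "img" value rather than only the first. Here is a minimal sketch of that using str_match_all. The function name get_block_imgs and the sample_html fragment are hypothetical: the fragment only imitates what the script tag might contain, and I have not verified the real page's JSON layout. The un-escaping step is a precaution in case the URLs are stored with escaped slashes.

```r
library(rvest)
library(stringr)

# Hypothetical fragment standing in for the real response (layout assumed, not verified)
sample_html <- '<html><body><script>
{"data": [{"thumb":"https://example.com/t1.jpg","img":"https://example.com/a.jpg"},
{"thumb":"https://example.com/t2.jpg","img":"https://example.com/b.jpg"}]}
</script></body></html>'

# Return every value stored under an "img" key in the page text
get_block_imgs <- function(page_text) {
  # str_match_all returns a list with one matrix per input string;
  # column 2 of that matrix holds the first capture group
  matches <- str_match_all(page_text, '"img":"(.*?)"')[[1]]
  urls <- matches[, 2]
  # Precaution: JSON may escape slashes as \/, so turn them back into /
  str_replace_all(urls, fixed("\\/"), "/")
}

page_text <- read_html(sample_html) %>% html_text()
get_block_imgs(page_text)
```

For the real page you would replace sample_html with the product URL. If nothing matches, the function returns character(0), so check the length of the result before using it.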

screen-scraping

r

image

web-scraping

rvest