Scrape an image using Soup
I am trying to scrape the images from this website: https://www.remax.ca/on/richmond-hill-real-estate/-2407--9201-yonge-st-wp_id268950754-lst. My current code is:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.remax.ca/on/richmond-hill-real-estate/-2407--9201-yonge-st-wp_id268950754-lst'
soup = BeautifulSoup(urlopen(url), 'html.parser')
imgs = soup.find_all('div', attrs={'class': 'images is-flex flex-one has-flex-align-center has-flex-content-center'})
When I look inside imgs, I cannot find the "image active ng-star-inserted ng-lazyloaded" class or the srcset attribute. As a result, I cannot download the images.
Can anyone suggest how to solve this?
You can use XPath to find the images, use requests to fetch them, and then write them to files, as follows:
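import requests
from lxml import html

# send a request to the website ("thewebsite" is a placeholder URL)
r = requests.get("thewebsite")
# parse the response body into an HTML tree
tree = html.fromstring(r.content)
# find image URLs with XPath ("xpaths" is a placeholder; <img> tags usually carry @src, not @href)
image_urls = tree.xpath("xpaths/@href")
# download each image and write its bytes to its own file
for n, image_url in enumerate(image_urls):
    # the "image_<n>.jpg" naming is only an illustration
    with open(f"image_{n}.jpg", "wb") as f:
        f.write(requests.get(image_url).content)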
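Each downloaded image needs its own file name, otherwise every write lands in the same file. Here is a minimal sketch of one way to derive a name from the image URL itself. The helper name filename_from_url and the "image_<n>.jpg" fallback are my own assumptions for illustration, not part of the answer above:

```python
import os
from urllib.parse import urlparse


def filename_from_url(image_url, index):
    """Build a local file name from the last path segment of an image URL."""
    # urlparse drops the query string, so "?w=800" does not end up in the name
    name = os.path.basename(urlparse(image_url).path)
    # Fall back to a numbered name when the URL path has no usable segment
    return name or f"image_{index}.jpg"


# Hypothetical URLs, used only to show the behaviour
print(filename_from_url("https://example.com/photos/front.jpg?w=800", 0))
print(filename_from_url("https://example.com/", 1))
```

You could then pass the result to open(..., "wb") inside the download loop.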
The images are lazy-loaded, and I think that is the problem. So instead I scraped the script that loads and manages these images.
import json

# the image URLs are embedded in the page's JSON-LD metadata block
script = soup.find('script', {'type': 'application/ld+json'})
script_json = json.loads(script.contents[0])
imgs = script_json['@graph'][1]['photo']['url']
Now imgs contains a list of all 11 images for the residence at the link you provided.
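Indexing '@graph' with a fixed position such as [1] works for this page but may break if the site reorders its metadata. A slightly more defensive sketch is to walk every node in '@graph' and collect whatever 'photo' URLs are present. The helper name find_photo_urls and the sample data are assumptions for illustration; the JSON shape is only what the snippet above implies:

```python
import json


def find_photo_urls(ld_json):
    """Collect photo URLs from every node in a JSON-LD '@graph' list."""
    urls = []
    for node in ld_json.get("@graph", []):
        photo = node.get("photo") if isinstance(node, dict) else None
        if not isinstance(photo, dict):
            continue
        found = photo.get("url", [])
        # 'url' may be a single string or a list of strings
        urls.extend([found] if isinstance(found, str) else found)
    return urls


# Hypothetical sample that mirrors the structure used above
sample = json.loads("""
{"@graph": [
  {"@type": "WebPage"},
  {"@type": "Residence",
   "photo": {"url": ["https://example.com/a.jpg", "https://example.com/b.jpg"]}}
]}
""")
print(find_photo_urls(sample))
```

With the real page you would call find_photo_urls(script_json) in place of the hard-coded index.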