Getting image src and saving images in a directory with a Python image crawler
I want to build a Python image crawler.
This is what I have so far:
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = 'http://blog.pouyacode.net/'
data = urlopen(url)
soup = BeautifulSoup(data, 'html.parser')
img = soup.findAll('img')
print (img)
print ('\n')
print ('****************************')
print ('\n')
for each in img:
    print(img.get('src'))
    print ('\n')
This part works:
print (img)
print ('\n')
print ('****************************')
print ('\n')
But after it prints the ***************** line, this error appears:
Traceback (most recent call last):
File "pull.py", line 15, in <module>
print(img.get('src'))
AttributeError: 'ResultSet' object has no attribute 'get'
So how do I get the src of every image?
And how do I save those images in a directory?
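The error comes from the loop body. soup.findAll('img') returns a ResultSet, which is essentially a list of tags, and the ResultSet itself has no .get() method; only the individual tags inside it do. You named the loop variable each but then called .get() on the whole collection. The fix is one word:

for each in img:
    print(each.get('src'))  # call .get() on the single tag, not on the ResultSet
    print('\n')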
Something like this? Written off the top of my head, untested:
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urljoin
import os

url = 'http://blog.pouyacode.net/'
download_folder = "downloads"

if not os.path.exists(download_folder):
    os.makedirs(download_folder)

data = urlopen(url)
soup = BeautifulSoup(data, 'html.parser')
img = soup.findAll('img')

for each in img:
    # .get() works here because each is a single tag, not the whole ResultSet
    src = urljoin(url, each.get('src'))  # resolve relative src values against the page URL
    data = urlopen(src)
    with open(os.path.join(download_folder, os.path.basename(src)), "wb") as f:
        f.write(data.read())
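A couple of assumptions in that sketch are worth flagging (these caveats are mine, not tested against that blog): it assumes every <img> tag actually has a src, that the URL's basename is a usable filename, and that every download succeeds. A slightly more defensive version of the loop, reusing url, img, and download_folder from above, could look like this:

from urllib.error import URLError
from urllib.parse import urljoin, urlparse

for each in img:
    src = each.get('src')
    if not src:  # some <img> tags (e.g. lazy-loaded ones) carry no src at all
        continue
    src = urljoin(url, src)  # resolve relative paths against the page URL
    name = os.path.basename(urlparse(src).path)  # strip any ?query=... from the name
    if not name:  # src ended in a bare '/', nothing sensible to save it as
        continue
    try:
        data = urlopen(src)
        with open(os.path.join(download_folder, name), "wb") as f:
            f.write(data.read())
    except URLError as e:  # skip broken links instead of crashing the whole crawl
        print('skipping', src, ':', e)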