Python: get image paths and sizes from a URL using BeautifulSoup & PIL
I've managed to write a Python script that prints every image path found at a given URL:
from requests_html import HTMLSession
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
url="https://www.example.com/"
session = HTMLSession()
r = session.get(url)
b = requests.get(url)
soup = BeautifulSoup(b.text, "lxml")
images = soup.find_all('img')
for img in images:
    if img.has_attr('src'):
        print(img['src'])
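What I'd like to do now is use PIL to print each image's dimensions next to its printed URL. I tried the following, but it fails with an error: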
from requests_html import HTMLSession
from urllib.request import urlopen
from bs4 import BeautifulSoup
from PIL import Image
import requests
url="https://www.example.com/"
session = HTMLSession()
r = session.get(url)
b = requests.get(url)
soup = BeautifulSoup(b.text, "lxml")
images = soup.find_all('img')
for img in images:
    if img.has_attr('src'):
        ## Get image sizes in PIL
        imgsize = Image.open(requests.get(img, stream=True).raw)
        print(img['src'], imgsize.size)
Any ideas on how to get this working?
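You should use img['src'] instead of img. Here img is a BeautifulSoup Tag object, not a URL string, so requests.get() cannot fetch it; pass the value of its src attribute:
requests.get(img['src'], ...).raw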
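For reference, here is a minimal sketch of how the whole thing might look with that fix applied. It rests on a few assumptions that are not in the question: the function names (extract_image_urls, image_size, print_image_sizes) are made up for illustration, "html.parser" is swapped in for "lxml" so no extra parser is needed, urljoin is used in case the page has relative src paths, the image bytes are downloaded in full rather than read from .raw, and the 10-second timeout is an arbitrary choice.

```python
from io import BytesIO
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin resolves relative paths such as "/img/a.png" against base_url
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def image_size(data):
    """Return (width, height) for an image given its raw bytes."""
    with Image.open(BytesIO(data)) as im:
        return im.size


def print_image_sizes(page_url, timeout=10):
    """Fetch page_url and print each image URL next to its size."""
    page = requests.get(page_url, timeout=timeout)
    page.raise_for_status()
    for src in extract_image_urls(page.text, page_url):
        try:
            resp = requests.get(src, timeout=timeout)
            resp.raise_for_status()
            print(src, image_size(resp.content))
        except (requests.RequestException, OSError) as exc:
            # Skip images that cannot be downloaded or decoded by PIL
            print(src, "skipped:", exc)
```

Calling print_image_sizes("https://www.example.com/") should then print one line per image. Splitting the parsing and the size lookup into separate functions also means they can be tried on a local HTML string or image file without touching the network.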