Python error: PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x1144e9860>

I have the following script, which prints the src path and size of every image at a given URL:

from requests_html import HTMLSession
from urllib.request import urlopen
from bs4 import BeautifulSoup
from PIL import Image
import requests

url="https://example.com/"

session = HTMLSession()
r = session.get(url)

b  = requests.get(url)
soup = BeautifulSoup(b.text, "lxml")

images = soup.find_all('img')

for img in images:
    if img.has_attr('src') :
        imgsize = Image.open(requests.get(img['src'], stream=True).raw)
        print(img['src'], imgsize.size)

It works fine for some URLs, but for others I get the following error:

PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x10782e900>

Is there a way to overcome this error?

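Without your specific URL, I can't look into why this happens. But you can wrap the call in a try/except so that your script doesn't crash and simply continues to the next img: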

from bs4 import BeautifulSoup
from PIL import Image
import requests

url="https://example.com/"

# Fetch the page once and parse it
b  = requests.get(url)
soup = BeautifulSoup(b.text, "lxml")

images = soup.find_all('img')

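for img in images:
    if img.has_attr('src'):
        try:
            img_link = img['src']
            # Lazy-loaded images often carry an inline data: placeholder in src
            # and the real URL in data-src
            if img_link.startswith('data:image'):
                img_link = img['data-src']
            imgsize = Image.open(requests.get(img_link, stream=True).raw)
            print(img_link, imgsize.size)
        except Exception as e:
            # Report the problem and move on to the next image
            print(e)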
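
As for the cause: UnidentifiedImageError is what Pillow raises when the bytes it is given do not match any format it can decode. Without the URL this is only a guess, but a plausible reason is that the server returned something other than a raster image, for example an HTML error page or an SVG. The sketch below shows one way to check before calling Image.open. The helper names pick_image_url and sniff_image_format are made up for illustration, and the signature table covers only a few common formats, so treat it as a rough pre-check rather than a full format detector.

```python
from urllib.parse import urljoin

# Leading bytes ("magic numbers") of a few common raster formats.
# This list is deliberately short; Pillow itself supports many more.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
)


def sniff_image_format(data):
    """Return a format name if `data` starts like a known raster image, else None."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    return None


def pick_image_url(page_url, attrs):
    """Return an absolute http(s) URL for an <img> tag, or None.

    `attrs` is a dict of the tag's attributes (with BeautifulSoup: img.attrs).
    """
    candidate = (attrs.get("src") or "").strip()
    # Fall back to data-src when src is missing or is an inline data: URI
    if not candidate or candidate.startswith("data:"):
        candidate = (attrs.get("data-src") or "").strip()
    if not candidate or candidate.startswith("data:"):
        return None
    # urljoin resolves relative and protocol-relative paths against the page URL
    absolute = urljoin(page_url, candidate)
    if not absolute.startswith(("http://", "https://")):
        return None
    return absolute
```

In the loop above, one could call pick_image_url(url, img.attrs), download the body with requests.get(link).content, and pass it to Image.open(io.BytesIO(data)) only when sniff_image_format(data) returns a name. Printing the link and the first few bytes when it returns None should show what the server actually sent.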