Fetching a CAPTCHA image with a Python crawler

Question

I want to download a CAPTCHA image. This is how I do it with curl:

curl "https://www.ris.gov.tw/apply/captcha/image?CAPTCHA_KEY=71cc3b094e824446873038401ab8c303&time=1464968502855" -H "Referer: https://www.ris.gov.tw/id_card/" --insecure >> a.jpg

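P.S. (CAPTCHA_KEY and time have to be generated fresh for every request.)

This works fine and saves the CAPTCHA image to a.jpg.

Now I am trying to rewrite it in Python, and this is what I have so far: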

import requests
from bs4 import BeautifulSoup
from datetime import datetime
import shutil
import time
from IPython.display import Image
from random import randint

# Get CAPTCHA_KEY from the form page
ori = requests.get("https://www.ris.gov.tw/id_card/")
soup = BeautifulSoup(ori.text)
key = soup.select('#captchaKey')[0]["value"]

rs = requests.session()
url = "https://www.ris.gov.tw/apply/captcha/image?CAPTCHA_KEY=" + key

# Build the time parameter
timestamp = str(int(time.time() * 100)) + str(randint(0, 9))
url += "&time=" + timestamp

res = rs.get(url, headers={'referer': 'https://www.ris.gov.tw/id_card/'}, stream=True, verify=False)

f = open('check.jpg', 'wb')
shutil.copyfileobj(res.raw, f)
f.close()
Image('check.jpg')

I have been stuck on this for a while and cannot figure out how to make it work.

Answer 1

With these changes I got a valid JPEG file:

res = rs.get(url, headers={'referer': 'https://www.ris.gov.tw/id_card/'})
with open('check.jpg', 'wb') as jpeg_file:
    jpeg_file.write(res.content)

res.content is the response body as bytes, so it can be written directly to a file opened in binary mode.
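The steps above can be pulled together in a small sketch. It rests on a few assumptions that are not confirmed by the site: the time parameter in the curl example looks like a Unix timestamp in milliseconds, so the sketch builds it that way, and the helper names build_captcha_url, looks_like_jpeg and save_captcha are made up for illustration. The session argument is expected to behave like a requests.Session; the check on the first two bytes is only a quick sanity test that the server returned a JPEG and not an error page.

```python
import time
from urllib.parse import urlencode

# URLs taken from the question.
REFERER = "https://www.ris.gov.tw/id_card/"
CAPTCHA_URL = "https://www.ris.gov.tw/apply/captcha/image"


def build_captcha_url(key, now=None):
    """Build the image URL for a CAPTCHA_KEY.

    Assumption: `time` is the current Unix time in milliseconds.
    """
    if now is None:
        now = time.time()
    query = urlencode({"CAPTCHA_KEY": key, "time": int(now * 1000)})
    return CAPTCHA_URL + "?" + query


def looks_like_jpeg(data):
    """JPEG data starts with the two bytes FF D8."""
    return data[:2] == b"\xff\xd8"


def save_captcha(session, key, path="check.jpg"):
    """Download the image through `session` and write the bytes to `path`."""
    res = session.get(build_captcha_url(key), headers={"Referer": REFERER})
    res.raise_for_status()
    if not looks_like_jpeg(res.content):
        raise ValueError("response does not look like a JPEG image")
    with open(path, "wb") as jpeg_file:
        jpeg_file.write(res.content)
    return path
```

With requests installed, this could be called as save_captcha(rs, key), where rs is the session and key is the value read from #captchaKey as in the question. Fetching the form page through the same session (rs.get instead of requests.get) keeps any cookies the server sets, which may or may not matter for this site.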
