How to check if a URL is downloadable in requests
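I am making a downloader app using tkinter and requests, and I recently found a bug in my program. Basically, I want my program to check whether a given URL is downloadable before it starts downloading the content of the URL. I used to do this by getting the headers of the URL and checking whether 'Content-Length' exists. That works for some URLs (like https://www.google.com), but for others (like a link to a YouTube video) it doesn't, and it makes my program crash. I saw someone say that I can check for 'attachment' in the 'Content-Disposition' header, but it didn't work for me and returned the same thing for downloadable and non-downloadable URLs. What is the best way of doing this?
The code mentioned in another Stack Overflow question, which I tried but which didn't work: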
import requests
url = 'https://www.google.com'
headers=requests.head(url).headers
downloadable = 'attachment' in headers.get('Content-Disposition', '')
My previous code:
headers = requests.head(url, headers={'accept-encoding': ''}).headers
try:
    print(type(headers['Content-Length']))
    file_size = int(headers['Content-Length'])
except KeyError:
    # Just a class that I defined to raise an exception if the URL was not downloadable
    raise NotDownloadable()
Update: the URL:
https://aspb1.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-360p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6IjUzMGU0Mzc3ZjRlZjVlYWU0OTFkMzdiOTZkODgwNGQ2IiwiZXhwIjoxNjExMzMzMDQxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.FjMi_dkdLCUkt25dfGqPLcehpaC32dBBUNDC9cLNiu0
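This is the URL I used for testing. If you open it, it leads you directly to a video that you can download, but checking 'Content-Disposition' returns 'None', as it does for most of the downloadable and non-downloadable URLs I tried.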
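Content-Disposition supplies the file name when the URL itself does not provide one. But this information is not always present, as is the case with your URL. One solution is to filter by content type; see the example below. If you wish to download only specific content types, e.g. video/mp4, you can add further filters.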
import requests
url = 'https://aspb1.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-360p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6IjUzMGU0Mzc3ZjRlZjVlYWU0OTFkMzdiOTZkODgwNGQ2IiwiZXhwIjoxNjExMzMzMDQxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.FjMi_dkdLCUkt25dfGqPLcehpaC32dBBUNDC9cLNiu0'
headers = requests.head(url, allow_redirects=True).headers
# Fall back to an empty string so a missing header does not raise AttributeError
content_type = headers.get('content-type', '')
if 'text' in content_type.lower():
    downloadable = False
elif 'html' in content_type.lower():
    downloadable = False
else:
    downloadable = True
print(downloadable)
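The two checks discussed so far (an 'attachment' disposition and a non-text content type) could be combined into one helper. This is only a sketch: the function name looks_downloadable is made up for illustration, and treating a response with no Content-Type as not downloadable is an assumption you may want to change.

```python
def looks_downloadable(headers):
    """Guess from response headers whether a URL points to a file rather than a page."""
    # An explicit "attachment" disposition is the strongest hint, when it exists.
    disposition = headers.get('Content-Disposition', '')
    if 'attachment' in disposition.lower():
        return True
    # Drop parameters such as "; charset=utf-8" before inspecting the media type.
    media_type = headers.get('Content-Type', '').split(';')[0].strip().lower()
    if not media_type:
        return False  # assumption: no information means "not downloadable"
    if media_type.startswith('text/') or 'html' in media_type:
        return False
    return True


print(looks_downloadable({'Content-Type': 'video/mp4'}))
print(looks_downloadable({'Content-Type': 'text/html; charset=UTF-8'}))
```

With requests it would be called as looks_downloadable(requests.head(url, allow_redirects=True).headers). Like the filter above, it rejects every text/* type, so text/csv would need its own exception if you want to download CSV files.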
According to Request for Comments (RFC) 6266, the Content-Disposition header field:
is not part of the HTTP standard, but since it is widely implemented,
we are documenting its use and risks for implementers.
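When the header is present, it can still be used to pick a file name, with the URL path as a fallback. The sketch below assumes the standard library's email.message.Message is good enough for parsing the header value; pick_filename and the 'download' default are hypothetical names chosen for this example.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(url, headers):
    """Prefer the Content-Disposition filename, else the last segment of the URL path."""
    disposition = headers.get('Content-Disposition')
    if disposition:
        msg = Message()
        msg['Content-Disposition'] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the base name so a header cannot point outside the target folder.
            return posixpath.basename(name)
    # urlsplit() separates the query string, so "?wmsAuthSign=..." is not part of the name.
    path = urlsplit(url).path
    return posixpath.basename(unquote(path)) or 'download'


print(pick_filename('https://example.com/files/a.mp4?token=abc', {}))
print(pick_filename('https://example.com/x',
                    {'Content-Disposition': 'attachment; filename="report.pdf"'}))
```

Taking the name from the parsed path rather than from url.rsplit('/', 1) keeps signed query strings, like the one in the test URL, out of the file name.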
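Since the Content-Disposition header is not always available, you can use a solution that does not rely on one specific header but also looks at the individual file types in the Content-Type header.
Here is a list of Content-Types.
The code below checks whether the headers contain Content-Disposition, but it also checks whether the headers contain a Content-Type that is commonly downloadable.
I also added a check for Content-Length, because it might be useful for chunking the file being downloaded.
Have you considered creating sub-download folders?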
- download_folder/text_files
- download_folder/pdf_files
or
- download_folder/01242021/text_files
- download_folder/01242021/pdf_files
import requests
urls = ['https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2019-financial'
'-year-provisional/Download-data/annual-enterprise-survey-2019-financial-year-provisional-csv.csv',
'http://www.pdf995.com/samples/pdf.pdf', 'https://jeroen.github.io/files/sample.rtf',
'https://www.cnn.com/2021/01/23/opinions/biden-climate-change-gillette-wyoming-coal-sutter/index.html',
'https://www.google.com',
'https://thumbs-prod.si-cdn.com/d4e3zqOM5KUq8m0m-AFVxuqa5ZM=/800x600/filters:no_upscale():focal(554x699:555x700)/https://public-media.si-cdn.com/filer/a4/04/a404c799-7118-459a-8de4-89e4a44b124f/img_1317.jpg',
'https://www.blank.org']
for url in urls:
    headers = requests.head(url).headers
    # requests header lookups are case-insensitive, so .get() is enough here
    content_size = headers.get('Content-Length', 'The content size was not available.')
    if 'Content-Disposition' in headers:
        # do something with the file
        pass
    else:
        # Strip parameters such as "; charset=utf-8" before comparing media types
        content_type = headers.get('Content-Type', '').split(';')[0].strip().lower()
        compression_formats = ['application/gzip', 'application/vnd.rar', 'application/x-7z-compressed',
                               'application/zip', 'application/x-tar']
        compressed_file = content_type in compression_formats
        image_formats = ['image/bmp', 'image/gif', 'image/jpeg', 'image/png', 'image/svg+xml', 'image/tiff',
                         'image/webp']
        image_file = content_type in image_formats
        text_formats = ['application/rtf', 'text/plain']
        text_file = content_type in text_formats
        if compressed_file:
            print('Compressed file')
            print(content_size)
        elif image_file:
            print('Image file')
            print(content_size)
        elif text_file:
            print('Text file')
            print(content_size)
        elif content_type == 'application/pdf':
            print('PDF file')
            print(content_size)
        elif content_type == 'text/csv':
            print('CSV File')
            print(content_size)
Here is another version that uses functions:
import requests
urls = ['https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2019-financial'
'-year-provisional/Download-data/annual-enterprise-survey-2019-financial-year-provisional-csv.csv',
'http://www.pdf995.com/samples/pdf.pdf', 'https://jeroen.github.io/files/sample.rtf',
'https://www.cnn.com/2021/01/23/opinions/biden-climate-change-gillette-wyoming-coal-sutter/index.html',
'https://www.google.com',
'https://thumbs-prod.si-cdn.com/d4e3zqOM5KUq8m0m-AFVxuqa5ZM=/800x600/filters:no_upscale():focal(554x699:555x700)/https://public-media.si-cdn.com/filer/a4/04/a404c799-7118-459a-8de4-89e4a44b124f/img_1317.jpg',
'https://www.blank.org']
def query_headers(webpage):
    # stream=True postpones the body download until response.content is read
    response = requests.get(webpage, stream=True)
    headers = response.headers
    file_name = webpage.rsplit('/', 1)[-1]
    if 'Content-Disposition' in headers:
        # do something with the file
        pass
    else:
        # Strip parameters such as "; charset=utf-8" before comparing media types
        content_type = headers.get('Content-Type', '').split(';')[0].strip().lower()
        compression_formats = ['application/gzip', 'application/vnd.rar', 'application/x-7z-compressed',
                               'application/zip', 'application/x-tar']
        compressed_file = content_type in compression_formats
        image_formats = ['image/bmp', 'image/gif', 'image/jpeg', 'image/png', 'image/svg+xml', 'image/tiff',
                         'image/webp']
        image_file = content_type in image_formats
        text_formats = ['application/rtf', 'text/plain']
        text_file = content_type in text_formats
        if compressed_file:
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: Compressed file, File size: {content_size}, File name: {file_name}'
        elif image_file:
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: Image file, File size: {content_size}, File name: {file_name}'
        elif text_file:
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: Text file, File size: {content_size}, File name: {file_name}'
        elif content_type == 'application/pdf':
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: PDF file, File size: {content_size}, File name: {file_name}'
        elif content_type == 'text/csv':
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: CSV file, File size: {content_size}, File name: {file_name}'
        elif content_type == 'text/html':
            download_file(file_name, response)
            content_size = get_content_size(headers)
            return f'File Information: file_type: HTML file, File size: {content_size}, File name: {file_name}'
        else:
            content_size = get_content_size(headers)
            return f'File Information: file_type: no file type found, File size: {content_size}, File name: {file_name}'

def get_content_size(headers):
    # Report 0 when the server does not send Content-Length
    return int(headers.get('Content-Length', 0))

def download_file(filename, file_stream):
    with open(f'{filename}', 'wb') as f:
        f.write(file_stream.content)

for url in urls:
    download_info = query_headers(url)
    print(download_info)
# output
File Information: file_type: CSV file, File size: 253178, File name: annual-enterprise-survey-2019-financial-year-provisional-csv.csv
File Information: file_type: PDF file, File size: 433994, File name: pdf.pdf
File Information: file_type: Text file, File size: 9636, File name: sample.rtf
File Information: file_type: HTML file, File size: 185243, File name: index.html
File Information: file_type: HTML file, File size: 0, File name: www.google.com
File Information: file_type: Image file, File size: 78868, File name: img_1317.jpg
File Information: file_type: HTML file, File size: 170, File name: www.blank.org
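The sub-folder idea mentioned earlier could be sketched as a small mapping from media type to folder. The folder names csv_files and other_files, the FOLDERS table, and the target_folder helper are assumptions made for this example; only text_files, pdf_files and the MMDDYYYY date folder come from the suggestion above.

```python
from datetime import date
from pathlib import Path

# Hypothetical mapping; extend it with whatever media types you download.
FOLDERS = {
    'application/pdf': 'pdf_files',
    'text/plain': 'text_files',
    'application/rtf': 'text_files',
    'text/csv': 'csv_files',
}


def target_folder(content_type, root='download_folder', dated=False, day=None):
    """Return the folder a file with this Content-Type should be saved in."""
    media_type = content_type.split(';')[0].strip().lower()
    sub_folder = FOLDERS.get(media_type, 'other_files')
    base = Path(root)
    if dated:
        # Insert a date folder such as 01242021 between the root and the type folder.
        day = day or date.today()
        base = base / day.strftime('%m%d%Y')
    return base / sub_folder


print(target_folder('application/pdf'))
print(target_folder('text/plain; charset=utf-8', dated=True, day=date(2021, 1, 24)))
```

Before writing a file, the folder can be created with target_folder(...).mkdir(parents=True, exist_ok=True).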
You can check the content-type response header. This header defines the media type of the requested resource. The most common types are shown here.
The content-type header is defined as type "/" subtype. Some values also carry a parameter, in the form type "/" subtype ";" parameter, where the parameter has the form attribute "=" value. The parameter is optional, but the type and subtype are mandatory.
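To make the format concrete, here is a small parser for such a header value. It is a simplified sketch, not a full implementation of the grammar: parse_content_type is a name invented for this example, and splitting on ";" would break on a quoted parameter value that itself contains a semicolon.

```python
def parse_content_type(value):
    """Split 'type/subtype; attribute=value' into (type, subtype, parameters)."""
    media, *raw_params = value.split(';')
    main_type, _, subtype = media.strip().lower().partition('/')
    params = {}
    for item in raw_params:
        attribute, sep, val = item.partition('=')
        if sep:  # ignore fragments that are not attribute=value pairs
            params[attribute.strip().lower()] = val.strip().strip('"')
    return main_type, subtype, params


print(parse_content_type('text/html; charset=UTF-8'))
# -> ('text', 'html', {'charset': 'UTF-8'})
print(parse_content_type('video/mp4'))
# -> ('video', 'mp4', {})
```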
Currently there are 7 types, defined in RFC 1341:
text
multipart
application
message
image
audio
video
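The header value you are looking for depends on the resource you expect, but here are some examples you might use.
Examples
Downloading an image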
import requests
# url is assumed to hold the address you want to check
response = requests.head(url)
response_headers = response.headers
# Default to an empty string in case the header is missing
response_content_type = response_headers.get("content-type", "")
# Drop any parameters (e.g. "; charset=utf-8") and normalize the case
media_type = response_content_type.split(";")[0].strip().lower()
# you could use this code to search for all images using just the type
if media_type.split("/")[0] == "image":
    is_image = True
else:
    is_image = False
# alternatively you could specify your expected content-types including the subtype
CONTENT_TYPES = ["image/gif", "image/jpeg", "image/png", "image/tiff", "image/svg+xml"]  # ...
if media_type in CONTENT_TYPES:
    is_image = True
else:
    is_image = False
if is_image:
    # code to download image
    pass
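This code can easily be adapted to different types and subtypes.
Notes
It is worth noting that the types are fixed: new types cannot be defined, but new subtypes can.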
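One way to adapt the image check to other resources is to pass the accepted types and full media types as arguments. This is a sketch under that assumption; is_expected and its parameter names are invented for this example.

```python
def is_expected(content_type, types=(), media_types=()):
    """True if the header matches an accepted top-level type or full media type."""
    # Tolerate a missing header (None) and drop parameters such as "; charset=utf-8".
    media_type = (content_type or '').split(';')[0].strip().lower()
    main_type = media_type.split('/')[0]
    return main_type in types or media_type in media_types


# Accept any audio or video resource, plus two specific application subtypes.
print(is_expected('video/mp4', types={'audio', 'video'}))
print(is_expected('application/pdf', media_types={'application/pdf', 'application/zip'}))
print(is_expected('text/html; charset=utf-8', types={'audio', 'video'}))
```

With requests, the first argument would be response.headers.get("content-type").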
I think your previous code would work with a slight modification. It was trying to download the complete file, so it hung every time you ran it.
import requests
url = 'https://aspb1.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-360p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6IjUzMGU0Mzc3ZjRlZjVlYWU0OTFkMzdiOTZkODgwNGQ2IiwiZXhwIjoxNjExMzMzMDQxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.FjMi_dkdLCUkt25dfGqPLcehpaC32dBBUNDC9cLNiu0'
class NotDownloadable(Exception):
    # Stand-in for the exception class mentioned in the question
    pass

r = requests.get(url, stream=True)
try:
    print(r.headers)
    #if "Content-Length" in r.headers:
    file_size = int(r.headers["Content-Length"])
except KeyError:
    # Just a class that I defined to raise an exception if the URL was not downloadable
    raise NotDownloadable()
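Use stream=True:
r = requests.get(url, stream=True)
With stream=True, requests fetches only the response headers and leaves the connection open. The body is not downloaded until you read r.content or iterate over it with r.iter_content(), which yields the data as a series of non-overlapping chunks.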
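Once the headers look right, the same streamed response can be written to disk chunk by chunk instead of being loaded into memory at once. The helper below is a sketch: save_stream and the 8192-byte default are choices made for this example, and it only assumes the response object offers iter_content(chunk_size=...), as requests responses do.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to path in chunks; return the bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Possible usage with requests (needs network access, so it is left as a comment):
# import requests
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     save_stream(r, 'video.mp4')
```

Because the function returns the number of bytes written, a progress bar in the tkinter app could compare it against Content-Length when the server provides that header.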