Python Requests -- MemoryError despite using streaming uploads
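According to the documentation, it should be possible to do a non-memory-intensive upload by giving Requests a file-like object instead of the file's contents. OK, so this is what I do in my code: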
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream',
                  {'Content-Transfer-Encoding': 'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)
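When I watch this run on my machine with a 2.8 GB file, it starts consuming memory at an alarming rate and only quits once memory usage reaches about 89%. It then fails with this output: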
File "***.py", line 644, in post
r = s.post(url, data=content, params=params, files=files, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
hooks, stream, verify, cert)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
self.prepare_body(data, files)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
body, content_type = encode_multipart_formdata(new_fields)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
return body.getvalue(), content_type
MemoryError
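It works with smaller files, but it still uses a lot of memory when doing so. Am I misunderstanding something?
EDIT:
After seeing Martijn Pieters' answer, I changed my code to: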
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
m = requests_toolbelt.MultipartEncoder(fields=files)
headers['content-type'] = m.content_type
r = s.post(url, data=m, params=params, headers=headers)
I had to remove {'Content-Transfer-Encoding':'binary'} because it does not seem to be supported and causes this error message:
File "***.py", line 647, in post
m = requests_toolbelt.MultipartEncoder(fields=files)
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
self._prepare_parts()
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
self.parts = [Part.from_field(f, enc) for f in fields]
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
yield RequestField.from_tuples(*field)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
filename, data = value
ValueError: too many values to unpack
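(Is it still possible to set this header when using the multipart encoder? I would prefer it to be there.)
However, even with the header removed it still does not work, because now I get this error message: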
File "***.py", line 647, in post
r = s.post(url, data=m, params=params, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
main_key = self.cache.create_key(response.request)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder
Any ideas? I admit I am new to this, and these error messages, as is so often the case in programming, are not very helpful.
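You are not streaming the upload, because requests can do that only when the entire body comes from an open file object. For a multipart upload it still reads every file into memory to build the multipart POST body.
For multipart uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder: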
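import os

import requests
from requests_toolbelt import MultipartEncoder

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}
# The query parameters are merged into the multipart fields here
m = MultipartEncoder(fields=dict(files, **params))
headers['content-type'] = m.content_type

# With the session from the question:
r = s.post(url, data=m, headers=headers)
# Or without a session (the encoder is consumed as it is sent, so use one call or the other):
# r = requests.post('http://httpbin.org/post', data=m, headers=headers)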
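As a rough illustration of what a streaming multipart encoder does, here is a standard-library sketch. It is not the toolbelt's actual implementation. The body is produced by a generator that reads the file in chunks, so only one chunk is held in memory at a time. The name stream_multipart and its parameters are made up for this example, and it does no escaping of field names or filenames.

```python
import io
import uuid


def stream_multipart(fields, name, filename, fileobj, content_type,
                     chunk_size=64 * 1024):
    """Return (content_type_header, generator of body chunks).

    fields is a dict of plain text form fields; fileobj is a binary
    file-like object that is read chunk_size bytes at a time.
    """
    boundary = uuid.uuid4().hex

    def body():
        # One small part per plain form field
        for key, value in fields.items():
            yield (
                '--{0}\r\n'
                'Content-Disposition: form-data; name="{1}"\r\n\r\n'
                '{2}\r\n'.format(boundary, key, value)
            ).encode('utf-8')
        # Headers of the file part
        yield (
            '--{0}\r\n'
            'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
            'Content-Type: {3}\r\n\r\n'.format(
                boundary, name, filename, content_type)
        ).encode('utf-8')
        # File content, one chunk at a time instead of a single read()
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
        # Closing boundary
        yield '\r\n--{0}--\r\n'.format(boundary).encode('utf-8')

    return 'multipart/form-data; boundary={0}'.format(boundary), body()


if __name__ == '__main__':
    demo_file = io.BytesIO(b'x' * 200000)  # stands in for a large file
    header, chunks = stream_multipart(
        {'md5': 'abc123'}, 'file', 'big.bin', demo_file,
        'application/octet-stream')
    print(header, sum(len(c) for c in chunks))
```

requests accepts a generator as data and sends it with chunked transfer encoding, which not every server accepts, so for real uploads the toolbelt encoder is likely the safer choice.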
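The first argument to MultipartEncoder is parsed with the iter_field_objects() function from the urllib3 library; this means that it can be either a dictionary of key-value pairs or a sequence (list, tuple) of RequestField() objects.
When you pass in a dictionary, as I did above, each key-value pair is parsed with RequestField.from_tuples(), and you can specify only the field name, the value, and optionally a filename and mimetype. Extra headers are not supported. That is the option I used in the sample above.
If you want to add the Content-Transfer-Encoding header to the file field, you need to pass a sequence of RequestField objects instead: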
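import os
from requests.packages.urllib3.fields import RequestField

fields = [RequestField.from_tuples(*p) for p in params.items()]
extra = [
    RequestField('md5', md5hash),
    RequestField('modified', now),
    RequestField('created', now),
]
# RequestField(name, data, filename, headers): the third argument is the filename
file_field = RequestField(
    'file', fileobject, os.path.basename(url),
    {'Content-Transfer-Encoding': 'binary'})
# Fields built directly need make_multipart() to get a Content-Disposition header
for field in extra:
    field.make_multipart()
file_field.make_multipart(content_type='application/octet-stream')
fields.extend(extra + [file_field])
m = MultipartEncoder(fields=fields)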
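Note that you cannot combine streaming requests with the requests-cache project; the latter needs access to the full request body to generate a cache key.
You would have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and come up with some kind of hash key for the body. That is beyond the scope of this question, however.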
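One way to get a hash key for a large body without loading it into memory is to hash the underlying file in chunks. The sketch below uses only the standard library and assumes the upload comes from a seekable file object; body_digest is a hypothetical helper, not part of requests_cache, and wiring it into create_key is left out.

```python
import hashlib
import io


def body_digest(fileobj, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of the rest of fileobj, read in chunks.

    The file position is restored afterwards, so the same object can
    still be handed to the upload code.
    """
    start = fileobj.tell()
    digest = hashlib.sha256()
    # iter() with a sentinel stops at the first empty read (end of file)
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(start)
    return digest.hexdigest()


if __name__ == '__main__':
    demo_file = io.BytesIO(b'abc' * 100000)  # stands in for the real file
    print(body_digest(demo_file))
```

Hashing the file this way costs one extra pass over the data before the upload starts, which may or may not be acceptable for multi-gigabyte files.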
A simple change to the way the file is uploaded helped:
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)