Python Requests -- MemoryError despite using streaming uploads

According to the documentation, it should be possible to do non-memory-intensive uploads by giving Requests a file-like object instead of the file contents. OK, so I do this in my code:

files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream', {'Content-Transfer-Encoding':'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)

Watching this run on my computer with a 2.8 GB file, it starts consuming memory at an alarming rate until it quits at about 89% memory usage. It then fails with this output:

  File "***.py", line 644, in post
    r = s.post(url, data=content, params=params, files=files, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
    hooks, stream, verify, cert)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
    self.prepare_body(data, files)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
    body, content_type = encode_multipart_formdata(new_fields)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
    return body.getvalue(), content_type
MemoryError

It works with smaller files, but it still uses a lot of memory when doing so. Am I misunderstanding something?

EDIT:

After seeing Martijn Pieters' answer, I changed my code to:

    files = {'md5': ('', md5hash),
             'modified': ('', now),
             'created': ('', now),
             'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
    m = requests_toolbelt.MultipartEncoder(fields=files)
    headers['content-type'] = m.content_type
    r = s.post(url, data=m, params=params, headers=headers)

I had to remove the {'Content-Transfer-Encoding':'binary'} because it doesn't appear to be supported, and it results in this error message:

  File "***.py", line 647, in post
    m = requests_toolbelt.MultipartEncoder(fields=files)
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
    self._prepare_parts()
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
    self.parts = [Part.from_field(f, enc) for f in fields]
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
    yield RequestField.from_tuples(*field)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
    filename, data = value
ValueError: too many values to unpack

(Is it still possible to set this header when using the multipart encoder? I would prefer to have it there.)

However, even with the header removed, it still doesn't work, because now I get this error message:

  File "***.py", line 647, in post
    r = s.post(url, data=m, params=params, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
    main_key = self.cache.create_key(response.request)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
    key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder

Any ideas? I admit I'm new to this, and these error messages, as they so often are in programming, are not very helpful.

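You are not streaming the upload, because requests can only do that when the whole body is sourced from an open file object. For a multi-part upload it still reads all the files into memory to build the multi-part POST body.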
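To make the difference concrete, here is a minimal standard-library sketch of what streaming a multipart body means: the body is produced piece by piece from a generator instead of being collected into one large buffer (the `body.getvalue()` call in the traceback above). This is only an illustration and not how requests or any library implements it; `iter_multipart`, the boundary string, and the field and file names are all made up for the example.

```python
import io


def iter_multipart(fields, boundary, chunk_size=8192):
    """Yield a multipart/form-data body piece by piece.

    `fields` is a list of (name, filename, fileobj) tuples. Each fileobj is
    read `chunk_size` bytes at a time, so no piece is larger than that.
    """
    for name, filename, fileobj in fields:
        header = (
            '--{0}\r\n'
            'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
        ).format(boundary, name, filename)
        yield header.encode('utf-8')
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
        yield b'\r\n'
    # Closing delimiter that ends the multipart body
    yield '--{0}--\r\n'.format(boundary).encode('utf-8')


boundary = 'sketch-boundary'        # hypothetical fixed boundary
source = io.BytesIO(b'x' * 20000)   # stand-in for a large file on disk
pieces = list(iter_multipart([('file', 'big.bin', source)], boundary))
largest = max(len(piece) for piece in pieces)
print(largest)  # size of the biggest piece that had to be held at once
```

In a real upload the pieces would be handed to the HTTP connection one at a time rather than collected in a list; the list here only exists so the sizes can be inspected.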

For multi-part uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder:

import os

from requests_toolbelt import MultipartEncoder

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}
# The params are merged in here, so they are sent as multipart fields
m = MultipartEncoder(fields=dict(files, **params))
headers['content-type'] = m.content_type

# s is the session from the question; the encoder is consumed by the
# upload, so it can only be posted once
r = s.post(url, data=m, headers=headers)

The first argument to MultipartEncoder is parsed with the iter_field_objects() function from the urllib3 library; this means that it can either be a dictionary of key-value pairs, or a sequence (list, tuple) of RequestField() objects.

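When passing in a dictionary like I did above, each key-value pair is parsed with RequestField.from_tuples(), and you can only specify the field name, the value, and optionally the filename and the mimetype. Extra headers are not supported. I used that option in my sample above.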

If you want to add the Content-Transfer-Encoding header to the file field, then we need to use a sequence of RequestField objects:

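import os

from requests.packages.urllib3.fields import RequestField

fields = [RequestField.from_tuples(*p) for p in params.items()]

# Fields built directly need make_multipart() to get a Content-Disposition header
for name, value in [('md5', md5hash), ('modified', now), ('created', now)]:
    field = RequestField(name, value)
    field.make_multipart()
    fields.append(field)

# RequestField takes (name, data, filename=None, headers=None)
file_field = RequestField(
    'file', fileobject, filename=os.path.basename(url),
    headers={'Content-Transfer-Encoding': 'binary'})
file_field.make_multipart(content_type='application/octet-stream')
fields.append(file_field)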

Note that you cannot combine streaming requests with the requests-cache project; the latter needs access to the full request body to generate a cache key.

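You would have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and come up with some kind of hash key for the body. That is outside the scope of this question, however.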
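One possible ingredient for such a key, as a rough sketch only: hash the file in fixed-size chunks so that it never has to be loaded whole, then rewind it so it can still be uploaded. `hash_file_in_chunks` is a hypothetical helper, not part of requests_cache or the toolbelt; it assumes the file object is seekable, and how the digest would be wired into create_key is left open.

```python
import hashlib
import io


def hash_file_in_chunks(fileobj, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a seekable file object.

    Reads `chunk_size` bytes at a time and restores the original file
    position afterwards, so the object can still be streamed later.
    """
    start = fileobj.tell()
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(start)
    return digest.hexdigest()


body = io.BytesIO(b'example payload')  # stand-in for the file to upload
key = hash_file_in_chunks(body, chunk_size=4)
print(key)
```

Hashing the file costs one extra pass over the data before the upload, which for a multi-gigabyte file may or may not be an acceptable trade for having a cache key.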

A simple change to the way the file is uploaded helped:

    with open('massive-body', 'rb') as f:
        requests.post('http://some.url/streamed', data=f)