Memory leak in Google ndb library

I think there is a memory leak in the ndb library, but I cannot find where it is.

Is there a way to avoid the problem described below?
Do you have any ideas for a more precise test to pinpoint the issue?


This is how I reproduce the problem:

I created a minimal Google App Engine application with two files.
app.yaml:

application: myapplicationid
version: demo
runtime: python27
api_version: 1
threadsafe: yes


handlers:
- url: /.*
  script: main.APP

libraries:
- name: webapp2
  version: latest

main.py:

# -*- coding: utf-8 -*-
"""Memory leak demo."""
from google.appengine.ext import ndb
import webapp2


class DummyModel(ndb.Model):

    content = ndb.TextProperty()


class CreatePage(webapp2.RequestHandler):

    def get(self):
        value = str(102**100000)
        entities = (DummyModel(content=value) for _ in xrange(100))
        ndb.put_multi(entities)


class MainPage(webapp2.RequestHandler):

    def get(self):
        """Use of `query().iter()` was suggested here:
            https://code.google.com/p/googleappengine/issues/detail?id=9610
        Same result can be reproduced without decorator and a "classic"
            `query().fetch()`.
        """
        for _ in range(10):
            for entity in DummyModel.query().iter():
                pass # Do whatever you want
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')


APP = webapp2.WSGIApplication([
    ('/', MainPage),
    ('/create', CreatePage),
])

I uploaded the application and called /create once.
After that, every call to / increased the memory used by the instance, until it stopped with the error: Exceeded soft private memory limit of 128 MB with 143 MB after servicing 5 requests total.

Example of a memory usage graph (you can see the memory growth and the crashes):

Note: The problem can be reproduced with frameworks other than webapp2, e.g. web.py.

There is a known issue with NDB. You can read about it here, and there is a workaround here:

The non-determinism observed with fetch_page is due to the iteration order of eventloop.rpcs, which is passed to datastore_rpc.MultiRpc.wait_any() and apiproxy_stub_map.__check_one selects the last rpc from the iterator.

Fetching with page_size of 10 does an rpc with count=10, limit=11, a standard technique to force the backend to more accurately determine whether there are more results. This returns 10 results, but due to a bug in the way the QueryIterator is unraveled, an RPC is added to fetch the last entry (using obtained cursor and count=1). NDB then returns the batch of entities without processing this RPC. I believe that this RPC will not be evaluated until selected at random (if MultiRpc consumes it before a necessary rpc), since it doesn't block client code.

Workaround: use iter(). This function does not have this issue (count and limit will be the same). iter() can be used as a workaround for the performance and memory issues associated with fetch page caused by the above.

After more investigation, and with the help of a Google engineer, I found two explanations for my memory consumption.

Context and threads

ndb.Context is a "thread local" object, and it is only cleared when a new request comes in on that thread. So a thread holds on to it between requests. Many threads can exist in a single GAE instance, and it can take hundreds of requests before a thread is used a second time and its context is cleared.
This is not a memory leak, but the total size of the contexts in memory can exceed the available memory of a small GAE instance.
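The mechanism can be illustrated without the GAE SDK: a pure-Python sketch in which a reused worker thread keeps its `threading.local` data from one "request" to the next, just as a GAE thread keeps its `ndb.Context` (the executor and payload names here are illustrative, not part of NDB):

```python
# Sketch of why thread-local state survives between requests: a worker
# thread that is reused keeps its threading.local data until something
# overwrites it, like ndb.Context on a reused GAE instance thread.
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()

def handle_request(payload):
    # Simulated request handler: stash the payload in the thread-local
    # "context", as NDB's in-context cache would stash entities.
    previous = getattr(_context, "cache", None)
    _context.cache = payload
    return previous

# A single worker thread simulates one GAE thread serving many requests.
with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(handle_request, "entity-from-request-1").result()
    second = pool.submit(handle_request, "entity-from-request-2").result()

print(first)   # None: fresh thread, empty "context"
print(second)  # "entity-from-request-1": leftover from the previous request
```

The second request sees data cached by the first, because nothing cleared the thread-local between them.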

Workaround:
You cannot configure the number of threads used in a GAE instance, so it is best to keep each context as small as possible: avoid the in-context cache, and clear it after each request.
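One framework-agnostic way to enforce "clear it after each request" is a small WSGI middleware that runs a cleanup callback after every request; this is only a sketch, with the `cleanup` callable standing in for something like `lambda: ndb.get_context().clear_cache()`:

```python
# Hypothetical WSGI middleware that runs a cleanup callback (e.g. clearing
# the NDB in-context cache) after every request, so no per-thread cache
# outlives the request that filled it.
def cache_clearing_middleware(app, cleanup):
    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        finally:
            cleanup()  # always runs, even if the handler raised
    return wrapped

# Minimal usage with a dummy WSGI app and a counting stand-in for cleanup.
calls = []

def dummy_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, World!"]

app = cache_clearing_middleware(dummy_app, lambda: calls.append(1))
body = app({}, lambda status, headers: None)
print(len(calls))  # 1: cleanup ran after the request
```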

Event queue

NDB does not seem to guarantee that the event queue is empty after a request. Again, this is not a memory leak, but it leaves Futures in your thread's context, and you are back to the first issue.

Workaround:
Wrap all your code that uses NDB with @ndb.toplevel.

One possible workaround is to use context.clear_cache() and gc.collect() in the get method:

def get(self):
    """Handler from MainPage above, now clearing the NDB context cache
    and forcing a GC pass at the end of each request.
    Requires `import gc` at module level (ndb is already imported)."""
    for _ in range(10):
        for entity in DummyModel.query().iter():
            pass  # Do whatever you want
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.write('Hello, World!')
    context = ndb.get_context()
    context.clear_cache()
    gc.collect()