当 gunicorn worker 超时时是否有可能获得堆栈跟踪?

Is it possible to get a stack trace when a gunicorn worker hits a timeout?

具体来说,我是 运行 一个在 gunicorn 中默认工作器的 Flask 应用程序。我试图弄清楚如何调试/跟踪当工作人员在服务请求时因超时而被杀死时发生的事情。有没有办法获取堆栈跟踪或分析请求以对此进行调试?

老实说,这是一个非常好的问题,我不确定是否有可能达到同样的效果。但是当我开始挖掘时,我发现了很多有趣的话题

Showing the stack trace from a running Python application

Get stacktrace from stuck python process

https://github.com/khamidou/lptrace

https://gist.github.com/reywood/e221c4061bbf2eccea885c9b2e4ef496

首先我用下面的代码创建了一个简单的 flask 应用程序

app.py

from flask import Flask
app = Flask(__name__)
import time

def a():
  b()

def b():
  c ()

def c():
   i = 0
   while i < 900:
       time.sleep(1)
       i += 1

@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def catch_all(path):
    a()
    return 'You want path: %s' % path

if __name__ == '__main__':
    app.run()

wsgi.py

from app import app

if __name__ == "__main__":
    app.run()

现在 运行 应用如下所示并执行 curl localhost:8000/abc

$ gunicorn wsgi:app
[2019-08-01 08:19:06 +0000] [26825] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:19:06 +0000] [26825] [INFO] Listening at: http://127.0.0.1:8000 (26825)
[2019-08-01 08:19:06 +0000] [26825] [INFO] Using worker: sync
[2019-08-01 08:19:06 +0000] [26828] [INFO] Booting worker with pid: 26828
[2019-08-01 08:19:40 +0000] [26825] [CRITICAL] WORKER TIMEOUT (pid:26828)
[2019-08-01 08:19:40 +0000] [26828] [INFO] Worker exiting (pid: 26828)
[2019-08-01 08:19:40 +0000] [26832] [INFO] Booting worker with pid: 26832

现在我们需要的是一个钩子,可以在被杀死之前调用。 gunicorn 支持配置文件中的服务器事件

现在我们创建一个配置文件

gunicorn_config.py

timeout = 3

def worker_abort(worker):
    pid = worker.pid
    print("worker is being killed - {}".format(pid))

现在我们的输出是

$ gunicorn -c gunicorn_config.py wsgi:app
[2019-08-01 08:22:17 +0000] [26837] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:22:17 +0000] [26837] [INFO] Listening at: http://127.0.0.1:8000 (26837)
[2019-08-01 08:22:17 +0000] [26837] [INFO] Using worker: sync
[2019-08-01 08:22:17 +0000] [26840] [INFO] Booting worker with pid: 26840
[2019-08-01 08:22:22 +0000] [26837] [CRITICAL] WORKER TIMEOUT (pid:26840)
worker is being killed - 26840
[2019-08-01 08:22:22 +0000] [26840] [INFO] Worker exiting (pid: 26840)
[2019-08-01 08:22:22 +0000] [26844] [INFO] Booting worker with pid: 26844

这很好,现在我们需要混合我们之前关于 pyrasite 的知识和这个来获得堆栈。所以我们像下面这样更新了配置文件

gunicorn_config.py

timeout = 3

__code_dump_stack__ = """
import sys, traceback

for thread, frame in sys._current_frames().items():
    print('Thread 0x%x' % thread)
    traceback.print_stack(frame)
    print()
"""

def dump_stack_for_process(pid):
    import pyrasite

    ipc = pyrasite.PyrasiteIPC(pid)
    ipc.connect()
    print(ipc.cmd(__code_dump_stack__))
    ipc.close()

def worker_abort(worker):
    pid = worker.pid
    print("worker is being killed - {}".format(pid))
    dump_stack_for_process(pid)

现在我们的输出是

$ [2019-08-01 08:25:29 +0000] [26848] [INFO] Starting gunicorn 19.9.0
[2019-08-01 08:25:29 +0000] [26848] [INFO] Listening at: http://127.0.0.1:8000 (26848)
[2019-08-01 08:25:29 +0000] [26848] [INFO] Using worker: sync
[2019-08-01 08:25:29 +0000] [26851] [INFO] Booting worker with pid: 26851
[2019-08-01 08:25:38 +0000] [26848] [CRITICAL] WORKER TIMEOUT (pid:26851)
worker is being killed - 26851
Thread 0x7ff0a7a4b700
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "<string>", line 72, in run
  File "<string>", line 92, in on_command
  File "<string>", line 6, in <module>

Thread 0x7ff0ac512700
  File "/home/vagrant/.local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 223, in run
    super(Application, self).run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 203, in run
    self.manage_workers()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
    self.spawn_workers()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
    self.spawn_worker()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 30, in accept
    self.handle(listener, client, addr)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vagrant/remotedebug/app.py", line 20, in catch_all
    a()
  File "/home/vagrant/remotedebug/app.py", line 6, in a
    b()
  File "/home/vagrant/remotedebug/app.py", line 9, in b
    c ()
  File "/home/vagrant/remotedebug/app.py", line 14, in c
    time.sleep(1)
  File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
    self.cfg.worker_abort(self)
  File "gunicorn_config.py", line 23, in worker_abort
    dump_stack_for_process(pid)
  File "gunicorn_config.py", line 17, in dump_stack_for_process
    print(ipc.cmd(__code_dump_stack__))
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 161, in cmd
    return self.recv()
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 174, in recv
    header_data = self.recv_bytes(4)
  File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
    chunk = self.sock.recv(n - len(data))


[2019-08-01 08:25:38 +0000] [26851] [INFO] Worker exiting (pid: 26851)
[2019-08-01 08:25:38 +0000] [26862] [INFO] Booting worker with pid: 26862

堆栈跟踪很大,但它提供了我们需要的东西

只是想添加到已接受的答案中,还有另一种方法可以通过使用 v3.3 中引入的内置 Python 3 模块 faulthandler 获得类似的结果 https://docs.python.org/dev/library/faulthandler.html

您可以将以下两行添加到接受的答案中提到的wsgi.py

import faulthandler
faulthandler.enable()

发生超时时,以下输出将打印到标准错误:

[2020-12-24 13:38:32 +0000] [31304] [INFO] Starting gunicorn 20.0.4
[2020-12-24 13:38:32 +0000] [31304] [INFO] Listening at: http://0.0.0.0:8888 (31304)
[2020-12-24 13:38:32 +0000] [31304] [INFO] Using worker: sync
[2020-12-24 13:38:32 +0000] [31307] [INFO] Booting worker with pid: 31307
[2020-12-24 13:38:55 +0000] [31304] [CRITICAL] WORKER TIMEOUT (pid:31307)
Fatal Python error: Aborted

Current thread 0x00007f411d781700 (most recent call first):
  File "/tmp/app.py", line 14 in c
  File "/tmp/app.py", line 9 in b
  File "/tmp/app.py", line 6 in a
  File "/tmp/app.py", line 20 in catch_all
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1936 in dispatch_request
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1950 in full_dispatch_request
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2447 in wsgi_app
  File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2464 in __call__
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175 in handle_request
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134 in handle
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 29 in accept
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 67 in run_for_one
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 123 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/base.py", line 140 in init_process
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583 in spawn_worker
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616 in spawn_workers
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545 in manage_workers
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 202 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 72 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 228 in run
  File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 58 in run
  File "/tmp/venv/bin/gunicorn", line 8 in <module>
[2020-12-24 13:38:55 +0000] [31307] [INFO] Worker exiting (pid: 31307)
[2020-12-24 13:38:55 +0000] [31503] [INFO] Booting worker with pid: 31503