gunicorn 线程被悄悄杀死

gunicorn threads getting killed silently

gunicorn 版本 19.9.0

获得以下 gunicorn 配置:

accesslog = "access.log"
worker_class = 'sync'
workers = 1
worker_connections = 1000
timeout = 300
graceful_timeout = 300
keepalive = 300
proc_name = 'server'
bind = '0.0.0.0:8080'
name = 'server.py'
preload = True
log_level = "info"

threads = 7
max_requests = 0
backlog = 100

如您所见,服务器配置为 运行 7 个线程。

服务器启动于:

gunicorn -c gunicorn_config.py server:app

这是我们日志文件开头的行数和线程 ID(最后一行是主服务器的线程):

  10502 140625414080256
  10037 140624842843904
   9995 140624859629312
   9555 140625430865664
   9526 140624851236608
   9409 140625405687552
   2782 140625422472960
      6 140628359804736

所以有 7 个线程正在处理请求。 (我们已经可以看到线程 140625422472960 处理的请求比其他线程少得多。)

但是在上面检查的行之后,线程 140625422472960 就消失了,日志文件只有:

  19602 140624859629312
  18861 140625405687552
  18766 140624851236608
  18765 140624842843904
  12523 140625414080256
   2111 140625430865664
  (excluding the main thread here)

从服务器日志中我们可以看到线程收到一个请求并开始处理它,但从未完成。客户端也没有收到任何响应。

日志文件和 stderr 中都没有 error/warning。

并且 运行将应用程序运行一段时间后,又有两个线程消失了:

102 140624842843904
102 140624851236608
 68 140624859629312
 85 140625405687552

如何调试?

进一步深入研究 stderr 日志,最终发现类似以下异常堆栈跟踪的内容:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2018-11-04 17:57:55 +0330] [31] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

这是由于gunicorn bug

修复此错误之前的临时解决方案是像 asantoni 所做的猴子补丁 gunicorn。