Celery worker thread hangs indefinitely

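I run my spider as a Celery worker. The spider scrapes a website and then follows a batch of links from it. After a while, the spider stops processing anything further.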

The lsof output shows that most of the thread's connections are in the CLOSE_WAIT state:

# lsof -i -n
COMMAND PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
celery   10 root   32u  IPv4 105621511      0t0  TCP 127.0.0.1:6023 (LISTEN)
celery   10 root   33u  IPv4 105603949      0t0  TCP 10.1.195.250:38162->104.17.38.150:http (ESTABLISHED)
celery   10 root   34u  IPv4 105610494      0t0  TCP 10.1.195.250:41864->185.230.61.195:https (CLOSE_WAIT)
celery   10 root   35u  IPv4 105614120      0t0  TCP 10.1.195.250:39742->185.230.61.195:http (CLOSE_WAIT)
celery   10 root   36u  IPv4 105603950      0t0  TCP 10.1.195.250:52672->185.230.61.96:http (CLOSE_WAIT)
celery   10 root   37u  IPv4 105620542      0t0  TCP 10.1.195.250:38200->209.236.228.178:http (CLOSE_WAIT)
celery   10 root   38u  IPv4 105603948      0t0  TCP 10.1.195.250:51848->35.208.181.87:http (CLOSE_WAIT)
celery   10 root   39u  IPv4 105614124      0t0  TCP 10.1.195.250:56290->185.230.61.96:https (CLOSE_WAIT)
celery   10 root   40u  IPv4 105604983      0t0  TCP 10.1.195.250:43118->216.185.90.112:http (CLOSE_WAIT)
celery   10 root   41u  IPv4 105618465      0t0  TCP 10.1.195.250:55006->209.59.212.167:http (CLOSE_WAIT)
celery   10 root   45u  IPv4 105600888      0t0  TCP 10.1.195.250:34572->23.227.38.74:http (ESTABLISHED)
celery   10 root   46u  IPv4 105620539      0t0  TCP 10.1.195.250:35846->205.178.189.129:http (CLOSE_WAIT)
celery   10 root   48u  IPv4 105620541      0t0  TCP 10.1.195.250:39674->185.230.61.195:http (CLOSE_WAIT)
celery   10 root   49u  IPv4 105610495      0t0  TCP 10.1.195.250:49450->178.128.150.108:http (CLOSE_WAIT)
celery   10 root   51u  IPv4 105614122      0t0  TCP 10.1.195.250:53770->23.227.38.74:https (ESTABLISHED)
celery   10 root   52u  IPv4 105614123      0t0  TCP 10.1.195.250:52930->54.86.91.237:https (CLOSE_WAIT)
celery   10 root   53u  IPv4 105614125      0t0  TCP 10.1.195.250:37998->209.236.228.178:https (CLOSE_WAIT)
celery   10 root   54u  IPv4 105614126      0t0  TCP 10.1.195.250:59992->35.208.181.87:https (CLOSE_WAIT)
celery   10 root   55u  IPv4 105605002      0t0  TCP 10.1.195.250:39692->192.124.249.18:http (CLOSE_WAIT)
celery   10 root   56u  IPv4 105612653      0t0  TCP 10.1.195.250:41912->185.230.61.195:https (CLOSE_WAIT)
celery   10 root   57u  IPv4 105612657      0t0  TCP 10.1.195.250:47560->104.197.82.118:http (CLOSE_WAIT)
celery   10 root   58u  IPv4 105612656      0t0  TCP 10.1.195.250:33926->209.59.212.167:https (CLOSE_WAIT)
celery   10 root   59u  IPv4 105614129      0t0  TCP 10.1.195.250:41614->178.128.150.108:https (CLOSE_WAIT)
celery   10 root   62u  IPv4 105614131      0t0  TCP 10.1.195.250:37534->34.66.87.174:http (CLOSE_WAIT)
celery   10 root   63u  IPv4 105600910      0t0  TCP 10.1.195.250:47682->166.62.115.136:https (CLOSE_WAIT)
celery   10 root   64u  IPv4 105614141      0t0  TCP 10.1.195.250:43222->216.185.90.112:http (CLOSE_WAIT)
celery   10 root   65u  IPv4 105600912      0t0  TCP 10.1.195.250:41060->50.63.7.227:http (CLOSE_WAIT)
celery   10 root   66u  IPv4 105600913      0t0  TCP 10.1.195.250:41254->104.197.82.118:https (CLOSE_WAIT)
celery   10 root   69u  IPv4 105614695      0t0  TCP 10.1.195.250:42766->104.112.162.8:https (ESTABLISHED)

ps -aux shows that the thread is sleeping and waiting for an event:

# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  0.0  80024 62700 ?        Ss   17:23   0:05 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root         8  0.0  0.0 118892 76360 ?        S    17:23   0:00 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root        10  0.0  0.0 902592 100916 ?       Sl   17:23   0:01 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root       485  0.0  0.0 121900 79376 ?        S    18:07   0:00 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root       486 10.0  0.1 950312 144056 ?       Sl   18:07   1:19 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root       501  0.4  0.0 455868 62432 ?        Sl   18:11   0:02 /usr/local/bin/python /usr/local/bin/celery flower -A 
root       508  0.3  0.0 121916 79388 ?        S    18:17   0:00 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root       509 22.4  0.1 958724 154876 ?       Sl   18:17   0:42 /usr/local/bin/python /usr/local/bin/celery -A data_ex
root       520  0.5  0.0   2388   700 pts/0    Ss   18:20   0:00 /bin/sh
root       526  0.0  0.0   9392  3048 pts/0    R+   18:20   0:00 ps -aux

strace shows that the thread is blocked in a read on fd 69:

# strace -p 10  
strace: Process 10 attached
read(69, 

It looks like the spider is not closing its connections properly.

How can I fix this?

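This is most likely caused by the code you use for scraping. The strace output shows a read() with no deadline on fd 69, which lsof lists as an ESTABLISHED https connection, so if that server never replies the worker waits forever. You probably need to set a timeout in the library you use to make HTTP/HTTPS requests.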
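The question does not say which HTTP library the spider uses, so here is only a minimal sketch using Python's standard-library urllib. The helper name fetch and the 10-second default are assumptions for illustration, not part of the original code. It shows two things: passing an explicit timeout so a blocking read cannot wait forever, and using a with block so the socket is closed instead of lingering in CLOSE_WAIT.

```python
import http.client
import urllib.request
from typing import Optional

# Assumed default for illustration; tune it for the sites being crawled.
DEFAULT_TIMEOUT_SECONDS = 10.0


def fetch(url: str, timeout: float = DEFAULT_TIMEOUT_SECONDS) -> Optional[bytes]:
    """Return the response body, or None if the request fails or times out."""
    try:
        # The timeout applies to connecting and to each blocking socket
        # operation, so a server that goes silent raises an error instead
        # of leaving the worker stuck in read().
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Leaving the with block closes the connection on our side.
            return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers URLError, socket timeouts and connection resets.
        print(f"giving up on {url}: {exc!r}")
        return None
```

Note that this timeout limits each individual socket operation rather than the whole request, so a server that trickles data slowly can still take longer than the configured value overall. If your spider uses a different library, look for its equivalent timeout setting.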