达到MaxClients时的Apache超时/死锁

我使用mod_php(PHP 5.3)在Debian Squeeze上运行Apache 2.2,执行MPM,在多个端口(使用同一台机器运行memcached)上侦听。 在过去的6个星期左右,我遇到了所有Apache连接超时/死锁的情况。 重新启动Apache将解决这个问题,在几个小时之间的某个地方。

在服务器上运行的PHP应用程序除了在同一台计算机的单独端口上运行后端REST API之外,还包括一个面向用户的Web UI。 Web UI应用程序通过http://127.0.0.1:8888/在大多数页面加载上向后端API发送1到3个连续的REST请求。 这些超时错误首先在PHP的错误日志中被注意到,因为30秒之后REST客户端的超时错误。 匹配超时也可以在Apache的错误日志中看到。 它们以突发forms出现,所有请求都将超时,然后一批请求将迅速而成功地处理(看起来),然后是另一批超时。 在几分钟的时间窗口内,8-25秒将只是超时,随后可能是一秒钟的请求正在被处理,接着超过8-25秒的超时。 如果“请求突发”之前的“超时窗口”较短,则后面的一个会更长。 当这个问题出现的时候,任何应用程序中的curl http://localhost/server-status或者curl请求都会返回到一个PHP脚本,或者超时或者失败,或者有时会返回25到60秒。 在此期间,CPU可用,RAM可用,IO显示正常(虽然这是我不擅长监视的一件事),并且SYN_RECV状态下的套接字数量远低于系统的somaxconn和Apache的ListenBacklog (但是我确实看到在积压中等待的sockets)。

在debugging和研究过程中,我注意到很多事情是没有意义的。 首先,在我上面的curl http://localhost/server-status示例中, curl成功连接和发送数据,但从来没有得到响应。 即使经过了45分钟以上,Apache似乎也没有重置连接或任何东西, curl只是挂起和等待。 例如:

 ~# curl -vv http://localhost/server-status * About to connect() to localhost port 80 (#0) * Trying 127.0.0.1... connected * Connected to localhost (127.0.0.1) port 80 (#0) > GET /server-status HTTP/1.1 > User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6 > Host: localhost > Accept: */* > 

如果我重新启动Apache,事情将恢复正常。

机器共有4GB; MemCache使用768M),4个CPU核心,每个Apache进程的常驻内存大小在20MB到30MB之间。 MaxClients已经被设置为100 ,通过我的计算和观察,它在资源约束下运行。 我的第一个难题是再现这个问题,因为我只看到它在6周内实际上显示了9次。 在预感上,我将MaxClients降低到10 (排除资源限制),看看我是否能够触发问题,并成功地重现 – 超时的“突发”模式立即开始发生。 TCP backlog占用了大约10%的容量,并且试图按上述方式发送请求永远不会返回(尽pipe请求突发通过并完成;但是我的curl请求似乎并没有突然发生)。 Timeout设置为500KeepAliveTimeout设置为5sys.fs.file-max / sys.fs.file-nr显示<1%的利用率。 nofile ulimit远高于每个进程使用的FD。 感觉停滞不前,缺乏可见性,我开始strace我的Apache进程。 经过仔细的分析,没有什么是真正脱颖而出,没有错误,没有泄漏的FD从我能说的。 然而,在“暂停期”,我确实看到了很多:

 [17284] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) 

在cadeit中可能会有所帮助,下面是在所有正在运行的进程中精炼strace输出的几个“周期”:

 ps auxw | grep sbin/apache | awk '{print"-p " $2}' | xargs strace -e poll,select,connect,recvfrom,sendto,flock 

产生:

 [17293] flock(46, LOCK_EX <unfinished ...> [17292] flock(46, LOCK_EX <unfinished ...> [17291] flock(46, LOCK_EX <unfinished ...> [17290] flock(46, LOCK_EX <unfinished ...> [17288] flock(46, LOCK_EX <unfinished ...> [17285] flock(46, LOCK_EX <unfinished ...> [17283] flock(46, LOCK_EX <unfinished ...> [17276] select(0, NULL, NULL, NULL, {0, 618967} <unfinished ...> [17287] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17287] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 964 <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17289] --- SIGCHLD (Child exited) @ 0 (0) --- [17289] sendto(47, "\24\0...", 20, ...) = 20 [17289] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17289] poll([{fd=47, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17289] select(48, [47], [47], NULL, {1, 0}) = 1 (out [47], left {0, 999997}) [17289] sendto(47, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17289] select(48, [47], [], NULL, {1, 0}) = 1 (in [47], left {0, 999997}) [17289] recvfrom(47, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17289] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17289] poll([{fd=47, events=POLLOUT}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17289] sendto(47, "GET /<URL>"..., 390, MSG_NOSIGNAL, NULL, 0) = 390 [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, [...] <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, [...] (Timeout) [17287] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 963 <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, [...] (Timeout) [17289] poll([{fd=47, [...] <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, [...] (Timeout) [17284] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17287] <... poll resumed> ) = 0 (Timeout) [17287] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17287] --- SIGCHLD (Child exited) @ 0 (0) --- [17292] <... flock resumed> ) = 0 [17292] sendto(47, "\24\0...", 20, ...) = 20 [17292] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] poll([{fd=47, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17292] select(48, [47], [47], NULL, {1, 0}) = 1 (out [47], left {0, 999997}) [17292] sendto(47, "<MC>"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 [17292] select(48, [47], [], NULL, {1, 0}) = 1 (in [47], left {0, 999997}) [17292] recvfrom(47, "<MC>"..., 32768, 0, NULL, NULL) = 22479 [17285] <... flock resumed> ) = 0 [17285] sendto(47, "\24\0...", 20, ...) = 20 [17285] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17285] poll([{fd=47, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17285] select(48, [47], [47], NULL, {1, 0}) = 1 (out [47], left {0, 999997}) [17285] sendto(47, "<MC>"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 [17285] select(48, [47], [], NULL, {1, 0}) = 1 (in [47], left {0, 999997}) [17285] recvfrom(47, "<MC>"..., 32768, 0, NULL, NULL) = 22479 [17285] sendto(47, "\24\0...", 20, ...) = 20 [17285] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17285] poll([{fd=47, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17285] select(48, [47], [47], NULL, {1, 0}) = 1 (out [47], left {0, 999997}) [17285] sendto(47, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17285] select(48, [47], [], NULL, {1, 0}) = 1 (in [47], left {0, 999997}) [17285] recvfrom(47, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17287] poll([{fd=44, events=POLLIN}], 1, 5000 <unfinished ...> [17292] poll([{fd=44, events=POLLIN}], 1, 5000 <unfinished ...> [17287] <... poll resumed> ) = 1 ([{fd=44, revents=POLLIN}]) [17285] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17285] poll([{fd=47, events=POLLOUT}], 1, 1000) = 1 ([{fd=47, revents=POLLOUT}]) [17285] sendto(47, "GET /<URL>"..., 380, MSG_NOSIGNAL, NULL, 0) = 380 [17285] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17285] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17285] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17292] <... poll resumed> ) = 1 ([{fd=44, revents=POLLIN}]) [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17292] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999996}) [17292] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17292] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17292] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17292] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17292] sendto(46, "<MC>"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 [17292] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17292] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 22479 [17287] sendto(46, "\24\0...", 20, ...) = 20 [17287] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17287] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17292] sendto(46, "\24\0...", 20, ...) = 20 [17287] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17287] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17287] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17287] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17292] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17287] sendto(46, "\24\0...", 20, ... <unfinished ...> [17292] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0 <unfinished ...> [17287] <... sendto resumed> ) = 20 [17292] <... sendto resumed> ) = 35 [17292] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17292] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17287] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17287] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17287] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17287] sendto(46, "<MC>"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 [17287] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999996}) [17287] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 22479 [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17292] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17292] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17292] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17292] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17292] flock(46, LOCK_EX <unfinished ...> [17287] sendto(46, "\24\0...", 20, ...) = 20 [17287] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17287] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17287] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17287] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17287] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17287] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17287] sendto(46, "\24\0...", 20, ...) = 20 [17287] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17287] poll([{fd=46, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=46, revents=POLLOUT}]) [17287] select(47, [46], [46], NULL, {1, 0}) = 1 (out [46], left {0, 999997}) [17287] sendto(46, "<MC>"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 [17287] select(47, [46], [], NULL, {1, 0}) = 1 (in [46], left {0, 999997}) [17287] recvfrom(46, "<MC>"..., 32768, 0, NULL, NULL) = 7280 [17287] flock(46, LOCK_EX <unfinished ...> [17289] <... poll resumed> ) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17289] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 1000 <unfinished ...> [17276] <... select resumed> ) = 0 (Timeout) [17276] select(0, NULL, NULL, NULL, {1, 0} <unfinished ...> [17284] <... poll resumed> ) = 0 (Timeout) [17284] poll([{fd=47, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [17284] poll([{fd=47, [...] <unfinished ...> ... 

不知道poll()超时与它有什么关系。

为了进一步产生stream量模式,并指出与SIGCHLD通知的一个奇怪的关联,这里只有connect()recvfrom() 。 在5秒内没有输出的时候,我打了几次input(虽然大部分是连续爆发的(虽然有时在GET /...和其他之间有几秒))。

 [17289] sendto(47, "GET /[BACKEND CALL]"..., 390, MSG_NOSIGNAL, NULL, 0) = 390 [17283] --- SIGCHLD (Child exited) @ 0 (0) --- [17292] sendto(47, "\24\0...", 20, ...) = 20 [17292] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(47, "[MC]"..., 37, MSG_NOSIGNAL, NULL, 0) = 37 [17292] recvfrom(47, "[MC]"..., 32768, 0, NULL, NULL) = 22479 [17288] sendto(47, "\24\0...", 20, ...) = 20 [17288] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(47, "[MC]"..., 37, ...) = 37 [17288] recvfrom(47, "[MC]"..., ...) = 22479 [17288] sendto(47, "\24\0...", 20, ...) = 20 [17288] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(47, "[MC]"..., 35, ...) = 35 [17288] recvfrom(47, "[MC]"..., ...) = 7280 [17288] sendto(47, "\24\0...", 20, ...) = 20 [17288] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17288] sendto(47, "GET /[BACKEND CALL]"..., 380, ...) = 380 [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(46, "[MC]"..., 35, ...) = 35 [17292] recvfrom(46, "[MC]"..., ...) = 7280 [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(46, "[MC]"..., 37, ...) = 37 [17292] recvfrom(46, "[MC]"..., ...) = 22479 [17283] sendto(46, "\24\0...", 20, ...) = 20 [17283] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(46, "\24\0...", 20, ...) = 20 [17283] sendto(46, "[MC]"..., 35, ...) = 35 [17283] recvfrom(46, "[MC]"..., ...) = 7280 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(46, "[MC]"..., 35, ...) = 35 [17292] recvfrom(46, "[MC]"..., ...) = 7280 [17283] sendto(46, "\24\0...", 20, ...) = 20 [17283] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17283] sendto(46, "[MC]"..., 37, ...) = 37 [17283] recvfrom(46, "[MC]"..., ...) = 22479 [17292] sendto(46, "\24\0...", 20, ...) = 20 [17292] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17292] sendto(46, "[MC]"..., 35, ...) = 35 [17292] recvfrom(46, "[MC]"..., ...) = 7280 [17283] sendto(46, "\24\0...", 20, ...) = 20 [17283] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17283] sendto(46, "[MC]"..., 35, ...) = 35 [17283] recvfrom(46, "[MC]"..., ...) = 7280 [17283] sendto(46, "\24\0...", 20, ...) = 20 [17283] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17283] sendto(46, "[MC]"..., 35, ...) = 35 [17283] recvfrom(46, "[MC]"..., ...) = 7280 <Long Wait> [17284] --- SIGCHLD (Child exited) @ 0 (0) --- [17284] sendto(46, "\24\0...", 20, ...) = 20 [17284] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(46, "[MC]"..., 35, ...) = 35 [17284] recvfrom(46, "[MC]"..., ...) = 7280 [17284] sendto(46, "\24\0...", 20, ...) = 20 [17284] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(46, "[MC]"..., 37, ...) = 37 [17284] recvfrom(46, "[MC]"..., ...) = 22479 [17284] sendto(46, "\24\0...", 20, ...) = 20 [17284] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(46, "[MC]"..., 35, ...) = 35 [17284] recvfrom(46, "[MC]"..., ...) = 7280 [17284] sendto(46, "\24\0...", 20, ...) = 20 [17284] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(46, "[MC]"..., 35, ...) = 35 [17284] recvfrom(46, "[MC]"..., ...) = 7280 [17284] sendto(47, "\24\0...", 20, ...) = 20 [17284] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(47, "[MC]"..., 37, ...) = 37 [17284] recvfrom(47, "[MC]"..., ...) = 22479 [17284] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17284] sendto(47, "POST /[BACKEND CALL]"..., 412, ...) = 412 <Long Wait> [17289] --- SIGCHLD (Child exited) @ 0 (0) --- [17293] sendto(47, "\24\0...", 20, ...) = 20 [17293] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17293] sendto(47, "[MC]"..., 37, ...) = 37 [17293] recvfrom(47, "[MC]"..., ...) = 22479 [17290] sendto(47, "\24\0...", 20, ...) = 20 [17290] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17290] sendto(47, "[MC]"..., 37, ...) = 37 [17290] recvfrom(47, "[MC]"..., ...) = 22479 [17290] sendto(47, "\24\0...", 20, ...) = 20 [17290] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17290] sendto(47, "[MC]"..., 35, ...) = 35 [17290] recvfrom(47, "[MC]"..., ...) = 7280 [17290] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17290] sendto(47, "POST /[BACKEND CALL]"..., 427, ...) = 427 [17289] sendto(46, "\24\0...", 20, ...) = 20 [17289] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17289] sendto(46, "[MC]"..., 35, ...) = 35 [17289] recvfrom(46, "[MC]"..., ...) = 7280 [17289] sendto(46, "\24\0...", 20, ...) = 20 [17289] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17289] sendto(46, "[MC]"..., 37, ...) = 37 [17289] recvfrom(46, "[MC]"..., ...) = 22479 [17289] sendto(46, "\24\0...", 20, ...) = 20 [17289] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17289] sendto(46, "[MC]"..., 35, ...) = 35 [17289] recvfrom(46, "[MC]"..., ...) = 7280 [17289] sendto(46, "\24\0...", 20, ...) = 20 [17289] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17289] sendto(46, "[MC]"..., 35, ...) = 35 [17289] recvfrom(46, "[MC]"..., ...) = 7280 [17293] sendto(46, "\24\0...", 20, ...) = 20 [17293] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17293] sendto(46, "[MC]"..., 35, ...) = 35 [17293] recvfrom(46, "[MC]"..., ...) = 7280 [17293] sendto(46, "\24\0...", 20, ...) = 20 [17293] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17293] sendto(46, "[MC]"..., 37, ...) = 37 [17293] recvfrom(46, "[MC]"..., ...) = 22479 [17293] sendto(46, "\24\0...", 20, ...) = 20 [17293] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17293] sendto(46, "[MC]"..., 35, ...) = 35 [17293] recvfrom(46, "[MC]"..., ...) = 7280 [17293] sendto(46, "\24\0...", 20, ...) = 20 [17293] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17293] sendto(46, "[MC]"..., 35, ...) = 35 [17293] recvfrom(46, "[MC]"..., ...) = 7280 <Long Wait> [17288] --- SIGCHLD (Child exited) @ 0 (0) --- [17291] sendto(47, "\24\0...", 20, ...) = 20 [17291] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17291] sendto(47, "[MC]"..., 37, ...) = 37 [17291] recvfrom(47, "[MC]"..., ...) = 22479 [17291] sendto(47, "\24\0...", 20, ...) = 20 [17291] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17291] sendto(47, "[MC]"..., 35, ...) = 35 [17291] recvfrom(47, "[MC]"..., ...) = 7280 [17291] sendto(47, "\24\0...", 20, ...) = 20 [17291] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17291] sendto(47, "GET /[BACKEND CALL]"..., 380, ...) = 380 [17288] sendto(46, "\24\0...", 20, ...) = 20 [17288] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(46, "[MC]"..., 35, ...) = 35 [17288] recvfrom(46, "[MC]"..., ...) = 7280 [17288] sendto(46, "\24\0...", 20, ...) = 20 [17288] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(46, "[MC]"..., 37, ...) = 37 [17288] recvfrom(46, "[MC]"..., ...) = 22479 [17288] sendto(46, "\24\0...", 20, ...) = 20 [17288] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(46, "[MC]"..., 35, ...) = 35 [17288] recvfrom(46, "[MC]"..., ...) = 7280 [17288] sendto(46, "\24\0...", 20, ...) = 20 [17288] connect(46, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17288] sendto(46, "[MC]"..., 35, ...) = 35 [17288] recvfrom(46, "[MC]"..., ...) = 7280 <Long Wait> [17284] --- SIGCHLD (Child exited) @ 0 (0) --- [17284] sendto(47, "\24\0...", 20, ...) = 20 [17284] connect(47, ...11211...127.0.0.1 = -1 EINPROGRESS (...) [17284] sendto(47, "[MC]"..., 35, ...) = 35 [17284] recvfrom(47, "[MC]"..., ...) = 7280 [17284] connect(47, ...(8888)...("127.0.0.1")...) = -1 EINPROGRESS (...) [17284] sendto(47, "GET /[BACKEND CALL]"..., 347, ...) = 347 <Long Wait> ... 

请注意,这些请求中没有一个是“Web UI请求”。 有些东西直接后端,所以这不一定意味着前端请求永远不会通过。 我发现这个post听起来非常类似于我遇到的问题,但解决scheme(不要求在请求内进行HTTP调用)并不适用。 当长时间观察上面的输出时,我只会看到每100个“后端”请求中有一个“Web UI”请求,所以我不会觉得这是一个等待后端调用的死锁要返回,但只有10个MaxClients ,可能至less有一个等待(不确定;当出现这个问题时不能看到/server-status )。

我很困扰在这一点上; 从未见过这样的事情。 上面可能没有足够的信息来正确回答这个问题,所以如果任何人都可以想到任何其他的输出或testing来帮助debugging,请告诉我,我会编辑。 任何指导将非常感激。