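I've been fighting this problem all day and it's driving me crazy. Every Google search I've tried leads to a dead end. I'm hoping someone can work through it with me and find a solution, for my sake and for future victims. Here goes.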
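I run a very popular website with more than 3M page views per day. That averages out to about 34 page views per second, but during peak hours the realistic figure is over 300 page views per second. Treat these page views as requests.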
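The quoted average follows from spreading 3M views evenly over the 86,400 seconds in a day. A quick check (the function names are illustrative, not from the post) also shows that the ~300 req/s peak is roughly 8.6 times that average, which is the load the PHP-FPM pool actually has to absorb.

```python
# Back-of-the-envelope request rates using the figures quoted in the question.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(requests_per_day: int) -> float:
    """Average requests per second if traffic were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_to_average(peak_rps: float, requests_per_day: int) -> float:
    """How many times higher the peak rate is than the daily average."""
    return peak_rps / average_rate(requests_per_day)


if __name__ == "__main__":
    print(f"average: {average_rate(3_000_000):.1f} req/s")                # average: 34.7 req/s
    print(f"peak/average: {peak_to_average(300, 3_000_000):.1f}x")       # peak/average: 8.6x
```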
The server runs Ubuntu 10.04 64-bit on two E5620 CPUs, 12GB of RAM and a Micron P300 6Gb/s SSD. During peak hours the CPU and memory load is moderate (20-30% CPU and about half of the memory in use).
The software stack is NGINX, MySQL, PHP5-FPM, PHP-APC and Memcached. OK, now for the meat of the post: my error logs. A whole bunch of these errors are being logged.
/var/log/php5-fpm
Jul 20 14:49:47.289895 [NOTICE] fpm is running, pid 29373
Jul 20 14:49:47.337092 [NOTICE] ready to handle connections
Jul 20 14:51:23.957504 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.
Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start
Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started
Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
Jul 20 14:51:41.897178 [NOTICE] [pool www] child 29598 started
Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start
Jul 20 14:51:41.903719 [NOTICE] [pool www] child 29600 started
Jul 20 14:51:41.907816 [WARNING] [pool www] child 29437 exited with code 1 after 114.601417 seconds from start
Jul 20 14:51:41.908253 [NOTICE] [pool www] child 29601 started
Jul 20 14:51:41.916002 [WARNING] [pool www] child 29513 exited with code 1 after 114.592514 seconds from start
Jul 20 14:51:41.916501 [NOTICE] [pool www] child 29602 started
Jul 20 14:51:41.916558 [WARNING] [pool www] child 29494 exited on signal 11 SIGSEGV after 114.597355 seconds from start
Jul 20 14:51:41.916873 [NOTICE] [pool www] child 29603 started
Jul 20 14:51:41.921389 [WARNING] [pool www] child 29502 exited with code 1 after 114.600405 seconds from start
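To see at a glance how the child exits break down (SIGSEGV versus a plain non-zero exit code), a small parser can tally them by cause. This is a sketch that assumes the log line format exactly as shown above; the regex and the function name `classify_exits` are illustrative and not part of PHP-FPM.

```python
import re
from collections import Counter

# Matches php5-fpm child-exit warnings in the format shown in the question, e.g.
#   [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
#   [pool www] child 29534 exited with code 1 after 114.518174 seconds from start
EXIT_RE = re.compile(
    r"\[pool (?P<pool>\S+)\] child (?P<pid>\d+) exited "
    r"(?:on signal (?P<signal>\d+)(?: \w+)?|with code (?P<code>\d+)) "
    r"after (?P<uptime>[\d.]+) seconds"
)


def classify_exits(log_text: str) -> Counter:
    """Count child exits by cause, e.g. {'code 1': 5, 'signal 11': 2}."""
    causes: Counter = Counter()
    # finditer works whether entries are one per line or run together on one line.
    for m in EXIT_RE.finditer(log_text):
        if m.group("signal"):
            causes[f"signal {m.group('signal')}"] += 1
        else:
            causes[f"code {m.group('code')}"] += 1
    return causes
```

Fed the excerpt above, this counts five exits with code 1 and two on signal 11, so the segfaults are only part of the picture: the other children are also dying at almost exactly the same moment.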
/var/log/nginx/error.log
2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29578#0: *571695 readv() failed (104: Connection reset by peer) while reading upstream, client: 150.70.64.196, server: domain.com, request: "GET /page HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *571050 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.157.66, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *564892 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.161.214, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *456171 readv() failed (104: Connection reset by peer) while reading upstream, client: 93.223.33.135, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *471192 readv() failed (104: Connection reset by peer) while reading upstream, client: 74.90.33.142, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29580#0: *570132 readv() failed (104: Connection reset by peer) while reading upstream, client: 180.246.182.191, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
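In nginx, `readv() failed (104: Connection reset by peer) while reading upstream` means the FastCGI backend closed the connection while nginx was still reading its response, which is consistent with FPM children dying mid-request. The sketch below tallies these resets by request path so you can see whether they cluster on particular URLs. The regex and the name `resets_by_path` are illustrative assumptions, matched against the log format above.

```python
import re
from collections import Counter

# Matches nginx error-log entries such as:
#   ... readv() failed (104: Connection reset by peer) while reading upstream, client: ...,
#   request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: ...
# Non-greedy gaps keep each match inside a single entry, even if entries share a line.
RESET_RE = re.compile(
    r"\(104: Connection reset by peer\) while reading upstream.*?"
    r'request: "(?P<method>\S+) (?P<path>\S+) [^"]*".*?'
    r'upstream: "(?P<upstream>[^"]+)"'
)


def resets_by_path(log_text: str) -> Counter:
    """Count upstream connection resets per requested path."""
    return Counter(m.group("path") for m in RESET_RE.finditer(log_text))


def resets_by_upstream(log_text: str) -> Counter:
    """Count upstream connection resets per FastCGI upstream address."""
    return Counter(m.group("upstream") for m in RESET_RE.finditer(log_text))
```

For the seven entries above this gives five resets on `/page` and two on `/favicon.ico`, all against `fastcgi://127.0.0.1:9000`.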
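Finally, I should point out that I did try disabling PHP-APC to see whether the opcode cache was to blame, but the segfaults continued. I also have PHP5-SUHOSIN installed; I disabled that too, but the errors kept happening.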
I'd appreciate any help. Thanks.
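Install debugging symbols for PHP and all of your PHP modules (if Ubuntu provides them; otherwise you'll need to rebuild with debugging enabled), then enable core dumps as described in the question I answered a few hours ago. Then fire up GDB and go to town.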
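Before reaching for GDB, it is worth confirming that a core file can actually be written. On Linux, a running process's effective limits are visible in `/proc/<pid>/limits`, and the kernel's core-file naming template lives in `/proc/sys/kernel/core_pattern`. The helper below checks both for a given PID, for example the FPM master (pid 29373 in the log above), whose limits the children normally inherit unless the pool overrides them. It is an illustrative sketch, not a tool from the answer; the limit itself is typically raised with `ulimit -c unlimited` in the environment that starts FPM, or, in PHP-FPM versions that support it, with the `rlimit_core` setting in the FPM configuration.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple

CORE_PREFIX = "Max core file size"


def parse_core_limit(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) core-size limits from the text of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_PREFIX):
            # Remaining columns: soft limit, hard limit, units (e.g. "0 unlimited bytes").
            soft, hard = line[len(CORE_PREFIX):].split()[:2]
            return soft, hard
    return None


def core_dumps_enabled(limits_text: str) -> bool:
    """A soft limit of 0 means the kernel will not write a core file at all."""
    limit = parse_core_limit(limits_text)
    return limit is not None and limit[0] != "0"


def report(pid: int) -> None:
    """Print core-dump settings for a running process (Linux only)."""
    limits = Path(f"/proc/{pid}/limits").read_text()
    pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    print(f"pid {pid}: core limit (soft, hard) = {parse_core_limit(limits)}")
    print(f"core dumps enabled: {core_dumps_enabled(limits)}")
    print(f"core_pattern: {pattern}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        report(int(sys.argv[1]))
    else:
        print("usage: check_core.py <pid>", file=sys.stderr)
```

Once cores are being written, load one with the matching binary, e.g. `gdb /path/to/php5-fpm /path/to/core`, and run `bt` to get a backtrace of the crashing child.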