Apache processes frequently hang in semop

Several of my Apache processes frequently hang, and each one stays hung for hours. The backtrace looks like this:

    (gdb) backtrace
    #0  0x00002af60c22b2d7 in semop () from /lib64/libc.so.6
    #1  0x00002af60bbf612c in ?? () from /usr/lib64/libapr-1.so.0
    #2  0x000055555559e614 in ?? () from /usr/sbin/httpd2-prefork
    #3  0x000055555559e9ea in ?? () from /usr/sbin/httpd2-prefork
    #4  0x000055555559f25d in ap_mpm_run () from /usr/sbin/httpd2-prefork
    #5  0x000055555557a080 in main () from /usr/sbin/httpd2-prefork

I can see that they are waiting on a pipe that connects all the Apache processes.

    # strace -p 3069
    ....
    read(7, 0x7fff16a04df7, 1)              = -1 EAGAIN (Resource temporarily unavailable)
    semop(286162952, 0x2af60bd07dc0, 1 <unfinished ...>

What is Apache doing here?

How can I find out what is causing this?
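The read(7, 0x7fff16a04df7, 1) = -1 EAGAIN line in the strace output is simply what a non-blocking one-byte read on an empty pipe returns. As far as I know, prefork children poll the shared pipe this way (Apache calls it the "pipe of death"; the parent writes to it when it wants an idle child to exit) before going back to the accept mutex, but treat that description as an assumption. The sketch below reproduces the same syscall result with os.pipe() and os.set_blocking(); poll_pipe_once is a hypothetical name:

```python
import os


def poll_pipe_once(fd: int) -> bool:
    """Try to read one byte without blocking; True if a byte was waiting."""
    try:
        # b"" would mean EOF (every write end closed), which also counts as "nothing".
        return len(os.read(fd, 1)) == 1
    except BlockingIOError:
        # Empty non-blocking pipe: read(2) fails with EAGAIN, as in the strace line.
        return False


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)   # make the read end non-blocking
    print(poll_pipe_once(r))    # nothing written yet
    os.write(w, b"!")
    print(poll_pipe_once(r))    # one byte consumed
    os.close(r)
    os.close(w)
```

In other words, the EAGAIN is the "no message for me" case, and the process then moves on to the semop() call.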

Update

Data requested in the comments:

    # ipcs -a

    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status
    0x06347849 32768      root       666        65544      2
    0x0c6629c9 21004289   root       640        1166952    2
    0x3107040d 98306      root       666        131176     3
    0x00000000 436994051  root       600        33554432   11         dest
    0x01070756 191135748  root       664        4192       1
    0x01070730 190349317  root       664        4192       1
    0x01070736 190382086  root       664        4192       1
    0x01070742 190414855  root       664        4192       1
    0x01070746 190447624  root       664        4192       1
    0x01070753 190545929  root       664        4192       1
    0x0107075e 190611466  root       664        4192       1
    0x01070750 191037451  root       664        4192       1
    0x010706c8 21069838   root       664        4192       1
    0x0107074d 191070223  root       664        4192       1

    ------ Semaphore Arrays --------
    key        semid      owner      perms      nsems
    0x0107000d 0          root       666        1
    0x0107000e 32769      root       666        1
    0x3107040d 98306      root       666        5
    0x72070097 243433475  root       666        2
    0x00000000 977469444  wwwrun     600        1
    0x4d028007 262149     root       600        8
    0x00000000 450166790  wwwrun     600        1
    0x0107073f 1209401351 root       664        1
    0x00000000 977502216  wwwrun     600        1
    0x00000000 1208451083 root       600        1
    0x01070751 1208582156 root       664        1
    0x01070758 1208647693 root       664        1
    0x00000000 1208680462 root       600        1
    0x01070749 1209237519 root       664        1
    0x0107074e 1209270289 root       664        1
    0x00000000 1209303058 root       600        1
    0x00000000 1209335827 root       600        1
    0x00000000 1209434132 root       600        1

    ------ Message Queues --------
    key        msqid      owner      perms      used-bytes   messages
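To narrow down which of these semaphores belong to Apache, the Semaphore Arrays section can be filtered by owner. The sketch below assumes the Apache run user is wwwrun (as in the ps output) and the five-column layout shown above; parse_ipcs_semaphores and apache_semaphores are hypothetical helpers, not part of any tool:

```python
from typing import List, NamedTuple


class SemArray(NamedTuple):
    key: str
    semid: int
    owner: str
    perms: str
    nsems: int


def parse_ipcs_semaphores(text: str) -> List[SemArray]:
    """Parse the 'Semaphore Arrays' section of `ipcs -s` or `ipcs -a` output."""
    rows: List[SemArray] = []
    in_section = False
    for line in text.splitlines():
        if line.startswith("------"):
            # Section banners switch parsing on only for the semaphore table.
            in_section = "Semaphore Arrays" in line
            continue
        fields = line.split()
        # Data rows start with a hex key; this skips the header and blank lines.
        if in_section and len(fields) == 5 and fields[0].startswith("0x"):
            key, semid, owner, perms, nsems = fields
            rows.append(SemArray(key, int(semid), owner, perms, int(nsems)))
    return rows


def apache_semaphores(text: str, user: str = "wwwrun") -> List[int]:
    """semids owned by the Apache run user (assumed to be wwwrun, as on SUSE)."""
    return [s.semid for s in parse_ipcs_semaphores(text) if s.owner == user]
```

Fed the stdout of ipcs -s (for example via subprocess.run(["ipcs", "-s"], capture_output=True, text=True)), this would return 977469444, 450166790 and 977502216 for the listing above, i.e. the private (key 0x00000000) arrays owned by wwwrun.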

    # ps auxwww | grep "apache"
    wwwrun    2708  0.0  0.5 201576 11972 ?  S    Nov11   0:05 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
    wwwrun    3607  0.0  0.6 202472 13388 ?  S    Nov11   0:06 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
    root      5798  0.0  0.7 200828 14800 ?  Ss   Nov08   0:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
    wwwrun   12926  0.0  0.5 201712 11768 ?  S    08:19   0:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL
    wwwrun   13009  0.0  0.6 202196 13340 ?  S    02:19   0:05 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf -DSSL

There are more processes, but you get the picture.

Also, it is a SUSE server:

    # cat /proc/version
    Linux version 2.6.16.60-0.74.7-default (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 Fri Nov 26 09:16:10 UTC 2010

The httpd.conf file:

    # grep ^[^#] /etc/apache2/httpd.conf
    Include /etc/apache2/uid.conf
    Include /etc/apache2/server-tuning.conf
    ErrorLog /var/log/apache2/error_log
    Include /etc/apache2/sysconfig.d/loadmodule.conf
    Include /etc/apache2/listen.conf
    Include /etc/apache2/mod_log_config.conf
    Include /etc/apache2/sysconfig.d/global.conf
    Include /etc/apache2/mod_status.conf
    Include /etc/apache2/mod_info.conf
    Include /etc/apache2/mod_usertrack.conf
    Include /etc/apache2/mod_autoindex-defaults.conf
    TypesConfig /etc/apache2/mime.types
    DefaultType text/plain
    Include /etc/apache2/mod_mime-defaults.conf
    Include /etc/apache2/errors.conf
    Include /etc/apache2/ssl-global.conf
    <Directory />
        Options None
        AllowOverride None
        Order deny,allow
        Deny from all
    </Directory>
    AccessFileName .htaccess
    <Files ~ "^\.ht">
        Order allow,deny
        Deny from all
    </Files>
    DirectoryIndex index.html index.html.var
    Include /etc/apache2/default-server.conf
    Include /etc/apache2/sysconfig.d/include.conf
    Include /etc/apache2/vhosts.d/*.conf

The read(7, ...) refers to a pipe:

    # ls -la /proc/3069/fd/7
    lr-x------ 1 root root 64 Nov  7 17:24 7 -> pipe:[157329520]

It connects all the Apache processes:

    # lsof | grep 157329520
    httpd2-pr  2430   root    7r  FIFO  0,5  157329520 pipe
    httpd2-pr  2430   root    8w  FIFO  0,5  157329520 pipe
    httpd2-pr  3061 wwwrun    7r  FIFO  0,5  157329520 pipe
    httpd2-pr  3061 wwwrun    8w  FIFO  0,5  157329520 pipe
    ...
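The ls and lsof steps above can also be done by reading the /proc/<pid>/fd symlinks directly. A minimal sketch, assuming Linux procfs and root privileges (other users' fd directories are otherwise unreadable and are silently skipped); pipe_inode and holders_of_pipe are hypothetical names:

```python
import os
import re
from typing import List, Optional, Tuple

_PIPE_RE = re.compile(r"^pipe:\[(\d+)\]$")


def parse_pipe_link(target: str) -> Optional[int]:
    """Return the pipe inode from a /proc/<pid>/fd link target, else None."""
    m = _PIPE_RE.match(target)
    return int(m.group(1)) if m else None


def pipe_inode(pid: int, fd: int) -> Optional[int]:
    """Rough equivalent of `ls -la /proc/<pid>/fd/<fd>` for pipes."""
    return parse_pipe_link(os.readlink(f"/proc/{pid}/fd/{fd}"))


def holders_of_pipe(inode: int) -> List[Tuple[int, int]]:
    """(pid, fd) pairs whose descriptor points at the pipe, like `lsof | grep`."""
    found: List[Tuple[int, int]] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:  # fd closed while we were scanning
                continue
            if parse_pipe_link(target) == inode:
                found.append((int(entry), int(fd)))
    return sorted(found)
```

On this server, holders_of_pipe(pipe_inode(3069, 7)) should list the same pairs as the lsof output, e.g. (2430, 7) and (2430, 8); both ends of one pipe report the same inode.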

About the semaphore:

    # ipcs -s -i 39452680

    Semaphore Array semid=39452680
    uid=30   gid=8    cuid=0   cgid=0
    mode=0600, access_perms=0600
    nsems = 1
    otime = Mon Nov 19 09:47:05 2012
    ctime = Sun Nov 18 11:15:04 2012
    semnum     value      ncount     zcount     pid
    0          0          5          0          14678

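ncount always matches the number of idle workers reported by apache2ctl status, so I believe the semop() calls are just the normal idle workers and have nothing to do with my problem...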
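For reference, this is roughly what a System V semaphore accept mutex looks like at the syscall level, and it is consistent with the ncount observation: an idle child calls semop() with sem_op = -1, and if the value is already 0 it sleeps in semop() and is counted in ncount until a sibling releases the lock with +1. This is a ctypes sketch under stated assumptions (Linux constant values, libc reachable through ctypes); it is not APR's actual code, and the mutex_* names are hypothetical:

```python
import ctypes
import ctypes.util
import errno
import os

# Assumed Linux values from <sys/ipc.h> and <sys/sem.h>.
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0
IPC_NOWAIT = 0o4000
SEM_UNDO = 0x1000


class SemBuf(ctypes.Structure):
    # struct sembuf { unsigned short sem_num; short sem_op; short sem_flg; }
    _fields_ = [("sem_num", ctypes.c_ushort),
                ("sem_op", ctypes.c_short),
                ("sem_flg", ctypes.c_short)]


_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.semop.argtypes = [ctypes.c_int, ctypes.POINTER(SemBuf), ctypes.c_size_t]


def _semop(semid: int, op: int, flags: int) -> bool:
    buf = SemBuf(0, op, flags)
    if _libc.semop(semid, ctypes.byref(buf), 1) == 0:
        return True
    err = ctypes.get_errno()
    if err == errno.EAGAIN:  # only reported when IPC_NOWAIT was requested
        return False
    raise OSError(err, os.strerror(err))


def mutex_create() -> int:
    semid = _libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)
    if semid < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    _semop(semid, +1, 0)  # Linux initialises to 0; bump to 1 = "unlocked"
    return semid


def mutex_lock(semid: int, wait: bool = True) -> bool:
    # sem_op = -1 takes the lock. With value 0 a blocking caller sleeps here
    # and shows up in ncount, the state `ipcs -s -i` reports for idle children.
    return _semop(semid, -1, SEM_UNDO | (0 if wait else IPC_NOWAIT))


def mutex_unlock(semid: int) -> None:
    _semop(semid, +1, SEM_UNDO)  # wakes one sleeper, if any


def mutex_destroy(semid: int) -> None:
    _libc.semctl(semid, 0, IPC_RMID)
```

SEM_UNDO asks the kernel to reverse a lock held by a process that dies, so a crashed child does not leave the mutex stuck at 0.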

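I believe you are running into a little-known issue. It appears to be a bug in Linux in which the semaphore count is already 0, yet processes keep waiting as if it were not. I don't understand the mechanism of the bug, but it apparently only happens on heavily loaded machines.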

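Run ipcs -s -i $SEM_ID, where $SEM_ID is the first argument passed to semop(). It should show a count of 0, which would confirm that the problem is in Linux rather than in Apache. If the value is not 0, the problem lies in Apache's code.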

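It looks like you haven't updated your kernel in about two years, and there may have been a fix since then. Others have reported that the epoll path limit of 1000 prevented Apache from using a MaxClients setting above 1000.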