High CPU from httpd processes

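I am currently running a server whose CPU usage is high even though it only hosts a few very low-traffic sites. One of the sites is still in development and will go live soon. However, this site is very, very slow... When browsing its pages, I can see httpd CPU usage climb from 30% to 100% (see the top output below).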

I have tuned httpd, MySQL, Apache Solr, and Tomcat for high performance, and I am using APC.

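I'm not sure what to do next or how to find the culprit. There are a bunch of messages in the httpd log, and I have been chasing dead ends for a while... Any help is greatly appreciated.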

Server: AuthenticAMD, Quad-Core AMD Opteron(tm) Processor 2352, 16 GB RAM

Linux 2.6.27 64-bit, CentOS 5.5

Plesk 9.5.4, MySQL 5.1.48, PHP 5.2.17

Apache/2.2.3 (CentOS) DAV/2 mod_jk/1.2.15 mod_ssl/2.2.3 OpenSSL/0.9.8e-fips-rhel5 PHP/5.2.17 mod_perl/2.0.4 Perl/v5.8.8

Tomcat6-6.0.29-1.jpp5, Tomcat-native-1.1.20-1.el5, Apache Solr

top

 17595 apache    20   0 1825m 507m  10m R 100.4  3.2   0:17.50 httpd
 17596 apache    20   0 1565m 247m 9936 R  83.1  1.5   0:10.86 httpd
 17598 apache    20   0 1430m 110m 6472 S  54.5  0.7   0:08.66 httpd
 17599 apache    20   0 1438m 124m  12m S  37.2  0.8   0:11.20 httpd
 16197 mysql     20   0 13.0g 2.0g 5440 S   9.6 12.6 297:12.79 mysqld
 17617 root      20   0 12748 1172  812 R   0.7  0.0   0:00.88 top
  8169 tomcat    20   0 4613m 268m 6056 S   0.3  1.7   6:40.56 java

httpd error_log

 [debug] prefork.c(991): AcceptMutex: sysvsem (default: sysvsem)
 [info] mod_fcgid: Process manager 17593 started
 [debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17594 for worker proxy:reverse
 [debug] proxy_util.c(1967): proxy: initialized single connection worker 0 in child 17594 for (*)
 [debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17595 for worker proxy:reverse
 [debug] proxy_util.c(1873): proxy: worker proxy:reverse already initialized
 [notice] child pid 22782 exit signal Segmentation fault (11)
 [error] (43)Identifier removed: apr_global_mutex_lock(jk_log_lock) failed
 [debug] util_ldap.c(2021): LDAP merging Shared Cache conf: shm=0x7fd29a5478c0 rmm=0x7fd29a547918 for VHOST: example.com
 [info] APR LDAP: Built with OpenLDAP LDAP SDK
 [info] LDAP: SSL support available
 [info] Init: Seeding PRNG with 256 bytes of entropy
 [info] Init: Generating temporary RSA private keys (512/1024 bits)
 [info] Init: Generating temporary DH parameters (512/1024 bits)
 [debug] ssl_scache_shmcb.c(374): shmcb_init allocated 512000 bytes of shared memory
 [debug] ssl_scache_shmcb.c(554): entered shmcb_init_memory()
 [debug] ssl_scache_shmcb.c(576): for 512000 bytes, recommending 4265 indexes
 [debug] ssl_scache_shmcb.c(619): shmcb_init_memory choices follow
 [debug] ssl_scache_shmcb.c(621): division_mask = 0x1F
 [debug] ssl_scache_shmcb.c(623): division_offset = 96
 [debug] ssl_scache_shmcb.c(625): division_size = 15997
 [debug] ssl_scache_shmcb.c(627): queue_size = 2136
 [debug] ssl_scache_shmcb.c(629): index_num = 133
 [debug] ssl_scache_shmcb.c(631): index_offset = 8
 [debug] ssl_scache_shmcb.c(633): index_size = 16
 [debug] ssl_scache_shmcb.c(635): cache_data_offset = 8
 [debug] ssl_scache_shmcb.c(637): cache_data_size = 13853
 [debug] ssl_scache_shmcb.c(650): leaving shmcb_init_memory()

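Try adding %P (and %D) to your access log format. You should then be able to correlate what you see in top with the entries in your log file, since %P records the PID of the child that served each request and %D records how long the request took in microseconds.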
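
For illustration, here is a minimal Python sketch of that correlation. It assumes the access log uses a custom format that ends with `%P %D`; the format name `combined_pid`, the function names, and the sample log lines are all made up for the example and do not come from the server in question.

```python
import re
from collections import defaultdict

# Hypothetical access-log lines, assuming a LogFormat that ends with "%P %D", e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %P %D" combined_pid
# (the name "combined_pid" and these lines are invented for the example).
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2011:07:10:38 +0000] "GET /node/1 HTTP/1.1" 200 5120 17595 4200000
10.0.0.1 - - [01/Jan/2011:07:10:39 +0000] "GET /misc/a.css HTTP/1.1" 200 300 17598 1500
10.0.0.2 - - [01/Jan/2011:07:10:41 +0000] "GET /node/2 HTTP/1.1" 200 6100 17595 3900000
"""

# After the quoted request come the status, the byte count, the child PID (%P)
# and the request duration in microseconds (%D).
LINE_RE = re.compile(r'"(?P<request>[^"]*)" \d{3} \S+ (?P<pid>\d+) (?P<usec>\d+)$')


def usec_per_pid(log_text):
    """Return {pid: total microseconds spent serving requests}."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            totals[int(match.group("pid"))] += int(match.group("usec"))
    return dict(totals)


def busiest_pids(log_text):
    """Return (pid, microseconds) pairs, busiest child first."""
    return sorted(usec_per_pid(log_text).items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pid, usec in busiest_pids(SAMPLE_LOG):
        print(pid, usec / 1000000.0, "seconds")
```

The PIDs at the top of that list can then be compared with the httpd PIDs that top shows burning CPU, and the matching log lines show which URLs those children were serving.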

[notice] child pid 22782 exit signal Segmentation fault (11)

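Something is definitely wrong here. You should add ulimit -c unlimited at the start of /etc/init.d/httpd so that you get a core dump the next time a child dies with a segfault. mod_jk may be the source of the problem, since the log contains mod_jk-related errors.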
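
After restarting httpd, it can be worth confirming that the new limit actually reached the running processes. The following is a small Python sketch, assuming the kernel exposes `/proc/<pid>/limits`; the function name and the sample text are illustrative, not output captured from this server.

```python
# Illustrative excerpt of /proc/<pid>/limits (not taken from the server above).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max core file size        0                    unlimited            bytes
"""

CORE_LABEL = "Max core file size"


def core_limit(limits_text):
    """Return (soft, hard) core-size limits as strings, or None if not found."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_LABEL):
            soft, hard = line[len(CORE_LABEL):].split()[:2]
            return soft, hard
    return None


if __name__ == "__main__":
    # For a live process, read the real file instead, for example:
    #   text = open("/proc/17595/limits").read()
    print(core_limit(SAMPLE_LIMITS))
```

A soft limit of `0` means no core file will be written for that process, so the `ulimit` line has not taken effect; `unlimited` is what you want to see for the httpd children.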

I see mod_perl in the list. Is this site written in Perl? If so, poorly written Perl code could be the source of the problem.

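The same comment applies to PHP. PHP applications are not known for their performance, and CMS applications are notorious for hogging resources. If you are a hosting provider, it is best to either disable this CMS package or charge a higher fee to cover the extra resources.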

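However, if you are running this CMS for your own use, then since it is open source, you should post another question on StackOverflow, name the package, and ask how to track down and fix poorly written code.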

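I have not seen the segmentation fault again, but I am still seeing high CPU from httpd. I was able to run strace on one of the httpd processes that was using CPU, and got the following: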

 # strace -c -p 28964
 Process 28964 attached - interrupt to quit
 ^CProcess 28964 detached
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  88.94    0.006093           0     98299      4562 lstat
   3.01    0.000206           0      2740           getcwd
   2.28    0.000156           0      2158         2 read
   2.26    0.000155           0       541        37 open
   1.68    0.000115           0      1321      1321 readlink
   1.52    0.000104           0      1678       822 access
   0.32    0.000022           0       502           fstat
   0.00    0.000000           0        25           write
   0.00    0.000000           0       507           close
   0.00    0.000000           0       547       478 stat
   0.00    0.000000           0        23           poll
   0.00    0.000000           0         2           rt_sigaction
   0.00    0.000000           0         2           rt_sigprocmask
   0.00    0.000000           0         2           writev
   0.00    0.000000           0         3           setitimer
   0.00    0.000000           0         1           sendfile
 ...
 ------ ----------- ----------- --------- --------- ----------------
 100.00    0.006851                108381      7224 total
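
When comparing several such summaries, it can help to load them into a script rather than eyeballing them. This is a rough Python sketch that parses `strace -c` output into rows and ranks syscalls by call count; the function names are hypothetical, and the sample is a shortened excerpt of the summary above.

```python
# Shortened excerpt of the `strace -c` summary shown above.
SAMPLE_SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.94    0.006093           0     98299      4562 lstat
  3.01    0.000206           0      2740           getcwd
  2.28    0.000156           0      2158         2 read
  2.26    0.000155           0       541        37 open
  1.68    0.000115           0      1321      1321 readlink
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.006851                108381      7224 total
"""


def parse_strace_summary(text):
    """Parse `strace -c` output into a list of per-syscall dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            pct_time = float(fields[0])
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue  # separator line or anything else that is not a data row
        rows.append({"syscall": fields[-1], "calls": calls,
                     "errors": errors, "pct_time": pct_time})
    return rows


def rank_by_calls(rows):
    """Return rows sorted by number of calls, most frequent first."""
    return sorted(rows, key=lambda row: row["calls"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_calls(parse_strace_summary(SAMPLE_SUMMARY)):
        failed = 100.0 * row["errors"] / row["calls"]
        print("%-10s calls=%-6d failed=%.1f%%" % (row["syscall"], row["calls"], failed))
```

In this sample, `lstat` dominates by call count, while `readlink` fails on every call. Failing `readlink` on paths that are not symlinks is normal, so the failure ratio on its own is not proof of a problem.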

The 4562 errors from lstat are all the same type of error, and they show up in the strace output file like this:

 # strace -f -t -o /var/log/strace.output -p 28964

strace.output

 28964 07:10:38 lstat("/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www", {st_mode=S_IFDIR|0755, st_size=94, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites", {st_mode=S_IFDIR|0755, st_size=30, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all", {st_mode=S_IFDIR|0755, st_size=66, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites", 0x7fff1e627370) = -1 ENOENT (No such file or directory)

The folders listed above are all inside this site's directory and are part of the Drupal CMS. But the last one listed,

 /var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites

does not exist. It should really be

 /var/www/vhosts/example.com/httpdocs/sites

which does exist. It looks like lstat is being called on a directory that does not exist...?

 -1 ENOENT (No such file or directory)

What is the best way to resolve this and to find the source of this error, i.e. whatever is looking for the missing directory?
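
One way to narrow this down, sketched in Python below, is to tally which paths fail with ENOENT in the `strace -f -t -o` output file and see whether a handful of missing paths account for most of the errors. The function name is hypothetical, and the sample holds just two of the lines from strace.output above.

```python
import re
from collections import Counter

# Two lines copied from the strace.output excerpt above.
SAMPLE_TRACE = r'''28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites", 0x7fff1e627370) = -1 ENOENT (No such file or directory)
'''

# Line layout assumed here: "<pid> <HH:MM:SS> <syscall>("<path>", ...) = -1 ENOENT ..."
ENOENT_RE = re.compile(
    r'^\d+\s+\d{2}:\d{2}:\d{2}\s+(?P<call>\w+)\("(?P<path>[^"]*)".*= -1 ENOENT'
)


def missing_paths(trace_text):
    """Count (syscall, path) pairs that failed with ENOENT."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = ENOENT_RE.match(line)
        if match:
            counts[(match.group("call"), match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    # For the real file: missing_paths(open("/var/log/strace.output").read())
    for (call, path), hits in missing_paths(SAMPLE_TRACE).most_common(10):
        print(hits, call, path)
```

If most of the failures turn out to be a relative name such as `sites` appended to a module directory, that would be consistent with a relative path being resolved against the including file's directory. That is only a guess from the trace, not something it proves.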