高服务器负载无法弄清楚为什么

我的服务器目前正在运行CentOS 5.2,与WHM 11.34。

目前,我们在6.43到12的平均负载。 我们托pipe的网站花了很多时间来回应和解决。 top不显示任何exception, iftop不显示很多stream量。

我们有很多经销商,有些不擅长编写代码,我们怎么能find罪魁祸首呢?

vmstat输出

 vmstat procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ rb swpd free buff cache si so bi bo in cs us sy id wa st 0 2 84 78684 154916 1021080 0 0 72 274 0 14 6 3 80 12 0 

顶部输出(由%CPUsorting)

 top - 21:44:43 up 5 days, 10:39, 3 users, load average: 3.36, 4.18, 4.73 Tasks: 222 total, 3 running, 219 sleeping, 0 stopped, 0 zombie Cpu(s): 5.8%us, 2.3%sy, 0.2%ni, 79.6%id, 11.8%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 2074580k total, 1863044k used, 211536k free, 174828k buffers Swap: 2040212k total, 84k used, 2040128k free, 987604k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15930 mysql 15 0 138m 46m 4380 S 4 2.3 1:45.87 mysqld 21772 igniteth 17 0 23200 7152 3932 R 4 0.3 0:00.02 php 1586 root 10 -5 0 0 0 S 2 0.0 11:45.19 kjournald 21759 root 15 0 2416 1024 732 R 2 0.0 0:00.01 top 1 root 15 0 2156 648 560 S 0 0.0 0:26.31 init 2 root RT 0 0 0 0 S 0 0.0 0:00.35 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:00.32 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:02.00 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:00.11 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root RT 0 0 0 0 S 0 0.0 0:01.29 migration/2 9 root 34 19 0 0 0 S 0 0.0 0:00.26 ksoftirqd/2 10 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/2 11 root RT 0 0 0 0 S 0 0.0 0:00.90 migration/3 12 root 34 19 0 0 0 R 0 0.0 0:00.20 ksoftirqd/3 13 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/3 

顶部输出(按CPU时间sorting)

 top - 21:46:12 up 5 days, 10:41, 3 users, load average: 2.88, 3.82, 4.55 Tasks: 217 total, 1 running, 216 sleeping, 0 stopped, 0 zombie Cpu(s): 3.7%us, 2.0%sy, 2.0%ni, 67.2%id, 25.0%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 2074580k total, 1959516k used, 115064k free, 183116k buffers Swap: 2040212k total, 84k used, 2040128k free, 1090308k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 32367 root 16 0 215m 212m 1548 S 0 10.5 62:03.63 62:03 tailwatchd 1586 root 10 -5 0 0 0 S 0 0.0 11:45.27 11:45 kjournald 1576 root 10 -5 0 0 0 S 0 0.0 2:37.86 2:37 kjournald 27722 root 16 0 2556 1184 800 S 0 0.1 1:48.94 1:48 top 15930 mysql 15 0 138m 46m 4380 S 4 2.3 1:48.63 1:48 mysqld 2932 root 34 19 0 0 0 S 0 0.0 1:41.05 1:41 kipmi0 226 root 10 -5 0 0 0 S 0 0.0 1:34.33 1:34 kswapd0 2671 named 25 0 74688 7400 2116 S 0 0.4 1:23.58 1:23 named 3229 root 15 0 10300 3348 2724 S 0 0.2 0:40.85 0:40 sshd 1580 root 10 -5 0 0 0 S 0 0.0 0:30.62 0:30 kjournald 1 root 17 0 2156 648 560 S 0 0.0 0:26.32 0:26 init 2616 root 15 0 1816 576 480 S 0 0.0 0:23.50 0:23 syslogd 1584 root 10 -5 0 0 0 S 0 0.0 0:18.67 0:18 kjournald 4342 root 34 19 27692 11m 2116 S 0 0.5 0:18.23 0:18 yum-updatesd 8044 bollingp 15 0 3456 2036 740 S 1 0.1 0:15.56 0:15 imapd 26 root 10 -5 0 0 0 S 0 0.0 0:14.18 0:14 kblockd/1 7989 gmailsit 16 0 3196 1748 736 S 0 0.1 0:10.43 0:10 imapd 

iostat -xtk 1 10输出

 [root@server1 tmp]# iostat -xtk 1 10 Linux 2.6.18-53.el5 12/18/2012 Time: 09:51:06 PM avg-cpu: %user %nice %system %iowait %steal %idle 5.83 0.19 2.53 11.85 0.00 79.60 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 1.37 118.83 18.70 54.27 131.47 692.72 22.59 4.90 67.19 3.10 22.59 sdb 0.35 39.33 20.33 61.43 158.79 403.22 13.75 5.23 63.93 3.77 30.80 Time: 09:51:07 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.50 0.00 0.50 24.00 0.00 74.00 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 25.00 2.00 2.00 128.00 108.00 118.00 0.03 7.25 4.00 1.60 sdb 0.00 16.00 41.00 145.00 200.00 668.00 9.33 107.92 272.72 5.38 100.10 Time: 09:51:08 PM avg-cpu: %user %nice %system %iowait %steal %idle 2.00 0.00 1.50 29.50 0.00 67.00 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 95.00 3.00 33.00 12.00 480.00 27.33 0.07 1.72 1.31 4.70 sdb 0.00 14.00 1.00 228.00 4.00 960.00 8.42 143.49 568.01 4.37 100.10 Time: 09:51:09 PM avg-cpu: %user %nice %system %iowait %steal %idle 13.28 0.00 2.76 21.30 0.00 62.66 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 21.00 1.00 19.00 16.00 192.00 20.80 0.06 3.55 1.30 2.60 sdb 0.00 36.00 28.00 181.00 124.00 884.00 9.65 121.16 617.31 4.79 100.10 Time: 09:51:10 PM avg-cpu: %user %nice %system %iowait %steal %idle 4.74 0.00 1.50 25.19 0.00 68.58 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 20.00 3.00 15.00 12.00 136.00 16.44 0.17 7.11 3.11 5.60 sdb 0.00 0.00 103.00 60.00 544.00 248.00 9.72 52.35 545.23 6.14 100.10 Time: 09:51:11 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.24 0.00 1.24 25.31 0.00 72.21 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 75.00 4.00 28.00 16.00 416.00 27.00 0.08 3.72 2.03 6.50 sdb 2.00 9.00 124.00 17.00 616.00 104.00 10.21 3.73 213.73 7.10 100.10 Time: 09:51:12 PM avg-cpu: %user %nice %system %iowait %steal %idle 1.00 0.00 0.75 24.31 0.00 73.93 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 24.00 1.00 9.00 4.00 132.00 27.20 0.01 1.20 1.10 1.10 sdb 4.00 40.00 103.00 48.00 528.00 212.00 9.80 105.21 104.32 6.64 100.20 Time: 09:51:13 PM avg-cpu: %user %nice %system %iowait %steal %idle 2.50 0.00 1.75 23.25 0.00 72.50 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 125.74 3.96 46.53 15.84 689.11 27.92 0.20 4.06 2.41 12.18 sdb 2.97 0.00 91.09 84.16 419.80 471.29 10.17 85.85 590.78 5.66 99.11 Time: 09:51:14 PM avg-cpu: %user %nice %system %iowait %steal %idle 0.75 0.00 0.50 24.94 0.00 73.82 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 88.00 1.00 7.00 4.00 380.00 96.00 0.04 4.38 3.00 2.40 sdb 3.00 7.00 111.00 44.00 540.00 208.00 9.65 18.58 581.79 6.46 100.10 Time: 09:51:15 PM avg-cpu: %user %nice %system %iowait %steal %idle 11.03 0.00 3.26 26.57 0.00 59.15 Device: rrqm/s wrqm/sr/sw/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 145.00 7.00 53.00 28.00 792.00 27.33 0.15 2.50 1.55 9.30 sdb 1.00 0.00 155.00 0.00 800.00 0.00 10.32 2.85 18.63 6.46 100.10 [root@server1 tmp]# 

MySQL显示完整的进程列表

 mysql> show full processlist; +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | DB_USER_ONE | localhost | DB_ONE | Query | 3 | waiting for handler insert | INSERT DELAYED INTO defers (mailtime,msgid,email,transport_method,message,host,ip,router,deliveryuser,deliverydomain) VALUES(FROM_UNIXTIME('1355879748'),'1TivwL-0003y8-8l','[email protected]','remote_smtp','SMTP error from remote mail server after initial connection: host mx1.mail.tw.yahoo.com [203.188.197.119]: 421 4.7.0 [TS01] Messages from 75.125.90.146 temporarily deferred due to user complaints - 4.16.55.1; see http://postmaster.yahoo.com/421-ts01.html','mx1.mail.tw.yahoo.com','203.188.197.119','lookuphost','','') | | 2 | DELAYED | localhost | DB_ONE | Delayed insert | 52 | insert | | | 3 | DELAYED | localhost | DB_ONE | Delayed insert | 68 | insert | | | 911 | DELAYED | localhost | DB_ONE | Delayed insert | 99 | Waiting for INSERT | | | 993 | DB_USER_TWO | localhost | DB_TWO | Sleep | 832 | | NULL | | 994 | DB_USER_ONE | localhost | DB_ONE | Query | 185 | Locked | delete from failures where FROM_UNIXTIME(UNIX_TIMESTAMP(NOW())-1296000) > mailtime | | 1102 | DB_USER_THREE | localhost | DB_THREE | Query | 29 | NULL | commit | | 1249 | DB_USER_FOUR | localhost | DB_FOUR | Query | 13 | NULL | commit | | 1263 | root | localhost | DB_FIVE | Query | 0 | NULL | show full processlist | | 1264 | DB_USER_SIX | localhost | DB_SIX | Query | 3 | NULL | commit | +------+---------------+-----------+-----------------------+----------------+------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (0.00 sec) 

很明显,你的磁盘已经达到了极限。 一般来说,%wa(iowait)应该非常低(一般网站<1%),而你希望你的util%(来自iostat -x)尽可能低(0是可能的)。

您可以使用iotop找出导致所有磁盘使用情况的进程。

如果事实certificate是mysql,你应该在my.cnf中打开日志缓慢的查询(并重新启动mysql)。 那么你将能够找出具体的查询是什么造成的。

要么。 我认为你的SDB坏了。 尝试让硬件检出。

编辑:iotop(通过EPEL可用)是一个了不起的工具,它可以让你知道哪个进程导致艾奥瓦。

你的SDB行事exception。 无论是磁盘驱动器变得不好。 如果您的网站上的stream量模式是相同的,这是一个新问题,那么有足够的证据表明您需要更换sdb。

在Linux的任何IOpath中有两个队列。 一个是IO调度程序队列,由nr_requests控制,另一个是硬件内的队列。 IO的合并发生在调度器层。 所以,当你看到avgqu-sz很小的时候,即平均队列大小很小,等待时间很长,svctm低时,这意味着存储需要花费时间来处理这些IO请求。

意思是,存储本质上是缓慢的,或者说存储很糟

%util显示一个IO在1000毫秒内完成了多less毫秒。 它是越来越锤击你的磁盘。 这并不意味着你的磁盘被严重打击,但在你的情况下,它是缓慢,相当缓慢。