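A few months ago I virtualized a data center. We have three HP DL360 G5 servers, each with 32 GB of RAM and dual Intel Xeon processors. Recently we have run into two problems. The first is that disk reads have become very slow: typing "ls" on a Linux VM that holds only a few files takes many seconds to return the file list. The second is that VMs on the cluster sometimes remount their file systems read-only on their own. dmesg on the hosts shows a large number of "DRDY ERR" errors. Our primary storage repository is on a Drobo B800i, shared over iSCSI. I have posted the iostat output and a grep of the DRDY errors from dmesg below. These are enterprise servers and they are dropping out intermittently, which is never a good thing: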
Here is the iostat output from one of the servers:
[root@XenServer-1 tmp]# iostat
Linux 2.6.32.43-0.4.1.xs1.8.0.835.170778xen (XenServer-1.ethoplex.com) 07/31/2014
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.42    0.00    0.46    3.51    0.40   95.21

Device:             tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
cciss/c0d0        17.30       76.54      304.24   893755376  3552874247
cciss/c0d0p1       1.04        0.27       22.82     3169526   266433488
cciss/c0d0p2       0.00        0.01        0.00       73890           0
cciss/c0d0p3      16.25       76.24      281.43   890365720  3286440759
sda               76.84       59.78       87.32   698047689  1019733585
dm-0               0.68        0.95        0.28    11071656     3217737
sdb                3.44      177.64       37.74  2074378210   440737634
dm-2               0.00        0.01        0.00      135808        2216
dm-3              12.23      361.61      131.55  4222728781  1536204287
sdc                4.05       27.93      328.02   326147810  3830552980
sdd                6.23      101.72      113.03  1187808537  1319897350
tda                1.61        9.74       40.01   113749658   467248640
dm-28              0.84       36.78       23.11   429521222   269838659
dm-14              0.24       56.24        0.00   656723598           0
dm-21              0.08       18.17        0.00   212172507           0
tdb                0.08        0.12        1.44     1384368    16853616
dm-5               0.38        4.03       36.17    47063052   422416430
tdc                0.61        4.03       36.10    47062722   421602000
dm-7               1.26       17.74        5.51   207110960    64292628
tde                1.22       17.64        5.49   206019946    64129696
dm-30              0.03        0.01        0.60       61956     6979438
dm-4               0.02        0.00        8.85        1014   103326613
tdd                0.11        0.00        8.82        1264   103049216
dm-9               0.00        0.02        0.05      175978      591472
tdg                0.00        0.02        0.05      175950      590704
dm-10              0.01        0.09        0.21     1104226     2488947
tdf                0.01        0.09        0.21     1105562     2472346
dm-6               0.00        0.00        0.04        1568      419135
dm-16              0.00        0.01        0.00      132105           0
dm-17              0.03        0.05        0.76      625890     8867990
dm-8               0.00        0.06        0.10      752923     1226072
tdh                0.00        0.07        0.10      788356     1218922
tdi                0.00        0.00        0.00         884           0
dmesg, grepped for DRDY:
[11645348.631020] ata1.00: status: { DRDY ERR }
[11646434.714902] ata1.00: status: { DRDY ERR }
[11648427.773389] ata1.00: status: { DRDY ERR }
[11648950.139954] ata1.00: status: { DRDY ERR }
[11649612.475350] ata1.00: status: { DRDY ERR }
[11650177.522603] ata1.00: status: { DRDY ERR }
[11650649.818020] ata1.00: status: { DRDY }
[11651837.989833] ata1.00: status: { DRDY ERR }
[11654729.414605] ata1.00: status: { DRDY ERR }
[11655685.782290] ata1.00: status: { DRDY ERR }
[11657120.774143] ata1.00: status: { DRDY ERR }
[11659704.724995] ata1.00: status: { DRDY }
[11661322.210812] ata1.00: status: { DRDY ERR }
[11662029.088563] ata1.00: status: { DRDY ERR }
[11663314.187972] ata1.00: status: { DRDY ERR }
[11667978.796829] ata1.00: status: { DRDY ERR }
[11670487.088008] ata1.00: status: { DRDY ERR }
[11671800.577054] ata1.00: status: { DRDY ERR }
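To get a rough idea of how often these status lines occur, the grep output above could be summarized with a small script. This is only a sketch: the function name `summarize_status` and the regular expression are my own, and it assumes every line has the `[timestamp] ataN.M: status: { FLAGS }` shape shown above, with the timestamp in seconds since boot.

```python
import re

# Matches lines such as: [11645348.631020] ata1.00: status: { DRDY ERR }
# (pattern is an assumption based on the log lines quoted above)
STATUS_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(ata\d+\.\d+): status: \{ ([A-Z ]+?) \}"
)

def summarize_status(lines):
    """Count status flags per device and list gaps between ERR events.

    Returns (counts, gaps): counts maps (device, flags) to the number of
    occurrences; gaps holds the seconds between consecutive lines whose
    flags include ERR.
    """
    counts = {}
    err_times = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip anything that is not a status line
        timestamp = float(match.group(1))
        device = match.group(2)
        flags = match.group(3)
        key = (device, flags)
        counts[key] = counts.get(key, 0) + 1
        if "ERR" in flags.split():
            err_times.append(timestamp)
    gaps = [later - earlier for earlier, later in zip(err_times, err_times[1:])]
    return counts, gaps
```

Feeding it the saved output of the grep (for example a list of lines read from a file) gives a per-device count and the spacing between errors, which makes it easier to compare against the times when the VMs went read-only.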
dmesg:
[11464689.083861] sr 1:0:0:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[11464689.083875] ata1.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
[11464689.083876]          res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[11464689.083896] ata1.00: status: { DRDY }
[11464694.133755] ata1: link is slow to respond, please be patient (ready=0)
[11464699.123711] ata1: device not ready (errno=-16), forcing hardreset
[11464699.123727] ata1: soft resetting link
[11464699.344063] ata1.00: configured for PIO0
[11464699.348375] ata1: EH complete
[11464706.383733] ata1.00: qc timeout (cmd 0xa0)
[11464706.383766] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[11464706.383782] sr 1:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[11464706.383794] ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[11464706.383795]          res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
[11464706.383806] ata1.00: status: { DRDY ERR }
[11464711.433625] ata1: link is slow to respond, please be patient (ready=0)
[11464716.433591] ata1: device not ready (errno=-16), forcing hardreset