I have a server with an mdadm RAID0 array:
# mdadm --version
mdadm - v3.1.4 - 31st August 2010
# uname -a
Linux orkan 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
One of the disks has failed:
# grep sdf /var/log/kern.log | head
Jan 30 19:08:06 orkan kernel: [163492.873861] sd 2:0:9:0: [sdf] Unhandled error code
Jan 30 19:08:06 orkan kernel: [163492.873869] sd 2:0:9:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 30 19:08:06 orkan kernel: [163492.873874] sd 2:0:9:0: [sdf] Sense Key : Hardware Error [deferred]
Now in dmesg I can see:
Jan 31 15:59:49 orkan kernel: [238587.307760] sd 2:0:9:0: rejecting I/O to offline device
Jan 31 15:59:49 orkan kernel: [238587.307859] sd 2:0:9:0: rejecting I/O to offline device
Jan 31 16:03:58 orkan kernel: [238836.627865] __ratelimit: 10 callbacks suppressed
Jan 31 16:03:58 orkan kernel: [238836.627872] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:03:58 orkan kernel: [238836.627878] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:04:09 orkan kernel: [238847.215187] mdadm: sending ioctl 1261 to a partition!
Jan 31 16:04:09 orkan kernel: [238847.215195] mdadm: sending ioctl 1261 to a partition!
But mdadm has not noticed that the drive has failed:
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Jan 13 15:19:05 2011
     Raid Level : raid0
     Array Size : 71682176 (68.36 GiB 73.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Sep 22 14:37:24 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : 7e018643:d6173e01:17ab5d05:f75b494e
         Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
In addition, forcing reads from /dev/md0 supports the theory that /dev/sdf has failed, yet mdadm still does not mark the drive as failed:
# dd if=/dev/md0 of=/root/md.data bs=512 skip=255 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00367142 s, 139 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=256 count=1
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000359543 s, 0.0 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=383 count=1
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000422959 s, 0.0 kB/s
# dd if=/dev/md0 of=/root/md.data bs=512 skip=384 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000314845 s, 1.6 MB/s
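The failing offsets line up with the 64K chunk size reported by mdadm -D: 64K is 128 sectors of 512 bytes, so sectors 256 to 383 form the third chunk of the array. The following Python sketch works through that arithmetic. It assumes plain round-robin striping across the three members in RaidDevice order, which should hold near the start of the array; the function name member_for_sector is made up for illustration and is not part of mdadm.

```python
SECTOR_SIZE = 512        # bs=512 used in the dd commands
CHUNK_SIZE = 64 * 1024   # "Chunk Size : 64K" from mdadm -D
RAID_DEVICES = 3         # "Raid Devices : 3" from mdadm -D


def member_for_sector(sector, chunk_size=CHUNK_SIZE,
                      raid_devices=RAID_DEVICES, sector_size=SECTOR_SIZE):
    """Return the RaidDevice index holding the given array sector.

    Assumption: simple round-robin striping (chunk 0 on device 0,
    chunk 1 on device 1, and so on), as at the start of a RAID0 array.
    """
    sectors_per_chunk = chunk_size // sector_size  # 128 sectors per chunk
    chunk_index = sector // sectors_per_chunk
    return chunk_index % raid_devices


if __name__ == "__main__":
    for sector in (255, 256, 383, 384):
        print(sector, member_for_sector(sector))
    # 255 -> 1 (/dev/sde1), 256 -> 2 (/dev/sdf1),
    # 383 -> 2 (/dev/sdf1), 384 -> 0 (/dev/sdb1)
```

Under that assumption, the two reads that return Input/output error are exactly the ones that land on RaidDevice 2, /dev/sdf1, while the reads just before and after it land on other members and succeed.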
And trying to access the /dev/sdf disk directly fails:
# dd if=/dev/sdf of=/root/sdf.data bs=512 count=1
dd: opening `/dev/sdf': No such device or address
The data is not important to me; I just want to understand why mdadm insists that the array is "State : clean".
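Apart from the obvious point that only people who do not value their data run RAID-0: mdadm will not alert you under any circumstances unless you run the monitor daemon: mdadm --monitor /dev/md0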
You can explicitly check the suspect device with: mdadm -E /dev/sdf
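If you would rather probe the member devices yourself, something along these lines might do. This is only a rough Python sketch, not anything mdadm provides: the function name unreadable_members is hypothetical, and it only checks that each path can be opened and read. A read that is served from the page cache could still succeed on a dead disk, so treat a clean result with caution.

```python
import os


def unreadable_members(paths, nbytes=512):
    """Try to open and read a few bytes from each path.

    Returns a list of (path, error message) for every path that
    could not be opened or read. Hypothetical helper for illustration.
    """
    failed = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.read(fd, nbytes)
            finally:
                os.close(fd)
        except OSError as exc:
            failed.append((path, exc.strerror))
    return failed


if __name__ == "__main__":
    # Member devices as listed by mdadm -D in the question.
    for path, reason in unreadable_members(
            ["/dev/sdb1", "/dev/sde1", "/dev/sdf1"]):
        print(f"{path}: {reason}")
```

Run as root on the affected machine, this would be expected to report the failed member in much the same way the direct dd on /dev/sdf does in the question.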
Of course, detecting that a RAID-0 array has failed is fairly pointless: the array is lost, and you restore from backup.