mdadm and RAID-5 recovery

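I've run into some problems with a RAID-5 array while using mdadm on Debian.

First, I lost a drive (completely; it wasn't even recognized by the BIOS), so I replaced it with a new one. The rebuild started, but it was interrupted by a read error on a second disk (which was then removed from the array):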

raid5:md0: read error not correctable (sector 1398118536 on sdd)

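I expect this disk to die within the next few days, but I want to re-add it and work with the degraded array long enough to make some backups (only a few sectors are damaged, and I hope to save as much data as possible before it fails).

Here are my disks, in RAID order:

sdc - OK
sdd - (the one with the read errors, removed from the array during the rebuild)
sde - (the one that died and was replaced, but the rebuild was clearly interrupted => I don't trust its data integrity)
sdf - OK

The problem is that I can't re-add sdd to the array using this command: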

    # mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sdf1 --force --run
    mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
    mdadm: Not enough devices to start the array.
    # mdadm -D /dev/md0
    /dev/md0:
            Version : 0.90
      Creation Time : Tue Aug 24 14:20:39 2010
         Raid Level : raid5
      Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
       Raid Devices : 4
      Total Devices : 3
    Preferred Minor : 0
        Persistence : Superblock is persistent
        Update Time : Sun Oct 23 01:57:22 2011
              State : active, FAILED, Not Started
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1
             Layout : left-symmetric
         Chunk Size : 128K
               UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
             Events : 0.131544
        Number   Major   Minor   RaidDevice State
           0       8       33        0      active sync   /dev/sdc1
           1       0        0        1      removed
           2       0        0        2      removed
           3       8       81        3      active sync   /dev/sdf1
           4       8       49        -      spare   /dev/sdd1

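As you can see, sdd is recognized as a spare rather than as RAID device #1 in sync.

And I don't know how to tell mdadm that sdd is RAID device #1.

If anyone has any ideas, that would be great!

Thanks.

PS: In case it helps, here is the output of mdadm's examination of the disks: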

    # mdadm -E /dev/sd[cdef]1
    /dev/sdc1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
      Creation Time : Tue Aug 24 14:20:39 2010
         Raid Level : raid5
      Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
         Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
       Raid Devices : 4
      Total Devices : 3
    Preferred Minor : 0
        Update Time : Sun Oct 23 01:57:22 2011
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
           Checksum : dfeeeace - correct
             Events : 131544
             Layout : left-symmetric
         Chunk Size : 128K
          Number   Major   Minor   RaidDevice State
    this     0       8       33        0      active sync   /dev/sdc1
       0     0       8       33        0      active sync   /dev/sdc1
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       81        3      active sync   /dev/sdf1
       4     4       8       49        4      spare   /dev/sdd1

    /dev/sdd1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
      Creation Time : Tue Aug 24 14:20:39 2010
         Raid Level : raid5
      Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
         Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
       Raid Devices : 4
      Total Devices : 3
    Preferred Minor : 0
        Update Time : Sun Oct 23 01:57:22 2011
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
           Checksum : dfeeeae0 - correct
             Events : 131544
             Layout : left-symmetric
         Chunk Size : 128K
          Number   Major   Minor   RaidDevice State
    this     4       8       49        4      spare   /dev/sdd1
       0     0       8       33        0      active sync   /dev/sdc1
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       81        3      active sync   /dev/sdf1
       4     4       8       49        4      spare   /dev/sdd1

    /dev/sde1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
      Creation Time : Tue Aug 24 14:20:39 2010
         Raid Level : raid5
      Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
         Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
       Raid Devices : 4
      Total Devices : 4
    Preferred Minor : 0
        Update Time : Sat Oct 22 22:11:52 2011
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
           Checksum : dfeeb657 - correct
             Events : 131534
             Layout : left-symmetric
         Chunk Size : 128K
          Number   Major   Minor   RaidDevice State
    this     4       8       65        4      spare   /dev/sde1
       0     0       8       33        0      active sync   /dev/sdc1
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       81        3      active sync   /dev/sdf1
       4     4       8       65        4      spare   /dev/sde1

    /dev/sdf1:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
      Creation Time : Tue Aug 24 14:20:39 2010
         Raid Level : raid5
      Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
         Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
       Raid Devices : 4
      Total Devices : 3
    Preferred Minor : 0
        Update Time : Sun Oct 23 01:57:22 2011
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
           Checksum : dfeeeb04 - correct
             Events : 131544
             Layout : left-symmetric
         Chunk Size : 128K
          Number   Major   Minor   RaidDevice State
    this     3       8       81        3      active sync   /dev/sdf1
       0     0       8       33        0      active sync   /dev/sdc1
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       81        3      active sync   /dev/sdf1
       4     4       8       49        4      spare   /dev/sdd1

The first thing you need is a non-RAID copy of sdd. Use dd_rescue, for example. Don't use the original failing disk in the RAID while you recover.
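As a rough sketch of that copy step: the helper below only prints the dd_rescue command instead of running it, so you can check source and target before anything gets written. The target /dev/sdg1 is a hypothetical spare partition (not something from the question) and has to be at least as large as sdd1; copy_cmd is just an illustrative name.

```shell
#!/bin/sh
# Print (do not run) the dd_rescue command that copies a failing RAID member
# onto a healthy spare partition.
# ASSUMPTION: /dev/sdg1 is a hypothetical target, at least as large as the source.
copy_cmd() {
    src=$1   # failing member, e.g. /dev/sdd1
    dst=$2   # healthy target, e.g. /dev/sdg1
    if [ "$src" = "$dst" ]; then
        echo "source and target must differ" >&2
        return 1
    fi
    # dd_rescue takes the input file first, then the output file
    printf 'dd_rescue %s %s\n' "$src" "$dst"
}

copy_cmd /dev/sdd1 /dev/sdg1
```

Once the printed line looks right, run it by hand as root. Printing first is just a cheap guard against swapping source and target.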

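Once you have that copy, use it to start the array without sde; put the keyword missing in its place. Two hints on how to do that, even if the straightforward way with --force fails:

1) You can re-create your RAID with --assume-clean. (Don't forget this option: with it, only the superblocks are updated, not the parity.)

2) You can assemble the array.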

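In both cases you have to supply exactly the same configuration options (layout, chunk size, disk order, etc.) as your broken RAID had. In fact, I'd suggest starting with the -A -S group, since it doesn't even update the superblocks while still giving you access to your data. Only when you're sure the array is assembled correctly should you make it persistent with the --assume-clean re-creation.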
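To make the re-create variant concrete, here is a sketch that prints (again, does not run) an mdadm --create line using the geometry from the mdadm -E output in the question: 0.90 metadata, RAID-5, 4 devices, 128K chunks, left-symmetric layout. The device order is my reading of the question (sdc in slot 0, the sdd copy in slot 1, sde's slot 2 left as missing, sdf in slot 3), and /dev/sdg1 is a hypothetical name for the copy of sdd1. Double-check every value against your own superblocks before running anything.

```shell
#!/bin/sh
# Print (do not run) an mdadm re-create command that matches the old geometry.
# ASSUMPTIONS: geometry values come from the "mdadm -E" output in the question;
# /dev/sdg1 is a hypothetical partition holding the copy of sdd1.
recreate_cmd() {
    md=$1    # array device, e.g. /dev/md0
    shift    # remaining args: members in RAID order, "missing" for the absent slot
    printf 'mdadm --create %s --assume-clean --metadata=0.90 --level=5 --raid-devices=%d --chunk=128 --layout=left-symmetric' "$md" "$#"
    for dev in "$@"; do
        printf ' %s' "$dev"
    done
    printf '\n'
}

recreate_cmd /dev/md0 /dev/sdc1 /dev/sdg1 missing /dev/sdf1
```

If the order or any option is wrong, the filesystem on md0 will most likely look like garbage. In that case stop the array and try again rather than writing to it, and mount read-only for the first check.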

As soon as you have the RAID running with 3 disks, just put your sde in place of the missing one.
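For that last step, a small sketch: the helper below reads /proc/mdstat-style text on stdin and prints the mdadm --add command only if the array is listed as active. The helper name add_cmd_if_active is made up for illustration, and it assumes the usual "md0 : active ..." line format of /proc/mdstat.

```shell
#!/bin/sh
# Print the mdadm command that adds the replacement disk, but only if the
# array shows up as active in the mdstat text supplied on stdin.
# ASSUMPTION: stdin has the usual /proc/mdstat format ("md0 : active raid5 ...").
add_cmd_if_active() {
    md_name=$1   # e.g. md0
    new_dev=$2   # e.g. /dev/sde1
    if grep -q "^$md_name : active"; then
        printf 'mdadm /dev/%s --add %s\n' "$md_name" "$new_dev"
    else
        echo "$md_name is not active; not adding $new_dev" >&2
        return 1
    fi
}

# Typical use: add_cmd_if_active md0 /dev/sde1 < /proc/mdstat
```

After the add, the rebuild onto sde should show up in /proc/mdstat. It is probably wise to finish your backups first, since the rebuild reads every sector of the remaining disks.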