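I have been fighting this problem for a while now.
I have a logical volume spread across 3 disks: 1.5 TB, 2 TB and 3 TB. The 1.5 TB drive is failing, with lots of I/O errors and dead sectors. I started a pvmove to move the existing extents from the failing drive to the 3 TB drive (which has enough free space). I moved 99% of the extents, but the last percent seems impossible to read. The read fails and pvmove exits.
This is the current state: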
root@server:~# pvdisplay
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
  Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
  --- Physical volume ---
  PV Name               /dev/sda    # old, working drive
  VG Name               lvm_group1
  PV Size               1.82 TiB / not usable 1.09 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              476932
  Free PE               0
  Allocated PE          476932
  PV UUID               FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A

  --- Physical volume ---
  PV Name               /dev/sdd1   # old failing drive
  VG Name               lvm_group1
  PV Size               1.36 TiB / not usable 2.40 MiB
  Allocatable           NO
  PE Size               4.00 MiB
  Total PE              357699
  Free PE               357600
  Allocated PE          99
  PV UUID               hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK

  --- Physical volume ---
  PV Name               /dev/sdf    # new drive
  VG Name               lvm_group1
  PV Size               2.73 TiB / not usable 4.46 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              715396
  Free PE               357746
  Allocated PE          357650
  PV UUID               qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6
This is what pvmove says:
root@server:~# pvmove /dev/sdd1:335950-336500 /dev/sdf --verbose
    Finding volume group "lvm_group1"
    Archiving volume group "lvm_group1" metadata (seqno 93).
    Creating logical volume pvmove0
    Moving 50 extents of logical volume lvm_group1/cryptex
    Found volume group "lvm_group1"
    activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/cryptex
    Updating volume group metadata
    Found volume group "lvm_group1"
    Found volume group "lvm_group1"
    Creating lvm_group1-pvmove0
    Loading lvm_group1-pvmove0 table (253:2)
    Loading lvm_group1-cryptex table (253:0)
    Suspending lvm_group1-cryptex (253:0) with device flush
    Suspending lvm_group1-pvmove0 (253:2) with device flush
    Found volume group "lvm_group1"
    activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/pvmove0
    Resuming lvm_group1-pvmove0 (253:2)
    Found volume group "lvm_group1"
    Loading lvm_group1-pvmove0 table (253:2)
    Suppressed lvm_group1-pvmove0 identical table reload.
    Resuming lvm_group1-cryptex (253:0)
    Creating volume group backup "/etc/lvm/backup/lvm_group1" (seqno 94).
    Checking progress before waiting every 15 seconds
  /dev/sdd1: Moved: 4.0%
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  No physical volume label read from /dev/sdd1
  Physical volume /dev/sdd1 not found
  ABORTING: Can't reread PV /dev/sdd1
  ABORTING: Can't reread VG for /dev/sdd1
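One thing that could be tried at this point is splitting the extent range into smaller pieces, so that an unreadable region aborts only one small pvmove instead of the whole range. The sketch below only prints the commands; the helper names `chunk_ranges` and `plan_pvmove` and the chunk size of 200 are assumptions for illustration. It may not help in this particular case, because the log above shows pvmove aborting when it can no longer reread the PV label at all.

```shell
#!/bin/sh
# Hypothetical helper: print inclusive extent ranges "START-END"
# covering FIRST..LAST in steps of SIZE extents.
chunk_ranges() {
    first=$1; last=$2; size=$3
    start=$first
    while [ "$start" -le "$last" ]; do
        end=$(( start + size - 1 ))
        # Clamp the final chunk to the last extent.
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "${start}-${end}"
        start=$(( end + 1 ))
    done
}

# Hypothetical helper: print (do not run) one pvmove command per chunk,
# using the same SourcePV:PE-PE DestPV form as the command above.
plan_pvmove() {
    src=$1; dst=$2
    chunk_ranges "$3" "$4" "$5" | while read -r range; do
        echo "pvmove ${src}:${range} ${dst}"
    done
}

# Same range as in the question, in chunks of 200 extents (assumed size).
plan_pvmove /dev/sdd1 /dev/sdf 335950 336500 200
```

Reviewing the printed commands before running them one by one makes it easier to see which chunk hits the unreadable area.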
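Only 99 extents remain on the failing drive. I can afford to lose that data; I just want to get rid of this drive without losing the data on the other drives.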
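For a sense of scale, the amount of data still on the failing drive follows from the pvdisplay figures (99 allocated extents, PE size 4.00 MiB). A minimal sketch of the arithmetic; the function name `extents_to_mib` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a number of LVM physical extents to MiB.
# $1 = number of extents, $2 = extent size in MiB.
extents_to_mib() {
    echo $(( $1 * $2 ))
}

# 99 allocated PE at 4 MiB each, as reported by pvdisplay for /dev/sdd1.
extents_to_mib 99 4   # prints 396
```

So roughly 396 MiB of the logical volume would be lost if those extents are given up.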
So I tried pvremove:
root@server:~# pvremove /dev/sdd1
  /dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
  No physical volume label read from /dev/sdd1
  Physical Volume /dev/sdd1 not found
Then vgreduce:
root@server:~# vgreduce lvm_group1 --removemissing
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
  Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
  WARNING: Partial LV cryptex needs to be repaired or removed.
  WARNING: Partial LV pvmove0 needs to be repaired or removed.
  There are still partial LVs in VG lvm_group1.
  To remove them unconditionally use: vgreduce --removemissing --force.
  Proceeding to remove empty missing PVs.
pvdisplay still shows the failing drive...
Any ideas?
In the end, I solved this by manually editing /etc/lvm/backup/lvm_group1.
Here are the steps for anyone else who runs into this problem:
vgreduce lvm_group1 --removemissing --force
vgcfgrestore -f edited_config_file.cfg lvm_group1
It took me 4 days of learning the ins and outs of LVM to solve this.
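The two commands above can be wrapped in a small dry-run script so they can be reviewed before anything is changed. This is only a sketch: the `run` wrapper, the `APPLY` variable and the function name `remove_failed_pv` are assumptions for illustration, and the manual edit of the metadata file is not shown because the post does not describe exactly what was changed. It assumes `edited_config_file.cfg` is an edited copy of /etc/lvm/backup/lvm_group1.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default,
# execute it only when APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# $1 = volume group name, $2 = hand-edited copy of the metadata backup.
remove_failed_pv() {
    vg=$1; cfg=$2
    # Remove the missing PV; per the vgreduce warning quoted earlier,
    # --force also removes the partial LVs unconditionally.
    run vgreduce "$vg" --removemissing --force
    # Load the volume group metadata from the edited file.
    run vgcfgrestore -f "$cfg" "$vg"
}

# Names taken from the post; prints the two commands without running them.
remove_failed_pv lvm_group1 edited_config_file.cfg
```

Running it as-is prints the two commands; running it with `APPLY=1` set would execute them, which should only be done after the edited file has been checked carefully.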
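So far it looks good. No errors. Happy camper.
If you can stop LVM temporarily (and close the underlying LUKS container, if one is used), you can use GNU ddrescue to copy as much of the PV (or of the underlying LUKS container) as possible to a good disk, remove the old disk, and then restart LVM.
While I like Sniku's LVM solution, ddrescue may be able to recover more data than pvmove.
(The reason for stopping LVM is that LVM has multipath support and, as soon as it discovers them, will balance writes between pairs of PVs that have the same UUID. In addition, LVM and LUKS should be stopped to make sure that all recent writes are visible on the underlying device. Rebooting the system and not supplying the LUKS password is the easiest way to ensure that.)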
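A common way to run GNU ddrescue for this kind of copy is in two passes: a fast first pass that grabs everything readable, then a second pass that retries the bad areas. The sketch below only prints the commands. The target device `/dev/sdX1`, the mapfile name `rescue.map` and the helper name `plan_ddrescue` are placeholders, not taken from the post, and the target is assumed to be at least as large as the source.

```shell
#!/bin/sh
# Hypothetical helper: print a two-pass GNU ddrescue plan; nothing is executed.
# $1 = failing source device, $2 = target device, $3 = mapfile path.
plan_ddrescue() {
    src=$1; dst=$2; map=$3
    # Pass 1: copy the readable areas and skip the slow scraping phase (-n).
    # -f is needed because the output is a device, not a regular file.
    echo "ddrescue -f -n ${src} ${dst} ${map}"
    # Pass 2: go back to the bad areas with direct disc access (-d),
    # retrying them up to 3 times (-r3). The mapfile lets ddrescue resume.
    echo "ddrescue -f -d -r3 ${src} ${dst} ${map}"
}

# /dev/sdd1 is the failing PV from the question; /dev/sdX1 is a placeholder target.
plan_ddrescue /dev/sdd1 /dev/sdX1 rescue.map
```

The mapfile records which areas were recovered, so the second pass (or a later rerun) only touches what is still missing.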