I have a zpool of four 2TB USB disks in a raidz configuration:
    [root@chef /mnt/Chef]# zpool status farcryz1
      pool: farcryz1
     state: ONLINE
     scrub: none requested
    config:
            NAME        STATE     READ WRITE CKSUM
            farcryz1    ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0
                da3     ONLINE       0     0     0
                da4     ONLINE       0     0     0
To test the pool, I simulated a drive failure by pulling the USB cable from one of the drives without taking it offline first:
    [root@chef /mnt/Chef]# zpool status farcryz1
      pool: farcryz1
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:
            NAME        STATE     READ WRITE CKSUM
            farcryz1    ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                da4     ONLINE      22     4     0
                da3     ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0
    errors: No known data errors
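The data is still there and the pool is still online. Great! Now let's try to restore the pool. I plugged the drive back in and, following the instructions above, issued the zpool replace command: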
    [root@chef /mnt/Chef]# zpool replace farcryz1 da4
    invalid vdev specification
    use '-f' to override the following errors:
    /dev/da4 is part of active pool 'farcryz1'
Hmm... that isn't helpful. So I tried zpool clear farcryz1, but that didn't help at all. I still couldn't replace da4. Then I tried various combinations of online, offline, clear, and replace. Now I'm stuck here:
    [root@chef /mnt/Chef]# zpool status -v farcryz1
      pool: farcryz1
     state: DEGRADED
    status: One or more devices could not be used because the label is missing
            or invalid.  Sufficient replicas exist for the pool to continue
            functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-4J
     scrub: scrub completed after 0h2m with 0 errors on Fri Sep  9 13:43:34 2011
    config:
            NAME        STATE     READ WRITE CKSUM
            farcryz1    DEGRADED     0     0     0
              raidz1    DEGRADED     0     0     0
                da4     UNAVAIL      9     0     0  experienced I/O failures
                da3     ONLINE       0     0     0
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0
    errors: No known data errors
    [root@chef /mnt/Chef]# zpool replace farcryz1 da4
    cannot replace da4 with da4: da4 is busy
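How can I recover from this situation, where one device in my zpool was accidentally disconnected (but has not actually failed) and is now back and ready to be resilvered?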
Edit: as requested, the tail of dmesg:
    (ses3:umass-sim4:4:0:1): removing device entry
    (da4:umass-sim4:4:0:0): removing device entry
    ugen3.2: <Western Digital> at usbus3
    umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
    da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
    da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
    da4: 400.000MB/s transfers
    da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
    ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
    ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
    ses3: 400.000MB/s transfers
    ses3: SCSI-3 SES Device
    GEOM: da4: partition 1 does not start on a track boundary.
    GEOM: da4: partition 1 does not end on a track boundary.
    GEOM: da4: partition 1 does not start on a track boundary.
    GEOM: da4: partition 1 does not end on a track boundary.
    ugen3.2: <Western Digital> at usbus3 (disconnected)
    umass4: at uhub3, port 1, addr 1 (disconnected)
    (da4:umass-sim4:4:0:0): lost device
    (da4:umass-sim4:4:0:0): removing device entry
    (ses3:umass-sim4:4:0:1): lost device
    (ses3:umass-sim4:4:0:1): removing device entry
    ugen3.2: <Western Digital> at usbus3
    umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
    da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
    da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device
    da4: 400.000MB/s transfers
    da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
    ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
    ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device
    ses3: 400.000MB/s transfers
    ses3: SCSI-3 SES Device
"Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."
It looks like, after the initial transient failure, you probably only needed to run zpool clear to clear the errors.
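To help decide between clearing and replacing, it can be useful to list only the vdevs whose error counters are non-zero. The following is a minimal sketch, not part of the original answer: the function name zpool_error_devices is hypothetical, and it assumes the default five-column config table (NAME STATE READ WRITE CKSUM) that zpool status prints in the question.

```shell
# Print every vdev whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` output on stdin; assumes the default column layout
# NAME STATE READ WRITE CKSUM (an assumption, not a documented interface).
zpool_error_devices() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }'
}

# Usage on a live system (pool name taken from the question):
#   zpool status farcryz1 | zpool_error_devices
```

The filter only reports the counters; whether a given count means a flaky cable or a dying disk is still a judgment call.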
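If you want to treat it as a drive replacement, you will probably need to wipe the data on the drive first, before trying to add it back to the pool.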
If zpool clear doesn't solve the problem, you can use zpool labelclear <partition> (available in http://zfsonlinux.org since zfs-v0.6.2) to make ZFS forget the disk.
Note that even if you created the zpool using whole devices (e.g. /dev/sda), you must specify the partition that ZFS created, e.g. /dev/sda1.
(Credit goes to DeHackEd, https://github.com/zfsonlinux/zfs/issues/2076 )
From the zpool man page:
    zpool labelclear [-f] device
        Removes ZFS label information from the specified device. The device
        must not be part of an active pool configuration.
        -f  Treat exported or foreign devices as inactive.
What was the output of the various commands you tried? Did you try using the -f switch?
Did you run zpool clear poolname device-name?
In your case that would be zpool clear farcryz1 da4, which should have kicked off the resilvering process.
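As a rough sketch of that suggestion: clear the counters on the one device, then look at the pool state to see whether a resilver has started. The wrapper name clear_and_check and the ZPOOL override are hypothetical, added only so the sequence can be previewed without touching a pool; the underlying zpool clear and zpool status -v commands are the ones already used in this thread.

```shell
# Clear the error counters on one device, then show the pool state.
# ZPOOL is a hypothetical override for dry runs, e.g. ZPOOL="echo zpool".
clear_and_check() {
    pool=$1
    dev=$2
    # Stop here if the clear itself fails, so the error stays visible.
    ${ZPOOL:-zpool} clear "$pool" "$dev" || return 1
    ${ZPOOL:-zpool} status -v "$pool"
}

# Usage (pool and device names from the question):
#   clear_and_check farcryz1 da4
```

Whether this is enough once the device is already marked UNAVAIL, as in the last status output in the question, is not something this sketch can guarantee.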