Fibre Channel SAN: SCSI block devices go missing

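There are 3 LUNs on an FC SAN that I want to access through 2 HBAs (two paths per HBA). When the system boots, everything looks fine, but after a while the sd* devices of the second HBA disappear. I don't understand why this happens, or how to get them back without rebooting. Scanning the SCSI bus still finds all the devices, but the kernel does not know about the block devices. This is a fully updated Red Hat 6.6.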

On another system, the same LUNs are available over 4 paths, but on this one only over 2.

Does anyone have a clue what I might be missing?

    # lspci|grep Fibre
    08:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
    08:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)
    # lsscsi
    ...
    [1:0:0:1] disk DataCore Virtual Disk DCS /dev/sdb
    [1:0:0:2] disk DataCore Virtual Disk DCS /dev/sdc
    [1:0:0:3] disk DataCore Virtual Disk DCS /dev/sdd
    [1:0:1:1] disk DataCore Virtual Disk DCS /dev/sde
    [1:0:1:2] disk DataCore Virtual Disk DCS /dev/sdf
    [1:0:1:3] disk DataCore Virtual Disk DCS /dev/sdg
    [2:0:0:1] disk DataCore Virtual Disk DCS -
    [2:0:0:2] disk DataCore Virtual Disk DCS -
    [2:0:0:3] disk DataCore Virtual Disk DCS -
    [2:0:1:1] disk DataCore Virtual Disk DCS -
    [2:0:1:2] disk DataCore Virtual Disk DCS -
    [2:0:1:3] disk DataCore Virtual Disk DCS -
    ...
    # rescan-scsi-bus.sh
    ...
    0 new or changed device(s) found.
    0 remapped or resized device(s) found.
    0 device(s) removed.
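To make the symptom easier to spot on a longer listing, here is a minimal Python sketch that picks out the SCSI addresses that lsscsi reports without a block device node (a `-` in the last column). The sample text is copied from the output above; the function name `paths_without_block_device` and the regular expression are illustrative assumptions, not part of lsscsi.

```python
import re

# Sample copied from the lsscsi output in the question.
LSSCSI_SAMPLE = """\
[1:0:0:1] disk DataCore Virtual Disk DCS /dev/sdb
[1:0:0:2] disk DataCore Virtual Disk DCS /dev/sdc
[1:0:0:3] disk DataCore Virtual Disk DCS /dev/sdd
[1:0:1:1] disk DataCore Virtual Disk DCS /dev/sde
[1:0:1:2] disk DataCore Virtual Disk DCS /dev/sdf
[1:0:1:3] disk DataCore Virtual Disk DCS /dev/sdg
[2:0:0:1] disk DataCore Virtual Disk DCS -
[2:0:0:2] disk DataCore Virtual Disk DCS -
[2:0:0:3] disk DataCore Virtual Disk DCS -
[2:0:1:1] disk DataCore Virtual Disk DCS -
[2:0:1:2] disk DataCore Virtual Disk DCS -
[2:0:1:3] disk DataCore Virtual Disk DCS -
"""

# [host:channel:target:lun]  type  description...  device-node
LINE_RE = re.compile(r"^\[(\d+):(\d+):(\d+):(\d+)\]\s+(\S+)\s+(.*?)\s+(\S+)$")


def paths_without_block_device(lsscsi_text):
    """Return (host, channel, target, lun) tuples for disks listed with '-'."""
    missing = []
    for line in lsscsi_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "..." and anything that is not a device line
        host, channel, target, lun, dev_type, _desc, node = match.groups()
        if dev_type == "disk" and node == "-":
            missing.append((int(host), int(channel), int(target), int(lun)))
    return missing


if __name__ == "__main__":
    for address in paths_without_block_device(LSSCSI_SAMPLE):
        print("no block device for %d:%d:%d:%d" % address)
```

On the sample, this lists the six addresses on host 2, which matches the observation that only the second HBA's devices are affected.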

This is what gets logged when it happens:

    May 24 12:08:57 hostname kernel: sd 1:0:0:1: Parameters changed
    May 24 12:08:57 hostname kernel: sd 1:0:1:3: Parameters changed
    May 24 12:09:01 hostname kernel: sd 1:0:1:2: Parameters changed
    May 24 12:09:24 hostname kernel: sd 1:0:1:1: Parameters changed
    May 24 12:09:24 hostname kernel: sd 2:0:0:1: rejecting I/O to offline device
    May 24 12:09:25 hostname multipathd: checker failed path 8:112 in map lun0
    May 24 12:09:25 hostname multipathd: ora_data2: remaining active paths: 3
    May 24 12:09:25 hostname multipathd: checker failed path 8:128 in map lun1
    May 24 12:09:25 hostname multipathd: ora_acfs1: remaining active paths: 3
    May 24 12:09:25 hostname multipathd: checker failed path 8:144 in map lun2
    May 24 12:09:25 hostname multipathd: ora_acfs2: remaining active paths: 3
    May 24 12:09:25 hostname multipathd: checker failed path 8:160 in map lun0
    May 24 12:09:25 hostname multipathd: ora_data2: remaining active paths: 2
    May 24 12:09:25 hostname multipathd: checker failed path 8:176 in map lun1
    May 24 12:09:25 hostname multipathd: ora_acfs1: remaining active paths: 2
    May 24 12:09:25 hostname multipathd: checker failed path 8:192 in map lun2
    May 24 12:09:25 hostname multipathd: ora_acfs2: remaining active paths: 2
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:112.
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:128.
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:144.
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:160.
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:176.
    May 24 12:09:25 hostname kernel: device-mapper: multipath: Failing path 8:192.
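The failed paths in this log are given as major:minor numbers rather than device names. As a rough aid, the Python sketch below decodes them, relying on the standard Linux numbering where major 8 covers the first 16 SCSI disks with 16 minors each (so minor 0 is sda, 16 is sdb, and so on). The helper names `sd_name` and `failed_paths` are made up for illustration, and the sample is limited to the multipathd lines shown above.

```python
import re
import string

# multipathd lines copied from the log in the question.
LOG_SAMPLE = """\
May 24 12:09:25 hostname multipathd: checker failed path 8:112 in map lun0
May 24 12:09:25 hostname multipathd: checker failed path 8:128 in map lun1
May 24 12:09:25 hostname multipathd: checker failed path 8:144 in map lun2
May 24 12:09:25 hostname multipathd: checker failed path 8:160 in map lun0
May 24 12:09:25 hostname multipathd: checker failed path 8:176 in map lun1
May 24 12:09:25 hostname multipathd: checker failed path 8:192 in map lun2
"""

FAILED_RE = re.compile(r"checker failed path (\d+):(\d+) in map (\S+)")


def sd_name(major, minor):
    """Map a major-8 block device number to its whole-disk name (sda..sdp)."""
    if major != 8:
        raise ValueError("only major 8 (the first 16 sd disks) is handled")
    index = minor // 16  # 16 minors per disk; minor % 16 is the partition
    return "sd" + string.ascii_lowercase[index]


def failed_paths(log_text):
    """Return (disk name, multipath map) pairs for every failed-path line."""
    return [
        (sd_name(int(major), int(minor)), map_name)
        for major, minor, map_name in FAILED_RE.findall(log_text)
    ]


if __name__ == "__main__":
    for disk, map_name in failed_paths(LOG_SAMPLE):
        print(disk, "failed in map", map_name)
```

Decoded this way, the six failed paths are sdh through sdm. Since lsscsi shows sdb through sdg on host 1, these are presumably the six devices that belonged to host 2, although the logs alone do not confirm that mapping.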

Unfortunately, I don't have access to the SAN devices, but I was told that nothing was touched.

I just noticed that the devices were actually gone, but came back two hours later:

    May 24 14:06:35 hostname kernel: scsi 2:0:1:1: Attached scsi generic sg9 type 0
    May 24 14:06:35 hostname kernel: scsi 2:0:1:2: Attached scsi generic sg10 type 0
    May 24 14:06:35 hostname kernel: scsi 2:0:1:3: Attached scsi generic sg11 type 0
    May 24 14:06:37 hostname kernel: scsi 2:0:0:1: Attached scsi generic sg12 type 0
    May 24 14:06:37 hostname kernel: scsi 2:0:0:2: Attached scsi generic sg13 type 0
    May 24 14:06:37 hostname kernel: scsi 2:0:0:3: Attached scsi generic sg14 type 0

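The FC switch may have been shut down during that time. When the system booted earlier and the sd devices were created as usual, the line looked slightly different: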

    May 24 11:14:15 hostname kernel: sd 2:0:1:3: Attached scsi generic sg14 type 0

    May 24 14:06:35 hostname kernel: scsi 2:0:1:1: Attached scsi generic sg9 type 0

It says "scsi" instead of "sd".
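To compare the two kinds of attach messages across a whole log, a small Python sketch like the following can extract the prefix for each SCSI address. The function name `attach_prefixes` is hypothetical and the sample consists of the two lines quoted above. As far as I understand, the kernel prints the name of the driver bound to the device in this position and falls back to the bus name when none is bound, so a "scsi" prefix may mean the sd driver was not attached at that moment; treat that reading as an assumption to verify.

```python
import re

# The two attach lines quoted in the question.
ATTACH_SAMPLE = """\
May 24 11:14:15 hostname kernel: sd 2:0:1:3: Attached scsi generic sg14 type 0
May 24 14:06:35 hostname kernel: scsi 2:0:1:1: Attached scsi generic sg9 type 0
"""

ATTACH_RE = re.compile(
    r"kernel: (\w+) (\d+:\d+:\d+:\d+): Attached scsi generic (sg\d+)"
)


def attach_prefixes(log_text):
    """Return (scsi address, message prefix, sg device) for each attach line."""
    return [
        (match.group(2), match.group(1), match.group(3))
        for match in ATTACH_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    for address, prefix, sg_dev in attach_prefixes(ATTACH_SAMPLE):
        note = "" if prefix == "sd" else "  <-- not logged with the sd prefix"
        print(address, prefix, sg_dev + note)
```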