Kernel: scsi 0:0:0:0: rejecting I/O to dead device

Yesterday, OSSEC sent me a warning email:

    Jul 29 21:25:16 SVR4149 kernel: end_request: I/O error, dev sda, sector 334634969
    Jul 29 21:25:16 SVR4149 kernel: sd 0:0:0:0: SCSI error: return code = 0x00040000
    Jul 29 21:25:16 SVR4149 kernel: end_request: I/O error, dev sda, sector 334634977
    Jul 29 21:28:28 SVR4149 kernel: sd 0:0:0:0: SCSI error: return code = 0x00040000

Surprisingly, at that point the only disk I had left was /dev/sdb:

    # fdisk -l

    Disk /dev/sdb: 320.0 GB, 320072933376 bytes
    255 heads, 63 sectors/track, 38913 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1   *           1          13      104391   83  Linux
    /dev/sdb2              14        7662    61440592+  83  Linux
    /dev/sdb3            7663        8706     8385930   82  Linux swap / Solaris
    /dev/sdb4            8707       38888   242436915    5  Extended
    /dev/sdb5            8707       38888   242436883+  83  Linux

After some Googling, I found this link. Running the command it suggests brought the disk back, this time as /dev/sdc:

    Jul 29 22:55:45 SVR4149 kernel: ata1: hard resetting link
    Jul 29 22:55:45 SVR4149 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    Jul 29 22:55:45 SVR4149 kernel: ata1.00: ATA-8: WDC WD3202ABYS-01B7A0, 02.03B02, max UDMA/133
    Jul 29 22:55:45 SVR4149 kernel: ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
    Jul 29 22:55:45 SVR4149 kernel: ata1.00: configured for UDMA/133
    Jul 29 22:55:45 SVR4149 kernel: ata1: EH complete
    Jul 29 22:55:45 SVR4149 kernel: ata1.00: detaching (SCSI 0:0:0:0)
    Jul 29 22:55:45 SVR4149 kernel:   Vendor: ATA       Model: WDC WD3202ABYS-0  Rev: 02.0
    Jul 29 22:55:45 SVR4149 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
    Jul 29 22:55:45 SVR4149 kernel: SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
    Jul 29 22:55:45 SVR4149 kernel: sdc: Write Protect is off
    Jul 29 22:55:45 SVR4149 kernel: sdc: Mode Sense: 00 3a 00 00
    Jul 29 22:55:45 SVR4149 kernel: SCSI device sdc: drive cache: write back
    Jul 29 22:55:53 SVR4149 kernel: SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
    Jul 29 22:55:53 SVR4149 kernel:  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 >
    Jul 29 22:55:53 SVR4149 kernel: sd 0:0:0:0: Attached scsi disk sdc
    Jul 29 22:55:53 SVR4149 kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0

Checking again with fdisk:

    # fdisk -l

    Disk /dev/sdb: 320.0 GB, 320072933376 bytes
    255 heads, 63 sectors/track, 38913 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1   *           1          13      104391   83  Linux
    /dev/sdb2              14        7662    61440592+  83  Linux
    /dev/sdb3            7663        8706     8385930   82  Linux swap / Solaris
    /dev/sdb4            8707       38888   242436915    5  Extended
    /dev/sdb5            8707       38888   242436883+  83  Linux

    Disk /dev/sdc: 320.0 GB, 320072933376 bytes
    255 heads, 63 sectors/track, 38913 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1   *           1          13      104391   83  Linux
    /dev/sdc2              14        7662    61440592+  83  Linux
    /dev/sdc3            7663        8706     8385930   82  Linux swap / Solaris
    /dev/sdc4            8707       38888   242436915    5  Extended
    /dev/sdc5            8707       38888   242436883+  83  Linux

But now the kernel log shows a different problem:

    Jul 30 01:03:41 SVR4149 kernel: scsi 0:0:0:0: rejecting I/O to dead device
    Jul 30 01:14:40 SVR4149 kernel: scsi 0:0:0:0: rejecting I/O to dead device
    Jul 30 01:16:41 SVR4149 kernel: scsi 0:0:0:0: rejecting I/O to dead device
    Jul 30 01:53:18 SVR4149 last message repeated 7 times
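To get a feel for how often this happens and on which SCSI address, the messages can be tallied from the log. This is only a rough sketch: `count_dead_io` is a made-up helper name, and the log path in the usage line is an assumption (the post does not say where these messages are stored).

```shell
#!/bin/sh
# count_dead_io LOGFILE   (hypothetical helper, not part of any package)
# Prints "<count> <scsi address>" for each address that logged
# "rejecting I/O to dead device". Syslog's "last message repeated N times"
# lines are not expanded, so the counts are a lower bound.
count_dead_io() {
    # Keep only matching lines and reduce each to its host:channel:id:lun.
    sed -n 's/.*scsi \([0-9:]*\): rejecting I\/O to dead device.*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

Usage, assuming the kernel messages end up in /var/log/messages: `count_dead_io /var/log/messages`.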

smartd also keeps trying to open a device that no longer exists:

    Jul 30 10:00:57 SVR4149 smartd[3749]: Device: /dev/sda, No such device, open() failed

There is nothing special in my smartd.conf:

    # grep -v "^#" /etc/smartd.conf | sed '/^$/d'
    DEVICESCAN -H -m root

Could my scsi0 be "dead"?

    # cat /proc/scsi/scsi
    Attached devices:
    Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA      Model: WDC WD3202ABYS-0 Rev: 02.0
      Type:   Direct-Access                    ANSI SCSI revision: 05
    Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA      Model: WDC WD3202ABYS-0 Rev: 02.0
      Type:   Direct-Access                    ANSI SCSI revision: 05
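Since the disk came back under a new name, it can help to see which sdX node currently sits behind each SCSI address. A minimal sketch, assuming sysfs is mounted at /sys and each /sys/block/sdX/device link resolves to a directory named host:channel:id:lun; `list_scsi_addresses` is a hypothetical helper name:

```shell
#!/bin/sh
# list_scsi_addresses [SYSFS_ROOT]   (hypothetical helper)
# Prints "<block device> <host:channel:id:lun>" for every sd* disk.
# SYSFS_ROOT defaults to /sys; it is a parameter only to make testing easy.
list_scsi_addresses() {
    sysfs_root="${1:-/sys}"
    for dev in "$sysfs_root"/block/sd*; do
        # Skips the unexpanded glob when no sd* entries exist.
        [ -e "$dev/device" ] || continue
        # The last path component of the resolved link is the SCSI address.
        addr=$(basename "$(readlink -f "$dev/device")")
        echo "$(basename "$dev") $addr"
    done
}
```

On the machine described here, one would expect a line like `sdc 0:0:0:0` for the re-attached disk, matching the "Attached scsi disk sdc" kernel message.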

Any help would be greatly appreciated.

It looks like the drive is dropping off the bus and then reconnecting. That points to one of three things:

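  1. Most likely, a bad drive. I would start by checking the SMART logs and see where that gets you.
  2. A bad cable or SCSI controller (usually the RAID card). If SMART checks out and the problem continues, swap the cable first, then the card.
  3. You are doing so much sustained disk I/O that you are overloading the disk controller. You should be able to tell whether you are overloading I/O.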
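The first and third checks can be sketched roughly as follows. This assumes smartmontools (`smartctl`) and sysstat (`iostat`) are installed and that it is run as root. `check_disk` is a hypothetical wrapper, and /dev/sdc is simply the name the disk had after it was re-attached in the question.

```shell
#!/bin/sh
# check_disk DEVICE   (hypothetical wrapper around smartctl and iostat)
check_disk() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "check_disk: $dev is not a block device" >&2
        return 1
    fi
    smartctl -H "$dev"        # overall SMART health self-assessment
    smartctl -A "$dev"        # vendor attributes, e.g. reallocated or pending sectors
    smartctl -l error "$dev"  # the drive's own error log
    # Extended I/O statistics for all devices: 3 reports, 5 seconds apart.
    # Sustained high utilization and long waits would hint at overload.
    iostat -x 5 3
}
```

Usage: `check_disk /dev/sdc`. A clean SMART report does not rule out the cable or controller, which is why swapping those comes next.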

Hope that helps. It's a scary message to get.