I am building a ZFS NAS around an Adaptec ASA-71605H HBA on Ubuntu 12.04.4.
Modern Linux kernels ship an open-source version of the required pm80xx kernel module. Adaptec also provides its own driver for Ubuntu 12.04, and I have tested both.
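The symptom I see is that, every now and then after booting, only 14 of the 16 drives are available.
The full dmesg log is available here; the interesting part is: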
[ 3.591035] pm80xx 0000:01:00.0: driver version 0.1.37 / 1.0.15-1
[ 50.749419] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 50.749424] sas: ata1: end_device-1:0: dev error handler
[ 50.749430] sas: ata2: end_device-1:1: dev error handler
[ 50.749433] sas: ata3: end_device-1:2: dev error handler
[ 55.900826] ata3.00: qc timeout (cmd 0xec)
[ 55.900899] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[ 55.900900] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[ 55.900831] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[ 55.900902] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[ 55.900903] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[ 55.900906] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8cc1c0
[ 55.900907] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8cc1c0
[ 55.900911] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 66.049020] ata3.00: qc timeout (cmd 0xec)
[ 66.049087] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[ 66.049088] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[ 66.049025] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[ 66.049089] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[ 66.049091] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[ 66.049093] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8cc1c0
[ 66.049094] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8cc1c0
[ 66.049098] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 96.181921] ata3.00: qc timeout (cmd 0xec)
[ 96.182001] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[ 96.182009] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[ 96.181934] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[ 96.182014] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[ 96.182020] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[ 96.182025] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8121c0
[ 96.182029] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8121c0
[ 96.182043] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 96.337817] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
[ 96.354159] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 96.354177] sas: ata1: end_device-1:0: dev error handler
[ 96.354194] sas: ata2: end_device-1:1: dev error handler
[ 96.354204] sas: ata3: end_device-1:2: dev error handler
[ 96.354210] sas: ata4: end_device-1:3: dev error handler
[ 96.510401] ata4.00: ATA-9: ST4000VN000-1H4168, SC43, max UDMA/133
[ 96.510409] ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 96.511106] ata4.00: configured for UDMA/133
[ 96.511134] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
[ 96.526013] scsi 1:0:3:0: Direct-Access ATA ST4000VN000-1H41 SC43 PQ: 0 ANSI: 5
The first large block shows what a failed drive detection looks like (ata3), the second what a successful one looks like (ata4).
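To compare boots quickly, the two patterns in the excerpt above can be pulled out of a saved dmesg dump. This is only a rough sketch: the function name `summarize_ata_detection` is made up for illustration, and it assumes the kernel messages are worded exactly as in my log ("failed to IDENTIFY" for a failed attempt, "configured for" for a successful one).

```python
import re
from collections import Counter

# Message patterns copied from the log excerpt; other kernel versions
# may word these messages differently (assumption).
FAILED_RE = re.compile(r"(ata\d+\.\d+): failed to IDENTIFY")
CONFIGURED_RE = re.compile(r"(ata\d+\.\d+): configured for")


def summarize_ata_detection(dmesg_text):
    """Summarize ATA device detection from dmesg text.

    Returns a dict with:
      failures   - number of failed IDENTIFY attempts per device
      configured - devices that were eventually configured
      missing    - devices that failed and were never configured
    """
    # Search the whole text instead of line by line, so a log whose
    # line breaks were lost is still handled.
    failures = Counter(FAILED_RE.findall(dmesg_text))
    configured = set(CONFIGURED_RE.findall(dmesg_text))
    missing = sorted(set(failures) - configured)
    return {
        "failures": dict(failures),
        "configured": sorted(configured),
        "missing": missing,
    }
```

Feeding it the text of a dmesg dump from each boot (read from a file, for example) gives a list of devices that never came up, which makes it easier to confirm whether the same ports are affected each time.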
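All drives were tested several times without errors before the full build. It is not always the same drives that drop out; it appears to be completely random.
Another question suggests that the errors come from the shared IRQ 16, and indeed I sometimes get error logs pointing to IRQ 16. Unfortunately, I don't know whether it is possible to use another IRQ, because the BIOS does not let me change it, and moving the card to another PCIe slot is not an option because of link speed.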
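To check which handlers actually sit on the same interrupt line, the contents of /proc/interrupts can be parsed. The following is a minimal sketch, not a definitive tool: `shared_irqs` is a hypothetical helper name, and it assumes the usual layout (a header row with one column per CPU, then per-IRQ rows ending in a comma-separated list of handler names). Handler names containing spaces would be mis-parsed.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [handler names]} for IRQs with more than one handler.

    Expects the text of /proc/interrupts. Assumes a header row with one
    column per CPU and handler names separated by commas at the end of
    each numbered row.
    """
    lines = interrupts_text.strip("\n").splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        # Split off the per-CPU counters; the remainder holds the
        # interrupt chip/type columns followed by the handler names.
        fields = rest.split(None, n_cpus)
        if len(fields) <= n_cpus:
            continue  # no handler registered on this line
        parts = [p.strip() for p in fields[n_cpus].split(",")]
        first = parts[0].split()
        if not first:
            continue
        # The first handler name is the last token before the first comma.
        names = [first[-1]] + [p for p in parts[1:] if p]
        if len(names) > 1:
            shared[int(label)] = names
    return shared
```

Calling it on the text read from /proc/interrupts should show whether the HBA's driver shares IRQ 16 with another device, and with which one.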
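Any help is very welcome. I am close to ordering an LSI controller to see whether it helps, but I would have liked to make the Adaptec work. I am just worried about entrusting my data to this controller.
Update: The problem continues. Even when all drives are detected, the libsas and pm80xx kernel modules cause random kernel panics. It is not usable in production either. Considering getting an LSI 9201-16i …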