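QLE2562 HBA / qla2xxx problems on CentOS 5.3

I have several Linux servers (SunFire X4270) running CentOS 5.3 (kernel-2.6.18-128.1.16.el5) with Qlogic FC-8 QLE2562 HBAs… I am having a lot of problems with these new servers: every second they log the following messages: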

qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.0: Passthru CT failed
qla2xxx 0000:2f:00.1: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.1: Passthru CT failed

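In addition, several of these servers have ended in a kernel panic with the trace shown below. I have tried several CentOS 5.3 kernel versions, from 2.6.18-128.el5 to 2.6.18-128.1.16.el5 (the latest), and I have also tried the latest Qlogic driver with the embedded 4.06 firmware for the QLE2562, without success. The strange thing is that I have one other server with the same hardware/software configuration that runs fine (stable…). Sun support (available for these servers) has not been able to solve the problem yet… Any ideas?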

qla2xxx_eh_abort(8): aborting sp ffff81037d86ebc0 from RISC. pid=952 sp->state=7 q->q_flag=2
qla2xxx 0000:2f:00.1: Mailbox command timeout occurred. Issuing ISP abort.
NMI Watchdog detected LOCKUP on CPU 13
CPU 13
Modules linked in: autofs4 sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev qla2xxx(U) qla2xxx_conf(U) igb i2c_i801 intermodule(U) i2c_core sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2982, comm: scsi_eh_8 Tainted: G 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff8000c6f2>] [<ffffffff8000c6f2>] __delay+0x8/0x10
RSP: 0018:ffff81067dc7db88 EFLAGS: 00000097
RAX: 00000000ecd06b41 RBX: 000000000018c42b RCX: 00000000ecd05808
RDX: 0000000000000324 RSI: 0000000000000046 RDI: 0000000000003689
RBP: ffffc20000034000 R08: 0000000000000002 R09: ffff81067dc7db54
R10: 0000000000000001 R11: ffffffff80213fbd R12: ffff81037e84c4f8
R13: 0000000000000246 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff81067fc46140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb424 CR3: 000000067d035000 CR4: 00000000000006e0
Process scsi_eh_8 (pid: 2982, threadinfo ffff81067dc7c000, task ffff81010c6ec040)
Stack: ffffffff8827f743 ffff81037e84c4f8 ffff81067dc7dc90 ffff81060000dc20
ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90 0000000000000100
ffffffff88285488 ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90
Call Trace:
[<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 29 c8 48 39 f8 72 f5 c3 41 54 83 3d ad d8 3c 00 00 49 89 f4
Kernel panic - not syncing: nmi watchdog
BUG: warning at kernel/panic.c:137/panic() (Tainted: G )
Call Trace:
<NMI> [<ffffffff8008efff>] panic+0x1da/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa015>] i8042_panic_blink+0x112/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa0fe>] i8042_panic_blink+0x1fb/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa17b>] i8042_panic_blink+0x278/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11

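As for "qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server": if it only shows up on one server, the card may have a hardware problem. Have you tried putting that card in another server?
For the server that runs fine, I would run the same test by moving its card from serverA to serverB, and see whether serverB becomes stable or whether serverA stays stable.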
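To compare servers before and after swapping cards, it can help to count how often each HBA port logs the message. Below is a minimal Python sketch of that idea, not an official Qlogic tool. The function name count_passthru_failures is made up for illustration, and the sample lines are the four messages quoted in the question. It assumes the kernel log keeps the "qla2xxx <PCI address>: Passthru CT ..." layout shown there.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   qla2xxx 0000:2f:00.0: Passthru CT failed
# The PCI address format (domain:bus:device.function) is taken from the
# messages quoted in the question.
QLA_PASSTHRU = re.compile(
    r"qla2xxx (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): Passthru CT "
)


def count_passthru_failures(lines):
    """Count 'Passthru CT' messages per PCI function (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = QLA_PASSTHRU.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


# Sample input: the four messages from the question.
sample = [
    "qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server",
    "qla2xxx 0000:2f:00.0: Passthru CT failed",
    "qla2xxx 0000:2f:00.1: Passthru CT request failed to login management server",
    "qla2xxx 0000:2f:00.1: Passthru CT failed",
]

for pci, n in sorted(count_passthru_failures(sample).items()):
    print(pci, n)
```

In practice you would pass the lines of a saved kernel log (for example an open file object) instead of the sample list. If the counts follow the card when it moves to another server, that points at the card rather than the host.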

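Thanks, 方圆. "Passthru CT request failed" seems to be a hardware problem (not yet fully validated). As for the other, bigger problem, it is related to the PCIe Active Riser cards we have (Sun X4270 hardware): these cards contain a PCIe switch that conflicts with the QLE2562 (problem validated/reproduced by Sun support level 2)… If you hit this problem on Sun hardware, try putting the HBA in a non-switched PCIe slot (slots 0 and 3 on the X4270, because Riser 0 is not an active riser but sits in a 16x slot). Sun is working on a fix for their machines so that users can put the HBA in any slot.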
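If you are not sure whether a given slot sits behind a PCIe switch, one rough way to check on Linux is to resolve the HBA's sysfs entry and list the bridges above it. The Python sketch below is only a heuristic under stated assumptions: the helper names upstream_bridges and resolve_device are invented for illustration, and the example paths and bus numbers are hypothetical, not taken from an X4270. A device directly on a root port normally has a single bridge above it. Several bridges suggest a switch (or some other bridge) in the path, which you would still need to confirm against the hardware documentation.

```python
import os
import re

# A PCI address as it appears in sysfs path components, e.g. 0000:2f:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_bridges(sysfs_path):
    """Return the PCI addresses found above the device in a resolved sysfs path.

    The last PCI address in the path is the device itself; everything
    before it is a bridge (root port, switch port, ...).
    """
    parts = [p for p in sysfs_path.split("/") if PCI_ADDR.match(p)]
    return parts[:-1]


def resolve_device(pci_addr):
    """Resolve /sys/bus/pci/devices/<addr> to its real path (Linux only)."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", pci_addr))


# Hypothetical resolved paths, for illustration only.
direct = "/sys/devices/pci0000:00/0000:00:03.0/0000:2f:00.0"
switched = "/sys/devices/pci0000:00/0000:00:07.0/0000:20:00.0/0000:21:04.0/0000:2f:00.0"

print(len(upstream_bridges(direct)), "bridge(s) above the HBA")
print(len(upstream_bridges(switched)), "bridge(s) above the HBA")
```

On a real system you would call upstream_bridges(resolve_device("0000:2f:00.0")) with the address reported in your own qla2xxx messages.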

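qla2xxx_eh_abort(8): aborting sp. This problem is entirely related to the HBA card installed in the Sun server. In fact, we have run into it recently, most recently on December 16, 2012. So please replace the HBA card; that will completely solve the problem.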