CentOS 7 on a DL120 G9 with an H240: monitoring RAID problems

I have just configured a new server with a Smart HBA H240 card and installed hpssaducli, which detects the controller and lets me generate reports.

My problem is how to detect a RAID failure and send an alert.

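The report generated by hpssaducli contains a huge amount of information that is hard to sift through, and since no array has failed yet, I do not know what to look for when a drive does fail.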

Details

root@server [~]# lsmod | grep hp
hpwdt 14242 0
hpilo 17381 0
shpchp 37032 0
hpsa 94958 3
root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64
root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...
Diagnosable devices:
   Smart HBA H240 in Slot 2

Output of hpssacli

root@server [~]# hpssacli ctrl all show config detail

Smart HBA H240 in Slot 2 (RAID Mode)
   Bus Interface: PCI
   Slot: 2
   Serial Number: XXXXXXXXX
   Cache Serial Number: XXXXXXXXX
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.34
   Rebuild Priority: High
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60 min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: False
   Drive Write Cache: Disabled
   Controller Memory Size: 256 MB
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 72
   Cache Module Temperature (C): 36
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.12
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID Mode
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ250305FS
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None

   Port Name: 2I
      Port ID: 0
      Port Connection Number: 0
      SAS Address: 500143803366B9C0
      Port Location: Internal
      Managed Cable Connected: False

   Port Name: 1I
      Port ID: 1
      Port Connection Number: 1
      SAS Address: 500143803366B9C4
      Port Location: Internal
      Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
      None attached

   Internal Drive Cage at Port 2I, Box 0, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0 MB (0.0%)
      Used Space: 1.8 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable

      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: Ready for Rebuild
         Caching: Disabled
         Unique Identifier: XXXXXXXXX
         Disk Name: /dev/sda
         Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
         OS Status: LOCKED
         Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
         Mirror Group 1:
            Smart HBA H240 in Slot 2 physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
            Smart HBA H240 in Slot 2 physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
         Mirror Group 2:
            Smart HBA H240 in Slot 2 physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
            Smart HBA H240 in Slot 2 physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
         Drive Type: Data
         LD Acceleration Method: HP SSD Smart Path

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data
         Drive Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

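I don't want to close this as a duplicate, but you should install the HP Management Agents to provide server health information. This can be done via yum, or with the individual packages listed on the support site for the ProLiant DL120 Gen9 and RHEL7.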

See: HP ProLiant DL380e Gen8 server - some thoughts on SPP usage…

At the very least, you can use the hpssacli tool to get the actual RAID controller information.

But understand that once you include the other utilities, the server is also able to send email and SNMP traps and to log health events.
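If you go the hpssacli-only route in the meantime, a minimal sketch of a check might look like the following. It assumes the one-line drive summaries keep the format shown in the question, with the status as the last field inside the parentheses. find_raid_problems is a hypothetical helper name, and treating anything other than "OK" as a problem is an assumption, since the question contains no real failure output.

```shell
#!/usr/bin/env bash
# Sketch only: flag drive summary lines whose status is not "OK".
# Assumes summary lines shaped like the ones in the question, e.g.
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)

# find_raid_problems (hypothetical helper): reads hpssacli output on stdin and
# prints every physicaldrive/logicaldrive summary line that does not end in
# ", OK)". Exit status is 0 when at least one problem line was printed and
# non-zero when nothing was flagged (this is the exit status of the last grep).
find_raid_problems() {
    grep -E '^[[:space:]]*(physicaldrive|logicaldrive) .*\)[[:space:]]*$' \
        | grep -Ev ', OK\)[[:space:]]*$'
}

# Possible cron usage (not executed here; the alerting command is up to you):
#   if problems=$(hpssacli ctrl all show config | find_raid_problems); then
#       printf 'RAID problem on %s:\n%s\n' "$(hostname)" "$problems"
#   fi
```

Anything the cron job prints can then be handed to whatever mail or alerting command you already use. This only covers the drive summaries; the management agents remain the more complete option because they also report on the rest of the server's health.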