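I am trying to monitor the SMART status of my hard drives. I have tried the smartctl tool and HP's own hpacucli to generate an ADU report, but neither is useful. smartctl does not show values such as power-on hours or drive temperature, and the ADU report shows them as empty.
What is the correct way to monitor hard drives behind an HP RAID controller?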
Smartctl: smartctl -a -d cciss,0 /dev/sg0
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-20-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Logical block size: 512 bytes
Logical Unit id: 0x5000c5003f11a168
Serial number: XXXXXXX
Device type: disk
Local Time is: Sun Jul 14 22:42:08 2013 HADT
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: <not available>
Read defect list: asked for grown list but didn't get it
Error Counter logging not supported
Device does not support Self Test logging
Hpacucli: hpacucli ctrl all diag file=/usr/monitor/report.zip ris=on xml=on zip=on
Smart Array P410 in slot 1 : Internal Drive Cage at Port 1I : Box 1 : Drive Cage on Port 1I : Physical Drive (3 TB SATA) 1I:1:12 : Monitor and Performance Statistics (Since Reset)

Serial Number XXXXXXXX
Firmware Revision 0003
Product Revision ATA ST33000650NS
Reference Time 0x00001715
Sectors Read 0x00000000f868ca8b
Read Errors Hard 0x00000000
Read Errors Retry Recovered 0x00000000
Read Errors ECC Corrected 0x0000000000000000
Sectors Written 0x0000000016dd925d
Write Errors Hard 0x00000000
Write Errors Retry Recovered 0x00000000
Seek Count 0x0000000000000000
Seek Errors 0x0000000000000000
Spin Cycles 0x00000000
Spin Up Time 0x0000
Performance Test 1 0x0000
Performance Test 2 0x0000
Performance Test 3 0x0000
Performance Test 4 0x0000
Reallocation Sectors 0xffffffff
Reallocated Sectors 0x00000000
DRQ Time Outs 0x0000
Other Time Outs 0x0000
Drive Rebuild Count 0 (0x0000)
Spin Retries 65535 (0xffff)
Recovers Failed Read 0x0000
Recovers Failed Write 0x0000
Format Errors 0x0000
Self Test Failures 0x0000
Not Ready Failures 0x00000000
Remap Abort Failures 0x00000000
IRQ Deglitch Count 0 (0x00000000)
Bus Faults 0x00000003
Hot Plug Count 0 (0x00000000)
Track Rewrite Errors 0xffff
Write Errors After Remap 0x0000
Background Firmware Revision 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Media Failures 0x0000
Hardware Errors 0x0000
Aborted Command Failures 0x0000
Spin Up Failures 0x0000
Bad Target Count 0 (0x0000)
Predictive Failure Errors 0x00000000
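You shouldn't use tools like smartctl directly against these controllers. HP Smart Array controllers use a variety of techniques to determine drive and system health. SMART is one of them, but it is not the final word. Use the purpose-built tools that are available to get that picture.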
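So in your case, configure your hp-snmp-agents to send mail when a problem occurs. On Linux, the email goes to the root user by default and a message appears in the system log, but the alert destination can be configured in /opt/hp/hp-snmp-agents/cma.conf.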
As for the hpacucli utility, running hpacucli ctrl all show config detail gives most of the relevant array health information.
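If you want to check that report from a script, something like the following might be a starting point. It is only a sketch: it assumes the detail report contains lines of the form "<something> Status: <value>" and treats anything other than OK as a problem. The function names and the sample report text are made up for illustration and are not taken from a real controller, so compare against your own output before relying on it.

```python
import subprocess


def find_non_ok_status(report):
    """Return (line_number, line) pairs for status lines whose value is not OK.

    Assumption: status lines look like "Status: OK" or
    "Controller Status: OK" in the hpacucli detail report.
    """
    problems = []
    for number, line in enumerate(report.splitlines(), start=1):
        label, sep, value = line.strip().partition(":")
        if sep and label.endswith("Status") and value.strip() != "OK":
            problems.append((number, line.strip()))
    return problems


def run_hpacucli():
    """Run the command from the answer and return its text output."""
    completed = subprocess.run(
        ["hpacucli", "ctrl", "all", "show", "config", "detail"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Hypothetical report excerpt, for illustration only.
HYPOTHETICAL_REPORT = """\
Smart Array P410 in Slot 1
   Controller Status: OK
   Array: A
      Status: OK
      physicaldrive 1I:1:12
         Status: Predictive Failure
"""

if __name__ == "__main__":
    for number, line in find_non_ok_status(HYPOTHETICAL_REPORT):
        print(f"line {number}: {line}")
```

On a real system you would pass run_hpacucli() to find_non_ok_status() instead of the sample text, for example from a cron job that mails any non-empty result.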
This is what works for me:
smartctl -d cciss,0 -a /dev/cciss/c0d0

Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 31 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 3203138637
  Blocks received from initiator = 3715997197
  Blocks read from cache and sent to initiator = 484569203
  Number of read and write commands whose size <= segment size = 1111593814
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 14706.28
  number of minutes until next internal SMART test = 33
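If the drive does report these values, pulling them out for monitoring is straightforward. Here is a rough Python sketch that extracts the health status, temperature and powered-up hours from smartctl output formatted like the above. The function name and dictionary keys are my own choices, and the patterns only match the field labels shown in this output. Fields that are missing or reported as "<not available>", as in the question, come back as None.

```python
import re

# Field labels are copied from the smartctl output above; the dictionary
# keys are arbitrary names chosen for this sketch.
PATTERNS = {
    "health": r"SMART Health Status:\s*(\S+)",
    "temperature_c": r"Current Drive Temperature:\s*(\d+)\s*C",
    "trip_temperature_c": r"Drive Trip Temperature:\s*(\d+)\s*C",
    "grown_defects": r"Elements in grown defect list:\s*(\d+)",
    "hours_powered_up": r"number of hours powered up\s*=\s*([\d.]+)",
}


def parse_smartctl(text):
    """Return a dict of selected fields; None where a field is not found."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            result[key] = None
        elif key == "health":
            result[key] = match.group(1)
        else:
            result[key] = float(match.group(1))
    return result


# Excerpt of the output shown above.
SAMPLE = """\
SMART Health Status: OK
Current Drive Temperature: 31 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
number of hours powered up = 14706.28
"""

if __name__ == "__main__":
    print(parse_smartctl(SAMPLE))
```

Feeding it the real output of smartctl -d cciss,N -a for each drive index N gives one dictionary per drive, which can then be compared against whatever thresholds you care about.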