我们最近注意到我们的数据库查询花费了比平时更长的时间来运行。 经过一番调查,看起来我们正在读取非常慢的磁盘。
过去由于RAID控制器在BBU上启动一个重新学习周期并切换到直写,我们遇到了类似的问题。 这次看起来并不是这样。
我在几天的时间里跑了几次bonnie++ 。 结果如下:
22-82 M / s的读数似乎相当糟糕。 对原始设备运行dd几分钟,显示从15.8 MB / s到225 MB / s的读取(见下面更新)。 iotop没有指出任何其他进程竞争IO,所以我不知道为什么读取速度是如此可变。
raid卡是MegaRAID SAS 9280,带有RAID 10中的12个SAS驱动器(15k,300GB)和一个XFS文件系统(在两个configuration为RAID1的SSD上的操作系统)。 我没有看到任何SMART警报,并且arrays看起来没有降级。
我也运行了xfs_check ,看起来没有任何XFS一致性问题。
接下来的调查步骤应该是什么?
Ubuntu 12.04.5 LTS 128GB RAM Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
xfs_repair -n输出xfs_repair -n :
Phase 1 - find and verify superblock... Phase 2 - using internal log - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 1 - agno = 3 - agno = 2 - agno = 0 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting.
megacli -AdpAllInfo -aAll输出megacli -AdpAllInfo -aAll :
Versions ================ Product Name : LSI MegaRAID SAS 9280-4i4e Serial No : SV24919344 FW Package Build: 12.12.0-0124 Mfg. Data ================ Mfg. Date : 12/06/12 Rework Date : 00/00/00 Revision No : 04B Battery FRU : N/A Image Versions in Flash: ================ FW Version : 2.130.363-1846 BIOS Version : 3.25.00_4.12.05.00_0x05180000 Preboot CLI Version: 04.04-020:#%00009 WebBIOS Version : 6.0-51-e_47-Rel NVDATA Version : 2.09.03-0039 Boot Block Version : 2.02.00.00-0000 BOOT Version : 09.250.01.219 Pending Images in Flash ================ None PCI Info ================ Controller Id : 0000 Vendor Id : 1000 Device Id : 0079 SubVendorId : 1000 SubDeviceId : 9282 Host Interface : PCIE ChipRevision : B4 Link Speed : 0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048001c1e47f 1 0000000000000000 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b005a6cbc0 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 512MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Absent Temperature sensor for controller : Absent Settings ================ Current Time : 14:58:51 7/11, 2016 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 4 Delay Among Spinup Groups : 2s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 120 Auto Enhanced Import : No Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No POST delay : 90 seconds BIOS Error Handling : Stop On Errors Current Boot Mode :Normal Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 80 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 350 MB Device Present ================ Virtual Drives : 2 Degraded : 0 Offline : 0 Physical Devices : 16 Disks : 14 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : Yes Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : No Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : No Block SSD Write Disk Cache Change: No Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : No Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes NCQ : No Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : No Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 256kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : 0 Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable : SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : No Time taken to detect CME : 60s
megacli -AdpBbuCmd -GetBbuSTatus -aAll输出megacli -AdpBbuCmd -GetBbuSTatus -aAll :
BBU status for Adapter: 0 BatteryType: iBBU Voltage: 4068 mV Current: 0 mA Temperature: 30 C Battery State: Optimal BBU Firmware Status: Charging Status : Charging Voltage : OK Temperature : OK Learn Cycle Requested : No Learn Cycle Active : No Learn Cycle Status : OK Learn Cycle Timeout : No I2c Errors Detected : No Battery Pack Missing : No Battery Replacement required : No Remaining Capacity Low : No Periodic Learn Required : No Transparent Learn : No No space to cache offload : No Pack is about to fail & should be replaced : No Cache Offload premium feature required : No Module microcode update required : No GasGuageStatus: Fully Discharged : No Fully Charged : No Discharging : Yes Initialized : Yes Remaining Time Alarm : No Discharge Terminated : No Over Temperature : No Charging Terminated : No Over Charged : No Relative State of Charge: 88 % Charger System State: 49169 Charger System Ctrl: 0 Charging current: 512 mA Absolute state of charge: 87 % Max Error: 4 % Exit Code: 0x00
megacli -LDInfo -Lall -aAll输出megacli -LDInfo -Lall -aAll :
Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name : RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0 Size : 111.281 GB Sector Size : 512 Mirror Data : 111.281 GB State : Optimal Strip Size : 256 KB Number Of Drives : 2 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disk's Default Encryption Type : None Is VD Cached: No Virtual Drive: 1 (Target Id: 1) Name : RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0 Size : 1.633 TB Sector Size : 512 Mirror Data : 1.633 TB State : Optimal Strip Size : 256 KB Number Of Drives per span:2 Span Depth : 6 Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disk's Default Encryption Type : None Is VD Cached: No
更新:根据安德鲁的build议,我跑了几分钟,看看什么样的吞吐量,我会得到原始磁盘读取:
dd if=/dev/sdb of=/dev/null bs=256k 19701+0 records in 19700+0 records out 5164236800 bytes (5.2 GB) copied, 202.553 s, 25.5 MB/s
其他运行的结果,吞吐量变化很大:
18706857984 bytes (19 GB) copied, 1181.51 s, 15.8 MB/s 20923023360 bytes (21 GB) copied, 388.137 s, 53.9 MB/s 21205876736 bytes (21 GB) copied, 55.5997 s, 381 MB/s 25391005696 bytes (25 GB) copied, 153.903 s, 165 MB/s
更新2:输出megacli -PDlist -aall : https : megacli -PDlist -aall
正如米哈尔在他的评论中指出的那样,这个问题是一个前置盘。 megaraid控制器的诊断程序中没有红旗,但是在每个磁盘上运行smartctl显示一个巨大的“非中等错误”计数(我写了一个快速bash脚本来遍历每个磁盘ID)。 以下是全部输出的相关位:
# Ran this for each individual disk on the /dev/sdb array: smartctl -a -d megaraid,18 /dev/sdb Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 7950078 0 0 7950078 7950078 660.801 0 write: 0 0 0 0 0 363.247 0 verify: 12 0 0 12 12 0.002 0 Non-medium error count: 3253718
除了这个(磁盘ID 18)之外,其他驱动器显示非介质错误计数为0。 我find了磁盘,换了一个新的,回到3GB的读取。
您需要validation驱动器的碎片:
xfs_db -r /dev/sdbx frag
你会有这样的答案:
actual 347954, ideal 15723, fragmentation factor 95.48%
如果碎片因素很高,则需要整理磁盘碎片。 (是的,我知道,就像在Windows上一样):/
整理磁盘: xfs_fsr -v /dev/sdbx
有了LSI,有一些非常重要的东西。
1)刷新RAID固件。 你现在有几分钟的时间了。
2)闪存驱动器上的固件,并确保它的更新。
3)更新你的驱动程序。 根据LSI网站上的发行说明,他们在1月底刚刚发布了一个新的驱动程序。
然后你可以重新运行你的testing,看看是否有任何改变。