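We have a Hadoop cluster in which random data nodes lock up. This is usually preceded by an ever-increasing load average, while CPU usage and iowait are practically nonexistent. The affected machines are high-IO Hadoop data nodes that untar a large number of large archives and write many small and large files. The underlying disks are XFS, running on kernel 2.6.32-358.18.1.el6.x86_64. All machines have 32 GB or more of RAM.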
The hardware model is Dell R720xd.
The RAID configuration is:
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PdList -aAll

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5008e1f239d
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600957SS ESF76SLAH2NQ
FDE Capable: Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6bd1
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0NV
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e783fa9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5FE47
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ea9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0W4
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e8cd
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HPC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 5
Device Id: 5
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6e51
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5GFW2
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 6
Device Id: 6
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ef5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0GC
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 7
Device Id: 7
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e991
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5GG86
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 8
Device Id: 8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50095a39799
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SLAQM3Y
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 9
Device Id: 9
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e7b1
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HP5A
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 10
Device Id: 10
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ce5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0MW
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 11
Device Id: 11
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e269
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HP7Y
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device

Exit Code: 0x00
The RAID virtual drive configuration is:
sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:OS
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:558.375 GB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disabled
Encryption Type: None

Virtual Disk: 1 (Target Id: 1)
Name:
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:4.362 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:10
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None

Exit Code: 0x00
Output of iostat -x:
[[email protected] ~]$ iostat -x
Linux 2.6.32-358.18.1.el6.x86_64 (data1234.svx.foo.bar)   02/17/2016   _x86_64_   (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.72    0.00    3.54    0.10    0.00   78.65

Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.31   27.97   0.49   3.35    18.59    250.38    69.96     0.01   2.26   0.31   0.12
sdb        0.00    1.51  26.10  47.14  4989.96  15418.12   278.65     2.58  35.25   0.50   3.64
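To put the sdb row in more familiar units, the sector rates can be converted to throughput. The sketch below assumes the usual sysstat convention that rsec/s and wsec/s count 512-byte sectors; the two input values are copied from the output above. Keep in mind that iostat run without an interval reports averages since boot, so these figures describe the long-term load rather than the moment of a lockup.

```python
# Convert iostat sector rates into MiB/s.
# Assumption: iostat's rsec/s and wsec/s are in 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_mib_per_s(sectors_per_s: float) -> float:
    """Convert a rate in sectors per second to MiB per second."""
    return sectors_per_s * SECTOR_BYTES / (1024 * 1024)


# Values copied from the sdb row of the iostat output above.
sdb_read = sectors_to_mib_per_s(4989.96)
sdb_write = sectors_to_mib_per_s(15418.12)

print(f"sdb read:  {sdb_read:.2f} MiB/s")
print(f"sdb write: {sdb_write:.2f} MiB/s")
```

Both rates come out in the single-digit MiB/s range, which is modest for a ten-disk array.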
Contents of /etc/fstab:
UUID=4fe41c9b-f3f1-4c36-99a2-30e2af5c75e1 /         ext3    defaults                                         1 1
tmpfs                                     /dev/shm  tmpfs   defaults                                         0 0
devpts                                    /dev/pts  devpts  gid=5,mode=620                                   0 0
sysfs                                     /sys      sysfs   defaults                                         0 0
proc                                      /proc     proc    defaults                                         0 0
/dev/sdb                                  /data     xfs     defaults,noatime,nodiratime,logbufs=8,nobarrier  1 2
/data/home                                /home     none    bind                                             0 0
Output of xfs_info:
xfs_info /dev/sdb
meta-data=/dev/sdb               isize=256    agcount=32, agsize=36593648 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=1170996736, imaxpct=5
         =                       sunit=16     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
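As a sanity check, the XFS stripe geometry can be compared with the RAID 6 virtual drive reported by MegaCli. The sketch below uses only numbers from the two outputs above; the one assumption is the standard RAID 6 layout with two drives' worth of parity per stripe, so a 10-drive array has 8 data drives.

```python
# Compare XFS stripe geometry (xfs_info) with the RAID layout (MegaCli -LDInfo).
BLOCK_SIZE = 4096          # bsize from xfs_info, in bytes
DATA_BLOCKS = 1170996736   # blocks from xfs_info
SUNIT_BLOCKS = 16          # sunit from xfs_info, in filesystem blocks
SWIDTH_BLOCKS = 128        # swidth from xfs_info, in filesystem blocks

RAID_STRIPE_KB = 64        # "Stripe Size: 64 KB" on Virtual Disk 1
RAID_DRIVES = 10           # "Number Of Drives:10" on Virtual Disk 1
RAID6_PARITY_DRIVES = 2    # assumption: standard RAID 6 parity overhead

sunit_kb = SUNIT_BLOCKS * BLOCK_SIZE // 1024
data_drives = RAID_DRIVES - RAID6_PARITY_DRIVES
expected_swidth_blocks = SUNIT_BLOCKS * data_drives
fs_size_tib = DATA_BLOCKS * BLOCK_SIZE / 2**40

print(f"sunit: {sunit_kb} KiB, controller stripe: {RAID_STRIPE_KB} KB")
print(f"swidth: {SWIDTH_BLOCKS} blocks, expected: {expected_swidth_blocks}")
print(f"filesystem size: {fs_size_tib:.3f} TiB")
```

With these inputs the stripe unit matches the controller's 64 KB stripe, the stripe width matches 8 data drives, and the filesystem size matches the 4.362 TB virtual drive, so misalignment does not look like the cause.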
Output of dmesg:
INFO: task swh-logfiles_pr:22324 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000000     0 22324  22300 0x00000000
 ffff881fe29cdd38 0000000000000086 ffff881fe29cdc98 ffffffff8109f641
 ffff881fe29cdcc8 ffffffff8118e05d ffff881fe29cdcc8 ffff881c2e78300a
 ffff881ded459ab8 ffff881fe29cdfd8 000000000000fb88 ffff881ded459ab8
Call Trace:
 [<ffffffff8109f641>] ? in_group_p+0x31/0x40
 [<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
 [<ffffffff8118f541>] ? path_put+0x31/0x40
 [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
 [<ffffffff8117e249>] do_sys_open+0x69/0x140
 [<ffffffff8117e360>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22345 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001     0 22345  22323 0x00000000
 ffff88201044fd38 0000000000000086 0000000000000000 ffffffff8109f641
 ffff88201044fcc8 ffffffff8118e05d ffff88201044fcc8 ffff881fc7a1500a
 ffff8819d03fe638 ffff88201044ffd8 000000000000fb88 ffff8819d03fe638
Call Trace:
 [<ffffffff8109f641>] ? in_group_p+0x31/0x40
 [<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
 [<ffffffff811b3ffb>] ? vfs_statfs+0x1b/0xb0
 [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
 [<ffffffff8117e249>] do_sys_open+0x69/0x140
 [<ffffffff8117e360>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22356 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001     0 22356  22334 0x00000000
 ffff881cc4f8f698 0000000000000086 ffff881cc4f8f85c ffff880e59395038
 ffff881cc4f8f6a8 ffffffffa01a670d ffff881cc4f8f908 0000000000000000
 ffff881fdf067ab8 ffff881cc4f8ffd8 000000000000fb88 ffff881fdf067ab8
Call Trace:
 [<ffffffffa01a670d>] ? xfs_bmap_add_extent+0xad/0x3c0 [xfs]
 [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
 [<ffffffffa01a7562>] ? xfs_bmapi+0xb42/0x1120 [xfs]
 [<ffffffff8150fec2>] __down+0x72/0xb0
 [<ffffffffa01e78e5>] ? _xfs_buf_find+0xe5/0x230 [xfs]
 [<ffffffff8109cb61>] down+0x41/0x50
 [<ffffffffa01e7751>] xfs_buf_lock+0x51/0x100 [xfs]
 [<ffffffffa01e78e5>] _xfs_buf_find+0xe5/0x230 [xfs]
 [<ffffffffa01e7a64>] xfs_buf_get+0x34/0x1b0 [xfs]
 [<ffffffffa01e80ec>] xfs_buf_read+0x2c/0x100 [xfs]
 [<ffffffffa01dd9a7>] xfs_trans_read_buf+0x1f7/0x410 [xfs]
 [<ffffffffa01c0404>] xfs_read_agi+0x74/0x100 [xfs]
 [<ffffffffa01c04be>] xfs_ialloc_read_agi+0x2e/0x90 [xfs]
 [<ffffffffa01c07a3>] xfs_ialloc_ag_select+0x133/0x270 [xfs]
 [<ffffffffa01c1e67>] xfs_dialloc+0x3d7/0x850 [xfs]
 [<ffffffffa01e6e25>] ? xfs_buf_rele+0x55/0x100 [xfs]
 [<ffffffffa01ddf98>] ? xfs_trans_brelse+0xe8/0x130 [xfs]
 [<ffffffffa01b029b>] ? xfs_da_brelse+0x7b/0xc0 [xfs]
 [<ffffffffa01c5ba0>] xfs_ialloc+0x60/0x6e0 [xfs]
 [<ffffffffa01e2eaa>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
 [<ffffffffa01de534>] xfs_dir_ialloc+0x74/0x2b0 [xfs]
 [<ffffffffa01e0610>] xfs_create+0x440/0x640 [xfs]
 [<ffffffffa01ed7bd>] xfs_vn_mknod+0xad/0x1c0 [xfs]
 [<ffffffffa01ed900>] xfs_vn_create+0x10/0x20 [xfs]
 [<ffffffff8118fbd4>] vfs_create+0xb4/0xe0
 [<ffffffff811936a0>] do_filp_open+0xb10/0xdc0
 [<ffffffff8118f541>] ? path_put+0x31/0x40
 [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
 [<ffffffff8117e249>] do_sys_open+0x69/0x140
 [<ffffffff8117e360>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22386 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001     0 22386  22362 0x00000000
 ffff88200be6dd38 0000000000000082 ffff88200be6dc98 ffffffff8109f641
 ffff88200be6dcc8 ffffffff8118e05d ffff88200be6dcc8 ffff881fd395800a
 ffff881fce825af8 ffff88200be6dfd8 000000000000fb88 ffff881fce825af8
Call Trace:
 [<ffffffff8109f641>] ? in_group_p+0x31/0x40
 [<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
 [<ffffffff8118f541>] ? path_put+0x31/0x40
 [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
 [<ffffffff8117e249>] do_sys_open+0x69/0x140
 [<ffffffff8117e360>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22415 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000000     0 22415  22402 0x00000000
 ffff881cd8f6dd38 0000000000000086 0000000000000000 ffffffff8109f641
 ffff881cd8f6dcc8 ffffffff8118e05d ffff881cd8f6dcc8 ffff881f2073500a
 ffff881fd367c5f8 ffff881cd8f6dfd8 000000000000fb88 ffff881fd367c5f8
Call Trace:
 [<ffffffff8109f641>] ? in_group_p+0x31/0x40
 [<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
 [<ffffffff811b3ffb>] ? vfs_statfs+0x1b/0xb0
 [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8119f922>] ? alloc_fd+0x92/0x160
 [<ffffffff8117e249>] do_sys_open+0x69/0x140
 [<ffffffff8117e360>] sys_open+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task flush-8:16:5856 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:16    D 000000000000000b     0  5856      2 0x00000000
 ffff881fd151b798 0000000000000046 0000000000000000 ffff8820129af380
 0000000000000086 ffff881fd151b720 ffff88200b648ea8 0000000000000001
 ffff881fda34f058 ffff881fd151bfd8 000000000000fb88 ffff881fda34f058
Call Trace:
 [<ffffffff8125ea61>] ? blk_queue_bio+0x121/0x5d0
 [<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff81510826>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff81283844>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff8150fd24>] ? down_read+0x24/0x30
 [<ffffffffa01c29cd>] xfs_ilock+0x9d/0xd0 [xfs]
 [<ffffffffa01e491b>] xfs_map_blocks+0x1fb/0x250 [xfs]
 [<ffffffffa01e4a83>] ? xfs_submit_ioend_bio+0x33/0x40 [xfs]
 [<ffffffffa01e5401>] xfs_vm_writepage+0x261/0x5a0 [xfs]
 [<ffffffff811198c0>] ? find_get_pages_tag+0x40/0x130
 [<ffffffff8112cbb7>] __writepage+0x17/0x40
 [<ffffffff8112de6d>] write_cache_pages+0x1fd/0x4c0
 [<ffffffff8112cba0>] ? __writepage+0x0/0x40
 [<ffffffff8112e154>] generic_writepages+0x24/0x30
 [<ffffffffa01e46dd>] xfs_vm_writepages+0x5d/0x80 [xfs]
 [<ffffffff8112e181>] do_writepages+0x21/0x40
 [<ffffffff811aca0d>] writeback_single_inode+0xdd/0x290
 [<ffffffff811ace1e>] writeback_sb_inodes+0xce/0x180
 [<ffffffff811acf7b>] writeback_inodes_wb+0xab/0x1b0
 [<ffffffff811ad31b>] wb_writeback+0x29b/0x3f0
 [<ffffffff8150e130>] ? thread_return+0x4e/0x76e
 [<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
 [<ffffffff811ad615>] wb_do_writeback+0x1a5/0x240
 [<ffffffff811ad713>] bdi_writeback_task+0x63/0x1b0
 [<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
 [<ffffffff8113cca6>] bdi_start_fn+0x86/0x100
 [<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task java:1114 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000006     0  1114  31588 0x00000000
 ffff881c5bba7dd8 0000000000000086 0000000000000000 0000000000000001
 ffff881c5bba7d58 ffff881d6ccc9500 ffff881d6ccc9500 ffff881d6ccc9500
 ffff881d6ccc9ab8 ffff881c5bba7fd8 000000000000fb88 ffff881d6ccc9ab8
Call Trace:
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff8118ebb0>] lookup_create+0x30/0xd0
 [<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
 [<ffffffff81186f36>] ? sys_newstat+0x36/0x50
 [<ffffffff81192578>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:803 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000004     0   803  31612 0x00000000
 ffff881c2e7a1dd8 0000000000000082 0000000000000000 0000000000000001
 ffff881c2e7a1d58 ffff881fe5494ae0 ffff881fe5494ae0 ffff881fe5494ae0
 ffff881fe5495098 ffff881c2e7a1fd8 000000000000fb88 ffff881fe5495098
Call Trace:
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff8118ebb0>] lookup_create+0x30/0xd0
 [<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
 [<ffffffff81186f36>] ? sys_newstat+0x36/0x50
 [<ffffffff81192578>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:1171 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000000     0  1171  31636 0x00000000
 ffff881961ce9dd8 0000000000000086 0000000000000000 0000000000000001
 ffff881961ce9d58 ffff881cc26f3540 ffff881cc26f3540 ffff881cc26f3540
 ffff881cc26f3af8 ffff881961ce9fd8 000000000000fb88 ffff881cc26f3af8
Call Trace:
 [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff8118ebb0>] lookup_create+0x30/0xd0
 [<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
 [<ffffffff81186f36>] ? sys_newstat+0x36/0x50
 [<ffffffff81192578>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:950 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000002     0   950  31666 0x00000000
 ffff88200d42bdd8 0000000000000082 0000000000000000 0000000000000001
 ffff88200d42bd58 ffff881cccccc040 ffff881cccccc040 ffff881cccccc040
 ffff881cccccc5f8 ffff88200d42bfd8 000000000000fb88 ffff881cccccc5f8
Call Trace:
 [<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
 [<ffffffff8118ebb0>] lookup_create+0x30/0xd0
 [<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
 [<ffffffff81186f36>] ? sys_newstat+0x36/0x50
 [<ffffffff81192578>] sys_mkdir+0x18/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
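With this many hung-task reports it helps to group them by task name and by the function each task is actually waiting in. The following is a minimal sketch, not an established tool: it takes the first stack frame not marked with `?` (the kernel's marker for unreliable frames) as the wait point. The function name `summarize_hung_tasks` and the shortened `SAMPLE` text are illustrative only.

```python
import re
from collections import Counter

# Header line of a hung-task report: "INFO: task <name>:<pid> blocked ..."
HEADER = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# One call-trace frame; group 1 is set when the frame is marked "?" (unreliable).
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x")


def summarize_hung_tasks(log_text):
    """Count reports per (task name, first reliable stack frame)."""
    summary = Counter()
    headers = list(HEADER.finditer(log_text))
    for i, header in enumerate(headers):
        # A report runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        report = log_text[header.end():end]
        wait_point = "unknown"
        for frame in FRAME.finditer(report):
            if frame.group(1) is None:
                wait_point = frame.group(2)
                break
        summary[(header.group(1), wait_point)] += 1
    return summary


# Shortened excerpt of the dmesg output above, for illustration.
SAMPLE = """\
INFO: task java:1114 blocked for more than 180 seconds.
Call Trace:
 [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
INFO: task flush-8:16:5856 blocked for more than 180 seconds.
Call Trace:
 [<ffffffff8125ea61>] ? blk_queue_bio+0x121/0x5d0
 [<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
"""

if __name__ == "__main__":
    for (task, wait_point), count in summarize_hung_tasks(SAMPLE).items():
        print(f"{count} x {task} waiting in {wait_point}")
```

To run it on a real log, pass the text of `dmesg` to the function instead of `SAMPLE`.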
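As your kernel log says, you are running into a problem at the filesystem level or below. The bad news is that the hardware looks fine, and it seems adequate for the current load.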
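In my experience, although XFS is recommended as a scalable filesystem, using it brings you more trouble than performance. However, if migrating to EXT4 is not an option for you, you can try the following tuning at your own risk: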
# Increase the number of requests:
echo 4096 > /sys/block/sdb/queue/nr_requests
# Use aggressive mount options:
mount -o remount,noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier /dev/sdb /data
In addition, you can try remounting the /data directory with default options to see whether the problem persists.
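To judge whether a change helps, it is useful to watch for tasks stuck in uninterruptible sleep (state D). On Linux these count toward the load average, which fits the rising load with idle CPU described in the question. The sketch below is only one possible way to list them: it reads `/proc/<pid>/stat`, and the helper names `parse_stat_line` and `uninterruptible_tasks` are made up for this example.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for processes whose main thread is in state D.

    Only the main thread of each process is checked; per-thread state
    lives under /proc/<pid>/task/ and is not inspected here.
    """
    tasks = []
    if not os.path.isdir(proc_root):
        return tasks
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat_line(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Running it periodically before and after a remount gives a rough picture of whether tasks still pile up in state D.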