DRBD keeps crashing when the link is saturated

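Under high I/O, DRBD causes the server to crash. Is there any way to tune DRBD so that this does not happen again? Listed below are my current configuration, the errors, and the specs. If you need more information, please let me know. Thanks in advance.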

Latest DRBD configuration (the secondary is set up identically):

    [root@23 ~]# cat /etc/drbd.d/drbd0.res
    resource drbd0 {
        startup {
            degr-wfc-timeout 30;    # default is 2 minutes.
        }
        disk {
            on-io-error detach;
            fencing dont-care;
            disk-barrier no;
            disk-flushes no;
            al-extents 3389;
        }
        net {
            max-buffers 8000;
            max-epoch-size 8000;
            sndbuf-size 512k;
            unplug-watermark 16;
            after-sb-1pri discard-secondary;
        }
        on 23 {
            device /dev/drbd0;
            disk /dev/sdb1;
            address 10.251.30.148:7789;
            flexible-meta-disk internal;
        }
        on 23-t2 {
            device /dev/drbd0;
            disk /dev/sdb1;
            address 10.48.25.66:7789;
            flexible-meta-disk internal;
        }
    }

Errors after the crash:

    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task drbd_w_drbd1:2412 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task master:2506 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task java:2653 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task jbd2/drbd1-8:2234 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task cdpserver:2380 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task cdpserver:2396 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task cdpserver:2409 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    INFO: task cdpserver:2416 blocked for more that 120 seconds
    "echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
    BUG: soft lockup - CPU#10 stuck for 67s! [scsi_eh_6:616]
    BUG: soft lockup - CPU#10 stuck for 67s! [scsi_eh_6:616]
    aacraid: acc_fib_send: first asynshronous command timed out
    Usually a result of a PCI interrup routing problem"
    update mother board BIOS or consider utilizing one of
    the SAFE mode kernel option (acpi, apic etc)

Current setup:

    CentOS release 6.3
    2.6.32-279.5.2.el6.x86_64
    drbd-8.4.1-1.el6.x86_64
    2XE5620
    12GM of mem
    Adaptec 5805
    /dev/drbd0 15T
    /dev/drbd1 15T

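You haven't explained what you mean by a crash in this case. Judging by your "after the crash" messages, DRBD appears to be still running. What does cat /proc/drbd say after the event? And what does ps -ef|grep -i [d]rbd show?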
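
To make those two checks easy to capture right after a hang, they can be wrapped in a small helper. This is only a sketch: the function name collect_drbd_state is made up for illustration, and it assumes a POSIX shell with ps and grep available.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): snapshot DRBD state after an incident.
collect_drbd_state() {
    echo "== /proc/drbd =="
    if [ -r /proc/drbd ]; then
        cat /proc/drbd
    else
        echo "/proc/drbd not readable (is the drbd module loaded?)"
    fi

    echo "== DRBD processes =="
    # The [d] bracket keeps grep from matching its own command line.
    ps -ef | grep -i '[d]rbd' || echo "no DRBD processes found"
}

collect_drbd_state
```

Redirecting the output to a file (for example collect_drbd_state > drbd-state.txt) gives something that can be attached to a follow-up post.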

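In any case, it looks to me as though your disks and/or storage controller are not fast enough to sustain a high I/O load, which leaves the system, and DRBD in particular, waiting too long while writes are flushed to disk. If that is the case, the problem lies in your hardware setup, not in DRBD. To be sure, though, you may want to submit this to the DRBD mailing list.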
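
One rough way to test the slow-storage theory is to look at the average write latency the kernel has recorded for the backing disk. The sketch below reads /proc/diskstats, where field 8 is writes completed and field 11 is milliseconds spent writing. The function name avg_write_ms is hypothetical, and sdb is assumed to be the backing disk only because the configuration uses /dev/sdb1. The figure is an average since boot, so it understates latency during a load spike; sampling twice and comparing the counters would be more telling.

```shell
#!/bin/sh
# Hypothetical helper: average milliseconds per completed write for one device.
# Usage: avg_write_ms DEVICE [STATS_FILE]   (STATS_FILE defaults to /proc/diskstats)
avg_write_ms() {
    dev=$1
    stats=${2:-/proc/diskstats}
    awk -v dev="$dev" '
        $3 == dev {
            found = 1
            # $8 = writes completed, $11 = total ms spent writing
            if ($8 > 0) printf "%.1f\n", $11 / $8
            else print "0.0"
        }
        END { if (!found) exit 1 }
    ' "$stats"
}

# sdb is an assumption taken from the "disk /dev/sdb1" lines in the config.
avg_write_ms sdb || echo "device sdb not found in /proc/diskstats"
```

If the value is consistently high under load, that points at the disks or the controller rather than at DRBD.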