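Server suddenly stops responding, then recovers about an hour later

My FreeBSD server has been running for 2 years without any major changes to the system. I recently installed an SSL certificate using Apache's mod_ssl, and after 10 days of operation the server suddenly started crashing.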

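When the server crashes:

  • HTTPS and SSH immediately become unresponsive
  • Before it stops responding, ping times slow to several thousand milliseconds

After 15-60 minutes of being unreachable:

  • The server suddenly recovers and runs at full speed, as if nothing had happened
  • It then crashes again within 15-60 minutes, and the cycle repeats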

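What I have checked:

  • Rebooting the server changes nothing; it remains unreachable
  • CPU / RAM / disk usage: OK (<50%, including peak hours)
  • Traffic makes no difference: it happens at any time of day, including 4 AM
  • Disabling the firewall did not help

In httpd-error.log I found: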

    [notice] Digest: generating secret for digest authentication ...
    [notice] Digest: done
    [notice] Apache/2.2.23 (FreeBSD) mod_ssl/2.2.23 OpenSSL/0.9.8q DAV/2 configured -- resuming normal operations
    [error] server reached MaxClients setting, consider raising the MaxClients setting

I tried enabling KeepAlive and raising MaxClients substantially (4x), but that did not solve the problem:

    Timeout 120
    KeepAlive On
    KeepAliveTimeout 5
    MaxKeepAliveRequests 1000
    <IfModule mpm_prefork_module>
        StartServers 50
        MinSpareServers 128
        MaxSpareServers 1024
        ServerLimit 1024
        MaxClients 1024
        MaxRequestsPerChild 1000
    </IfModule>

In /var/log/messages, from before the first crash, I found:

    kernel: mfi0: 228755 (454057919s/0x0008/FATAL) - Battery needs replacement - SOH Bad
    kernel: mfi0: 228756 (454057984s/0x0008/FATAL) - Battery needs replacement - SOH Bad
    kernel: mfi0: 228757 (454058049s/0x0008/FATAL) - Battery needs replacement - SOH Bad
    kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    kernel: mfi0: 228758 (454058114s/0x0008/FATAL) - Battery needs replacement - SOH Bad
    kernel: mfi0: 228759 (454058179s/0x0008/FATAL) - Battery needs replacement - SOH Bad

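The "Battery needs replacement" warnings disappeared after the first reboot, but the arp messages keep appearing in the log at roughly the same intervals as the server crashes: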

    May 23 05:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
    May 23 05:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
    May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
    May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
    May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 05:52:40 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
    May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 06:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
    May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:03 on ix0
    May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
    May 23 06:30:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
    May 23 06:32:36 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
    May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 07:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
    May 23 07:12:28 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
    May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
    May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
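To compare the arp messages against the outage windows, it may help to tally them per IP address rather than read them by eye. The following is only a rough sketch: the function name `summarize_arp_moves` is made up for illustration, and the pattern assumes the timestamped syslog format shown in the excerpt above (lines without a timestamp are skipped).

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above:
#   May 23 05:00:00 ns228407 kernel: arp: <ip> moved from <mac> to <mac> on <iface>
ARP_MOVE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ kernel: arp: (?P<ip>\S+) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def summarize_arp_moves(lines):
    """Group 'arp: ... moved from ... to ...' messages by IP address.

    Returns a dict mapping each IP to the number of moves seen, the set of
    MAC addresses involved, and the list of timestamps (hh:mm:ss) in order.
    """
    summary = defaultdict(lambda: {"count": 0, "macs": set(), "times": []})
    for line in lines:
        match = ARP_MOVE.search(line)
        if match is None:
            continue  # not an arp "moved" message in the assumed format
        entry = summary[match.group("ip")]
        entry["count"] += 1
        entry["macs"].update((match.group("old"), match.group("new")))
        entry["times"].append(match.group("time"))
    return dict(summary)
```

Calling it as `summarize_arp_moves(open("/var/log/messages"))` and printing the `times` list for each IP puts the move timestamps next to each other, so they can be lined up against the times the server was unreachable.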

What should I do next to find and fix the problem?

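The last thing you should do right now is increase MaxClients.

It's hard to say. The slowdown and the MaxClients warning suggest that you are asking too much of the server. Unless you are running a lot of AJAX / COMET stuff on the server, you really should reduce the keepalive timeout (e.g. to 2 initially).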
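One common rule of thumb for checking a prefork MaxClients value is to divide the memory you can spare for Apache by the typical resident size of one httpd child. The sketch below only illustrates that arithmetic; the function name and all the numbers are hypothetical and are not taken from the server in question.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_child_rss_mb):
    """Rough upper bound on prefork children that fit in RAM without swapping.

    total_ram_mb:     physical memory in MB
    reserved_mb:      memory kept back for the OS, database, caches, etc.
    avg_child_rss_mb: typical resident size of one httpd child in MB
    All three values must be measured on the real machine; the ones used
    in the example call below are invented.
    """
    if avg_child_rss_mb <= 0:
        raise ValueError("avg_child_rss_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    return usable_mb // avg_child_rss_mb


# Hypothetical machine: 8 GB RAM, 2 GB reserved, 30 MB per httpd child.
print(estimate_max_clients(8192, 2048, 30))  # prints 204
```

If the figure comes out well below the configured 1024, raising MaxClients further would mainly let more children compete for memory that is not there.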

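"Battery needs replacement" is not just a reminder to do some maintenance. On a BBWC (battery-backed write cache) it means the controller is no longer attempting to cache writes, and if your system is set up correctly, neither your OS nor the disks will be caching writes either.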

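Both of these suggest that your system's performance should be very poor, yet what you reported first is that it becomes unavailable; you didn't actually mention performance. Knowing how to measure performance and capture data should be on your agenda.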
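As one possible starting point for capturing data, a small probe run from another machine can record how long a TCP connection to the server takes, so there is a timeline to compare with the logs. This is a minimal sketch using only the Python standard library; the function names, the host, the port, and the interval are placeholders rather than anything from the original setup.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def probe(host, port, interval_s, samples, timeout=5.0):
    """Collect (unix_timestamp, latency_or_None) tuples at a fixed interval."""
    results = []
    for i in range(samples):
        results.append((time.time(), time_tcp_connect(host, port, timeout)))
        if i < samples - 1:
            time.sleep(interval_s)
    return results
```

For example, `probe("server.example", 443, 60, 120)` would take one sample a minute for two hours; entries whose latency is `None` mark the moments the port could not be reached.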

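I don't know why the addresses keep moving (I assume these are local interfaces); it may be a result of load elsewhere.

This is one sick puppy. You will have to start fixing one thing at a time until you have a clear picture of what is going on.

Start by replacing the battery, tuning the Apache installation, and logging performance metrics.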