Varnish stops listening on port 80 after a few hours

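I have configured Varnish to listen on port 80 and Nginx to listen on 8080. After about 24 hours of uptime, my site went down and stayed down for 22 hours. When I checked, Varnish was no longer listening on port 80.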

When the site is up:

    abc@abc:~$ sudo netstat -anp --tcp --udp | grep LISTEN
    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      571/varnishd
    tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      376/nginx
    tcp        0      0 0.0.0.0:9171            0.0.0.0:*               LISTEN      376/nginx
    tcp        0      0 publicip:6082           0.0.0.0:*               LISTEN      569/varnishd
    tcp6       0      0 :::80                   :::*                    LISTEN      376/nginx
    tcp6       0      0 ::1:6082                :::*                    LISTEN      569/varnishd

When the site is down:

    abc@abc:~$ sudo netstat -anp --tcp --udp | grep LISTEN
    tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      376/nginx
    tcp        0      0 0.0.0.0:9171            0.0.0.0:*               LISTEN      376/nginx
    tcp        0      0 publicip:6082           0.0.0.0:*               LISTEN      745/varnishd
    tcp6       0      0 :::80                   :::*                    LISTEN      376/nginx
    tcp6       0      0 ::1:6082                :::*                    LISTEN      745/varnishd

Here is my /etc/default/varnish:

    ## Alternative 2, Configuration with VCL
    #
    # Listen on port 6081, administration on localhost:6082, and forward to
    # one content server selected by the vcl file, based on the request. Use a 1GB
    # fixed-size cache file.
    #
    DAEMON_OPTS="-a :80 \
                 -T localhost:6082 \
                 -f /etc/varnish/default.vcl \
                 -S /etc/varnish/secret \
                 -s malloc,96m"

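What exactly causes Varnish to stop listening on port 80 in the second case? I could simply check whether Varnish is down and restart it if so, but that would still mean a few minutes of downtime.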
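The check-and-restart idea can be sketched as a small watchdog run from cron. This is a hypothetical script (the name varnish-watchdog.sh, its `run` argument and the function varnish_listening_on are made up for illustration), assuming net-tools `netstat`, the stock `service varnish` init script, and root privileges. It looks at the program column as well as the port, because in the "site down" output nginx still holds `:::80` on tcp6, so a port-only check would wrongly report the site as up. It only bounds the outage; it does not explain why the child dies.

```shell
#!/bin/sh
# Hypothetical watchdog sketch. Assumes net-tools netstat, the stock
# "service varnish" init script, and root (netstat -p needs it).

# Succeeds if the netstat output on stdin shows varnishd (not another
# program) in LISTEN state on the TCP port given as $1. Matching only ":80"
# is not enough here: nginx also listens on :::80 over IPv6.
varnish_listening_on() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") && $7 ~ /\/varnishd$/ { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Act only when invoked as "varnish-watchdog.sh run", so the file can be
# sourced without restarting anything.
if [ "${1:-}" = "run" ]; then
  if ! netstat -lntp 2>/dev/null | varnish_listening_on 80; then
    logger -t varnish-watchdog "varnishd not listening on :80, restarting"
    service varnish restart
  fi
fi
```

For example, a root crontab line such as `* * * * * /usr/local/sbin/varnish-watchdog.sh run` (path hypothetical) checks once a minute, so an outage lasts roughly a minute plus the restart time.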

My varnish.vcl file: http://pastebin.com/UH2c8KdH (running Ubuntu 12.04 x86).

It happened again about 2 hours later. This is what I found in the syslog:

    Feb 14 18:16:00 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:16:51 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:17:49 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:18:06 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:19:33 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:21:25 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:22:34 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:28:28 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:29:41 abc varnishd[745]: Child (749) not responding to CLI, killing it.
    Feb 14 18:29:48 abc last message repeated 2 times
    Feb 14 18:29:48 abc varnishd[745]: Child (749) died signal=3
    Feb 14 18:29:49 abc varnishd[745]: Child cleanup complete
    Feb 14 18:29:55 abc varnishd[745]: child (1380) Started
    Feb 14 18:29:58 abc varnishd[745]: Pushing vcls failed: CLI communication error (hdr)
    Feb 14 18:29:58 abc varnishd[745]: Stopping Child
    Feb 14 18:29:58 abc varnishd[745]: Child (1380) said Child starts
    Feb 14 18:29:59 abc varnishd[745]: Child (1380) said Child dies
    Feb 14 18:30:02 abc varnishd[745]: Child (1380) died status=1
    Feb 14 18:30:04 abc varnishd[745]: Child cleanup complete

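I don't know why the process IDs differ from the ones I posted earlier; maybe I restarted Varnish while troubleshooting. I can't get much out of these logs. Any help is appreciated.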
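To see the pattern in these logs at a glance, a small awk filter can count the child lifecycle events. This is a sketch: the function name summarize_varnish_events is hypothetical, and the patterns are copied from the excerpts in this question, so other Varnish versions may word them differently. Read together with the netstat output, the sequence suggests that the manager process (745) stays alive and keeps the admin port 6082, while its child is killed after repeated CLI timeouts (signal=3 is SIGQUIT) and the replacement child exits with status=1, leaving nothing bound to port 80.

```shell
# Hypothetical helper: reads syslog lines from the given files (or stdin)
# and counts varnishd child events, using the message wording from the
# excerpts in this question.
summarize_varnish_events() {
  awk '
    /varnishd\[[0-9]+\]: Child \([0-9]+\) not responding to CLI/ { cli++ }
    /varnishd\[[0-9]+\]: Child \([0-9]+\) died/                  { died++ }
    /varnishd\[[0-9]+\]: [Cc]hild \([0-9]+\) Started/            { started++ }
    END { printf "cli_timeouts=%d died=%d started=%d\n", cli, died, started }
  ' "$@"
}
```

Usage: `summarize_varnish_events /var/log/syslog`. Lines such as "last message repeated 2 times" are not expanded, so the CLI-timeout count is a lower bound.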

Adding more logs:

From /var/log/messages:

First stop:

    Feb 13 17:40:44 dragon75 varnishd[581]: Child (583) died signal=3
    Feb 13 17:41:09 dragon75 varnishd[581]: child (2682) Started
    Feb 13 17:42:31 dragon75 varnishd[581]: Child (2682) said Child starts
    Feb 13 17:51:48 dragon75 varnishd[581]: Child (2682) died status=1
    Feb 13 17:51:48 dragon75 varnishd[581]: Child (-1) said Child dies

Second stop:

    Feb 14 18:29:48 dragon75 varnishd[745]: Child (749) died signal=3
    Feb 14 18:29:55 dragon75 varnishd[745]: child (1380) Started
    Feb 14 18:29:58 dragon75 varnishd[745]: Child (1380) said Child starts
    Feb 14 18:29:59 dragon75 varnishd[745]: Child (1380) said Child dies
    Feb 14 18:30:02 dragon75 varnishd[745]: Child (1380) died status=1

According to /var/log/messages, Varnish started at 16:31; after that there are five MARK entries, and then the Varnish child messages at 18:29. There is nothing else in between.

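I don't think resources are the bottleneck. This is a new site, still in testing, and I haven't really put any load on it. There is no traffic apart from a script running on another server that checks the home page. This is my first time using Varnish.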

Increase the cli_timeout parameter in Varnish to 60 seconds.

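This controls how long the monitoring parent process waits for the child to answer its CLI health checks. The default of 10 seconds can be too short if the OS is busy paging data to or from disk. Raise it to one minute (the default from Varnish 4.0 onward) and see whether the problem goes away.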
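To make the change persistent, the parameter can be passed to varnishd with its `-p name=value` option. The fragment below is the question's own DAEMON_OPTS with only `-p cli_timeout=60` appended; Varnish needs a restart afterwards. As a runtime-only alternative, `varnishadm -T localhost:6082 -S /etc/varnish/secret param.set cli_timeout 60` changes the running instance, but that value is lost on the next restart.

```shell
# /etc/default/varnish (excerpt): the existing options plus one varnishd
# parameter, cli_timeout, raised to 60 seconds.
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,96m \
             -p cli_timeout=60"
```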

If that doesn't help, my next guess would be an overly eager log-rotation script that kills more than it should.
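One way to test that guess is to list the commands logrotate runs in its prerotate/postrotate hooks and look for anything that signals or restarts processes. The function name rotate_hook_suspects and its keyword list are assumptions for illustration; it is a coarse line filter, not a parser for the full logrotate grammar.

```shell
# Hypothetical helper: prints lines inside prerotate/postrotate ... endscript
# blocks that mention kill, restart, reload or varnish, prefixed with the
# file name. Uses [ \t] instead of [[:space:]] so it also works with the
# older mawk shipped by default on Ubuntu 12.04.
rotate_hook_suspects() {
  awk '
    /^[ \t]*(postrotate|prerotate)[ \t]*$/ { inside = 1; next }
    /^[ \t]*endscript[ \t]*$/              { inside = 0; next }
    inside && /kill|restart|reload|varnish/ { print FILENAME ": " $0 }
  ' "$@"
}
```

Usage: `rotate_hook_suspects /etc/logrotate.conf /etc/logrotate.d/*`, then compare when logrotate runs (typically from /etc/cron.daily) with the timestamps of the child deaths.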