Monit: how best to monitor a URL

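My web server runs nginx with php5-fpm. When something goes wrong, it is usually php5-fpm that hangs, which makes the server return "Bad Gateway" errors. Of course, I can never be sure that nginx itself won't crash some day as well.

When something happens, both processes (or usually their threads) still exist and need a restart. I am not very interested in the cause of the current problem; I just want to restart both processes. To do so, I created two bash scripts, /etc/monit/webserver.start.sh and /etc/monit/webserver.stop.sh.

Here is my monit configuration file (in conf.d):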

    check process webserver with pidfile /var/run/nginx.pid
        start program = "/etc/monit/webserver.start.sh"
        stop program = "/etc/monit/webserver.stop.sh"
        if failed (url https://www.myserver.com/example/
            and content == 'test string' and timeout 20 seconds)
            then alert
        if failed (url https://www.myserver.com/example/
            and content == 'test string' and timeout 20 seconds)
            for 2 cycles then restart
        if failed (url https://www.myserver.com/example/
            and content == 'test string' and timeout 20 seconds)
            for 4 cycles then exec "/sbin/reboot"

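This is not entirely wrong, but there are still some issues:

  1. Actually, I don't want to monitor the nginx process here, but rather the port / URL. Can I use some other check instead of check process?
  2. To take different actions after 1 failure, 2 failures and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run one request per cycle and perform different actions after different numbers of failures?

I tried to find answers in the official monit reference, but apparently I do not understand the possibilities that source describes. So I would very much appreciate some advice.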

Update

I spent some time with the monit man page (which is better structured than the online manual, in my opinion) and found this optimization:

    CHECK HOST webserver WITH ADDRESS 127.0.0.1
        START PROGRAM = "/etc/monit/webserver.start.sh"
        STOP PROGRAM = "/etc/monit/webserver.stop.sh"
        IF NOT EXIST THEN ALERT
        IF FAILED (url https://www.mydomain.tld/example/
            and content == 'test content' and timeout 20 seconds)
            FOR 2 CYCLES THEN RESTART
        IF 2 RESTARTS WITHIN 5 CYCLES THEN EXEC "/sbin/reboot"

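This modification does not include an alert on the first URL failure (the workaround here is to use dummy start/stop commands), but it can restart after 2 failures and reboot after 4 failures, with only one server request.

This is not perfect yet. If someone knows how to do it better, suggestions are still appreciated :) Thanks!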

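Update

After some testing, I advise against using monit's timeout function (IF 2 RESTARTS WITHIN...) for second-order actions. It seems that, under specific conditions, the timeout fires again after the reboot. In my case this resulted in multiple reboots: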

    [CET Dec 28 05:59:50] error : skipping queued event /var/monit/id - unknown data format
    [CET Dec 28 05:59:50] error : skipping queued event /var/monit/state - unknown data format
    [CET Dec 30 03:10:52] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
    [CET Jan 1 03:08:10] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
    [CET Jan 1 03:09:30] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
    [CET Jan 1 03:09:31] info : 'webserver' trying to restart
    [CET Jan 1 03:09:31] info : 'webserver' stop: /etc/monit/webserver.stop.sh
    [CET Jan 1 03:09:31] info : 'webserver' start: /etc/monit/webserver.start.sh
    [CET Jan 1 03:10:31] error : 'webserver' failed, cannot open a connection to INET[www.myserver.com/example/] via TCPSSL
    [CET Jan 1 03:10:31] info : 'webserver' trying to restart
    [CET Jan 1 03:10:31] info : 'webserver' stop: /etc/monit/webserver.stop.sh
    [CET Jan 1 03:10:31] info : 'webserver' start: /etc/monit/webserver.start.sh
    [CET Jan 1 03:10:31] error : 'php-fpm' process is not running
    [CET Jan 1 03:10:31] info : 'php-fpm' trying to restart
    [CET Jan 1 03:10:31] info : 'php-fpm' start: /usr/sbin/service
    [CET Jan 1 03:10:31] error : 'nginx' process is not running
    [CET Jan 1 03:10:31] info : 'nginx' trying to restart
    [CET Jan 1 03:10:31] info : 'nginx' start: /usr/sbin/service
    [CET Jan 1 03:11:32] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
    [CET Jan 1 03:11:32] info : 'webserver' exec: /sbin/reboot
    [CET Jan 1 03:12:24] info : Starting monit daemon with http interface at [0.0.0.0:2812]
    [CET Jan 1 03:12:24] info : Monit start delay set -- pause for 240s
    [CET Jan 1 03:16:24] info : Starting monit HTTP server at [0.0.0.0:2812]
    [CET Jan 1 03:16:24] info : monit HTTP server started
    [CET Jan 1 03:16:24] info : 'Memory' Monit started
    [CET Jan 1 03:16:24] error : skipping queued event /var/monit/id - unknown data format
    [CET Jan 1 03:16:24] error : skipping queued event /var/monit/state - unknown data format
    [CET Jan 1 03:16:24] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
    [CET Jan 1 03:16:24] info : 'webserver' exec: /sbin/reboot
    [CET Jan 1 03:17:04] info : Starting monit daemon with http interface at [0.0.0.0:2812]
    [CET Jan 1 03:17:04] info : Monit start delay set -- pause for 240s
    [CET Jan 1 03:21:04] info : Starting monit HTTP server at [0.0.0.0:2812]
    [CET Jan 1 03:21:04] info : monit HTTP server started
    [CET Jan 1 03:21:04] info : 'Memory' Monit started
    [CET Jan 1 03:21:04] error : skipping queued event /var/monit/id - unknown data format
    [CET Jan 1 03:21:04] error : skipping queued event /var/monit/state - unknown data format
    [CET Jan 1 03:21:04] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
    [CET Jan 1 03:21:04] info : 'webserver' exec: /sbin/reboot
    [CET Jan 1 03:21:44] info : Starting monit daemon with http interface at [0.0.0.0:2812]
    [CET Jan 1 03:21:44] info : Monit start delay set -- pause for 240s
    [CET Jan 1 03:25:44] info : Starting monit HTTP server at [0.0.0.0:2812]
    [CET Jan 1 03:25:44] info : monit HTTP server started
    [CET Jan 1 03:25:44] info : 'Memory' Monit started
    [CET Jan 1 03:25:44] error : skipping queued event /var/monit/id - unknown data format
    [CET Jan 1 03:25:44] error : skipping queued event /var/monit/state - unknown data format
    [CET Jan 1 03:25:44] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
    [CET Jan 1 03:25:44] info : 'webserver' exec: /sbin/reboot

Unless anyone has a good idea, I will switch back to multiple requests. In the end, they are not that time-consuming…

BurninLeo

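I don't want to monitor the nginx process here, but rather the port / URL. Can I use some other check instead of check process?

You can use a host check. This is an example from the monit site: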

    check host mmonit.com with address mmonit.com
        if failed
            port 80 protocol http
            with http headers [Host: mmonit.com, Cache-Control: no-cache,
                               Cookie: csrftoken=nj1bI3CnMCaiNv4beqo8ZaCfAQQvpgLH]
            and request /monit/ with content = "Monit [0-9.]+"
        then alert
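Applied to the setup from the question, a host check of the same kind might look like the sketch below. The host name, path, test string and 20-second timeout are taken from the question; port 443 with protocol https is an assumption based on the https:// URL, and the accepted keywords can differ between monit versions, so it is worth validating the file with monit -t before relying on it.

```
# Sketch only: a port/URL check with no pidfile involved.
# Assumes the site is served over HTTPS on port 443.
check host webserver with address www.myserver.com
    if failed
        port 443 protocol https
        request "/example/" with content = "test string"
        with timeout 20 seconds
    then alert
```

Because this is a check host entry, monit does not look at the nginx process at all; it only tests whether the URL answers with the expected content.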

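To take different actions after 1 failure, 2 failures and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run one request per cycle and perform different actions after different numbers of failures?

EXEC can be used to execute an arbitrary program and send an alert. If you choose this action, you must state the program to be executed and, if the program requires arguments, you must enclose the program and its arguments in a quoted string. You may optionally specify the uid and gid the executed program should switch to upon start. For example: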

    exec "/usr/local/tomcat/bin/startup.sh" as uid nobody and gid nobody
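As a rough illustration of how EXEC could be attached to the URL test from the question, the sketch below runs a script once the test has failed for 2 consecutive cycles. The script path /etc/monit/webserver.restart.sh is hypothetical (the question only mentions separate start and stop scripts), and port 443 with protocol https is again an assumption derived from the https:// URL.

```
# Sketch only: run a custom program when the URL test keeps failing.
# /etc/monit/webserver.restart.sh is a hypothetical wrapper script.
check host webserver with address www.myserver.com
    if failed
        port 443 protocol https
        request "/example/" with content = "test string"
        with timeout 20 seconds
        for 2 cycles
    then exec "/etc/monit/webserver.restart.sh" as uid root and gid root
```

This covers only a single failure threshold. Whether several thresholds can share one request per cycle is not something the source settles.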