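I'm using Monit 5.5 with the CHECK PROGRAM directive to run an external script that does some work to validate that my application is working properly. I only want to run the check every few cycles so that it doesn't become a burden on the application. My configuration looks like this: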
CHECK program mydaemon with path "/usr/local/sbin/my_check.sh"
    ALERT [email protected] ON { exec }
    START PROGRAM "/etc/init.d/mydaemon start"
    STOP PROGRAM "/etc/init.d/mydaemon stop"
    if status = 1 for 2 cycles then restart
    # Trick monit into doing a restart + hitting our local alert
    if status = 1 for 4 cycles then exec "/bin/true"
    if status = 1 for 6 cycles then unmonitor
    every 3 cycles
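Things work almost as expected: every 3 cycles, monit runs the check or takes an action. However, as you may have guessed from the comment in the configuration, what I see in the logs is monit performing the restart action in cycles 3, 4 and 5: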
May 24 14:03:24 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:03:54 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:03:54 monit[19488]: 'mydaemon' trying to restart
May 24 14:03:54 monit[19488]: 'mydaemon' stop: /etc/init.d/mydaemon
May 24 14:03:54 monit[19488]: 'mydaemon' start: /etc/init.d/mydaemon
May 24 14:04:24 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:04:24 monit[19488]: 'mydaemon' trying to restart
May 24 14:04:24 monit[19488]: 'mydaemon' stop: /etc/init.d/mydaemon
May 24 14:04:24 monit[19488]: 'mydaemon' start: /etc/init.d/mydaemon
May 24 14:04:54 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:04:54 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:04:54 monit[19488]: 'mydaemon' exec: /bin/true
May 24 14:04:54 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:04:54 monit[19488]: 'mydaemon' trying to restart
May 24 14:04:54 monit[19488]: 'mydaemon' stop: /etc/init.d/mydaemon
May 24 14:04:54 monit[19488]: 'mydaemon' start: /etc/init.d/mydaemon
May 24 14:05:25 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:05:25 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:05:25 monit[19488]: 'mydaemon' exec: /bin/true
May 24 14:05:25 monit[19488]: 'mydaemon' status failed (1) for /usr/local/sbin/my_check.sh -- Error: testing!
May 24 14:05:25 monit[19488]: 'mydaemon' trying to restart
May 24 14:05:25 monit[19488]: 'mydaemon' stop: /etc/init.d/mydaemon
May 24 14:05:25 monit[19488]: 'mydaemon' start: /etc/init.d/mydaemon
Why does monit perform the restart action in cycles where I didn't ask for it?
PS: my monit cycle length is 10s, so the actions in the log snippet are 30s apart.
Let's break down the logic:
Cycle 1
Check result: 1 consecutive failure
Action: none
Cycle 2
Check result: 2 consecutive failures
Action: restart (the first condition is met)
Cycle 3
Check result: 3 consecutive failures
Action: restart (the first condition is still met: the exit status was 1 in the last two cycles)
Cycle 4
Check result: 4 consecutive failures
Action: restart AND exec /bin/true (the first and second conditions are both met)
Cycle 5
Check result: 5 consecutive failures
Action: restart AND exec /bin/true (the first and second conditions are both still met)
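Since your program always returns 1, the first condition is met on every check from the second cycle onward, because (at least) the last 2 cycles have always failed. This keeps happening until the service is unmonitored.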
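To make the walkthrough above concrete, here is a small Python model of the rule evaluation it describes. It is a sketch, not Monit's actual implementation. It assumes, as the answer does, that a `for N cycles` rule fires on every evaluated cycle in which the last N results all had status 1, not just the first time the threshold is crossed. "Cycle" here means an evaluated check, i.e. every third 10s Monit cycle. The names `RULES` and `simulate` are hypothetical.

```python
# Hypothetical model of the "if status = 1 for N cycles then ACTION" rules
# from the question. Assumption (taken from the answer, not from Monit's
# source): a rule fires on EVERY evaluated cycle while the number of
# consecutive status-1 results is >= N, not only once.

RULES = [
    (2, "restart"),
    (4, "exec /bin/true"),
    (6, "unmonitor"),
]


def simulate(results):
    """Return a list of (cycle, actions) for a sequence of exit statuses."""
    history = []
    streak = 0  # consecutive cycles with status == 1
    for cycle, status in enumerate(results, start=1):
        streak = streak + 1 if status == 1 else 0
        # Every rule whose threshold is reached fires in this cycle.
        actions = [action for n, action in RULES if streak >= n]
        history.append((cycle, actions))
        if "unmonitor" in actions:
            break  # monitoring stops, so no further cycles are evaluated
    return history


if __name__ == "__main__":
    # A check script that always exits 1, like the test script in the question.
    for cycle, actions in simulate([1] * 10):
        print(f"cycle {cycle}: {', '.join(actions) or 'none'}")
```

With a check that always fails, cycles 2 through 5 all include `restart`, which matches the repeated restarts in the log. The model only tracks which rules are satisfied; it does not try to reproduce the order in which Monit runs the actions within a single cycle.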