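Forgive me if this has already been answered somewhere; I found many similar questions, but none of them seemed to solve my problem.
I am just testing the uptime of some Windows servers in Nagios, and I want to be alerted if it exceeds a certain value.
It was working yesterday, and at some point I seem to have broken something, but I cannot pin down exactly what is wrong.
First, as root, the test works: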
    ./libexec/check_uptime.sh xxxx 28 30
    1449919
    OK. Uptime 16 Days.
As nagios, via su - nagios, the test works:
    su - nagios
    -bash-3.2$ pwd
    /usr/local/nagios
    -bash-3.2$ ./libexec/check_uptime.sh xx.xx.xx.xx 28 30
    1449969
    OK. Uptime 16 Days.
However, I believe the "correct" way to test is with su - nagios -c?
    su - nagios -c "./libexec/check_uptime.sh 10.36.128.22 28 30"
    1450084
    OK. Uptime 16 Days.
But the command still fails when run from the web page/daemon:
    Uptime    UNKNOWN    15-03-2016 11:04:24    0d 1h 4m 10s    3/3    0
The command definition looks correct to me:
    define command{
        command_name    check_uptime
        command_line    $USER1$/check_uptime.sh -H $HOSTADDRESS$ 25 28
    }
As does the service definition:
    define service{
        use                     generic-service
        hostgroup_name          Windows-Servers
        service_description     Uptime
        check_command           check_uptime
    }
Somehow the script got lost in an edit, so here it is again:
    #!/bin/bash
    ## Shamelessly adapted from http://correctlife.blogspot.de/2011/02/wrapper-on-checkntuptime.html
    HOSTADDRESS=$1
    MAXWARN=28 # in days
    MAXCRIT=30 # in days
    MINCRIT=1
    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2
    STATE_UNKNOWN=3
    SECONDS=`/usr/local/nagios/libexec/check_nt -H $HOSTADDRESS -p 12489 -s $ekr3t -v COUNTER -l "\\System\\System Up Time"`
    #echo $SECONDS
    if [ $SECONDS == 0 ]; then
        echo "UNKNOWN: No uptime recieved. Uptime Value: $SECONDS"
        exit 3
    fi
    HOURS=$(( $SECONDS / 60 / 60 ))
    SECONDSINHOURS=$(( $HOURS * 60 * 60 ))
    DAYS=$(( $HOURS / 24 ))
    REMAININGSECONDS=$(( $SECONDS - $SECONDSINHOURS ))
    MINUTES=$(( $REMAININGSECONDS / 60 ))
    FORMEDUPTIME="${DAYS} Days"
    if [[ $HOURS -lt $MINCRIT ]]; then
        echo "CRITICAL: System restarted in last hour."
        exit 2
    fi
    if [[ $DAYS -ge $MAXCRIT ]]; then
        echo "CRITICAL: System up over ${MAXCRIT} Days."
        exit 2
    fi
    if [[ $DAYS -ge $MAXWARN ]]; then
        echo "WARNING: System up over ${MAXWARN} Days."
        exit 1
    fi
    echo "OK. Uptime $FORMEDUPTIME."
    exit 0
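As a sanity check on the arithmetic in the script, here is a minimal sketch that repeats the same integer division on the counter value 1449919 from the root test. The function name uptime_days is hypothetical and only used for illustration; it is not part of the original script.

```shell
#!/bin/bash
# Sketch only: uptime_days is a hypothetical helper, not part of check_uptime.sh.
# It mirrors the script's integer arithmetic: seconds -> whole hours -> whole days.
uptime_days() {
    local seconds=$1
    local hours=$(( seconds / 60 / 60 ))   # integer division, remainder dropped
    echo $(( hours / 24 ))
}

uptime_days 1449919   # prints 16
```

This matches the "Uptime 16 Days" reported by the manual runs, so the conversion itself is behaving as expected when the counter value comes back correctly.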
I am an idiot.
The clue was in the command definition.
At some point I "helpfully" added -H to it, which obviously meant I was passing -H as the hostname ;)
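To illustrate why the extra flag matters: the script takes the host from its first positional parameter (HOSTADDRESS=$1) and does no option parsing. A minimal sketch, using a hypothetical stand-in function rather than the real script:

```shell
#!/bin/bash
# Sketch only: show_first_arg is a hypothetical stand-in for check_uptime.sh,
# which reads the host from its first positional parameter (HOSTADDRESS=$1).
show_first_arg() {
    echo "host=$1"
}

show_first_arg 10.36.128.22 25 28      # prints host=10.36.128.22
show_first_arg -H 10.36.128.22 25 28   # prints host=-H
```

With -H in front, the literal string -H lands in $1, so check_nt is asked to query a host named -H instead of the server's address.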
The command definition should have been:
    define command{
        command_name    check_uptime
        command_line    $USER1$/check_uptime.sh $HOSTADDRESS$ 25 28
    }
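The manual tests passed because they were typed by hand as "host 28 30", not taken from the command_line in the definition. One way to catch this kind of mismatch is to expand the macros by hand and run exactly that string. The sketch below assumes $USER1$ resolves to /usr/local/nagios/libexec (the path the script itself uses for check_nt); expand_macros is a hypothetical helper, not a Nagios tool.

```shell
#!/bin/bash
# Sketch only: expand_macros is a hypothetical helper for manual testing.
# Assumption: $USER1$ points at /usr/local/nagios/libexec on this install.
expand_macros() {
    local template=$1 host=$2
    # Replace the two macros used in the command definition with literal values.
    printf '%s\n' "$template" |
        sed -e 's|\$USER1\$|/usr/local/nagios/libexec|g' \
            -e 's|\$HOSTADDRESS\$|'"$host"'|g'
}

# The broken definition from the question, expanded for one host:
expand_macros '$USER1$/check_uptime.sh -H $HOSTADDRESS$ 25 28' 10.36.128.22
# prints /usr/local/nagios/libexec/check_uptime.sh -H 10.36.128.22 25 28
```

Running the printed line through su - nagios -c "..." would then exercise the same arguments the daemon passes, including the stray -H.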