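Nagios upgrade from 3.5.1 to 4.0.8
I wanted to ask this on the Nagios support forum, but an hour later I still haven't received the confirmation email needed to set up my account...
Nagios seems to run fine as a service, but the web CGIs don't work. There are no errors in Apache's error.log, and none in nagios.log either. I've checked permissions, and I've looked through the C code where this error message is embedded: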
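Whoops! Error: Could not read host and service status information!
The same error appears on almost every menu item on the left side of the Nagios home page.
After a start and then a stop from init, nagios.log looks like this: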
    [1431102009] Nagios 4.0.8 starting... (PID=27779)
    [1431102009] Local time is Fri May 08 13:20:09 ADT 2015
    [1431102009] LOG VERSION: 2.0
    [1431102009] qh: Socket '/usr/local/nagios/var/rw/query.sh' successfully initialized
    [1431102009] qh: core query handler registered
    [1431102009] nerd: Channel hostchecks registered successfully
    [1431102009] nerd: Channel servicechecks registered successfully
    [1431102009] nerd: Channel opathchecks registered successfully
    [1431102009] nerd: Fully initialized and ready to rock!
    [1431102009] wproc: Successfully registered manager as @wproc with query handler
    [1431102009] wproc: Registry request: name=Core Worker 27785;pid=27785
    [1431102009] wproc: Registry request: name=Core Worker 27786;pid=27786
    [1431102009] wproc: Registry request: name=Core Worker 27782;pid=27782
    [1431102009] wproc: Registry request: name=Core Worker 27781;pid=27781
    [1431102009] wproc: Registry request: name=Core Worker 27783;pid=27783
    [1431102009] wproc: Registry request: name=Core Worker 27784;pid=27784
    [1431102009] Successfully launched command file worker with pid 27787
    [1431102022] Caught SIGTERM, shutting down...
    [1431102022] Successfully shutdown... (PID=27779)
    [1431102022] Event broker module 'NERD' deinitialized successfully.
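The bracketed numbers at the start of each nagios.log entry are Unix epoch timestamps. As a minimal sketch (the helper name parse_log_line is hypothetical, not part of Nagios), decoding them in Python shows how long the daemon ran between startup and the SIGTERM:

```python
import re
from datetime import datetime, timezone

# Matches a nagios.log entry of the form "[<epoch seconds>] <message>".
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_log_line(line):
    """Return (UTC datetime, message) for one nagios.log entry, or None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return ts, m.group(2)


start = parse_log_line("[1431102009] Nagios 4.0.8 starting... (PID=27779)")
stop = parse_log_line("[1431102022] Caught SIGTERM, shutting down...")
print(start[0].isoformat())                  # 2015-05-08T16:20:09+00:00
print((stop[0] - start[0]).total_seconds())  # 13.0
```

16:20:09 UTC is 13:20:09 ADT (UTC-3), which matches the "Local time is" line, and the SIGTERM arrived 13 seconds after startup.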
Running with -v is clean:

    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    Nagios Core 4.0.8
    Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 08-12-2014
    License: GPL

    Website: http://www.nagios.org
    Reading configuration data...
       Read main config file okay...
       Read object config files okay...

    Running pre-flight check on configuration data...

    Checking objects...
        Checked 816 services.
        Checked 826 hosts.
        Checked 11 host groups.
        Checked 0 service groups.
        Checked 18 contacts.
        Checked 13 contact groups.
        Checked 61 commands.
        Checked 6 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
    Checking for circular paths...
        Checked 826 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 6 timeperiods
    Checking global event handlers...
    Checking obsessive compulsive processor commands...
    Checking misc settings...

    Total Warnings: 0
    Total Errors:   0

    Things look okay - No serious problems were detected during the pre-flight check
Also, check_nagios says we are running OK:

    # /usr/local/nagios/libexec/check_nagios /var/log/nagios.log 5 '/usr/local/nagios/bin/nagios'
    NAGIOS OK: 8 processes, status log updated 11 seconds ago
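One possibility is that the error means the CGIs can't access the nagios.cfg file. I've checked, and every directory on the path is r-x for "other" (which covers the apache user). In any case, if there were a permission problem, it should show up as an Apache error. I've been working on this for hours and can't find the point of failure, or what changed.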
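The permission check described above (every directory on the path must be r-x for "other") can be scripted. This is a minimal sketch; the function name dirs_missing_other_rx is hypothetical, and it only inspects mode bits, so ownership, group membership, ACLs or SELinux contexts that might also block the Apache user are not covered:

```python
import os
import stat


def dirs_missing_other_rx(path):
    """List each ancestor directory of `path` that lacks r-x for 'other'.

    Only the classic mode bits are inspected; ACLs, SELinux contexts and
    group membership are ignored, and the file itself is not checked.
    """
    missing = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        mode = os.stat(current).st_mode
        # 'other' needs read (list) and execute (traverse) on every directory.
        if not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
            missing.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing


if __name__ == "__main__":
    cfg = "/usr/local/nagios/etc/nagios.cfg"  # path taken from the post
    if os.path.exists(cfg):
        print(dirs_missing_other_rx(cfg))  # an empty list means every ancestor is o+rx
```

Checking the mode bits rather than calling os.access keeps the result meaningful when the script runs as root, whose access checks would otherwise always succeed.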
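The main page also shows "Unable to get process status" under the Nagios Core logo. This comes from main.php running statusjson.cgi. I'm not sure what it's looking at, but when I manually run the CGI query that main.php uses (cgi-bin/statusjson.cgi?query=programstatus), the page is blank. I've Googled this and searched the Nagios forums, but the other people with this problem seem to have had log errors that gave them more clues.
I've found something odd...
I found another nagios.log that gets a couple of lines written to it every time the service starts: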
    # cat /usr/local/nagios/var/nagios.log
    [1431088940] Error: Cannot open main configuration file '/' for reading!
    [1431088940] Error: Failed to process config file '/'. Aborting
Maybe there is some quirky init script or cfg file involved, but I can't find it. As another test, I can su in and run the daemon manually:

    su - nagios
    [nagios@atlas ~]$ /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

    Nagios Core 4.0.8
    Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 08-12-2014
    License: GPL

    Website: http://www.nagios.org
    Nagios 4.0.8 starting... (PID=23234)
    Local time is Fri May 08 13:45:12 ADT 2015
    nerd: Channel hostchecks registered successfully
    nerd: Channel servicechecks registered successfully
    nerd: Channel opathchecks registered successfully
    nerd: Fully initialized and ready to rock!
    wproc: Successfully registered manager as @wproc with query handler
    wproc: Registry request: name=Core Worker 23235;pid=23235
    wproc: Registry request: name=Core Worker 23236;pid=23236
    wproc: Registry request: name=Core Worker 23237;pid=23237
    wproc: Registry request: name=Core Worker 23238;pid=23238
    wproc: Registry request: name=Core Worker 23239;pid=23239
    wproc: Registry request: name=Core Worker 23240;pid=23240
    Successfully launched command file worker with pid 23241
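I was hoping this would sidestep any quirks in the init script. It doesn't touch /usr/local/nagios/var/nagios.log (as expected), but it doesn't change the errors in the web CGIs either. Another clue: when Nagios is started manually like this, no log output about host and service status appears on screen. If I start it from init, there are warnings about the performance of some hosts, and the Nagios log is full of them, but none of that appears when Nagios is started from the command line as the nagios user, as shown above.
This question did eventually make it onto the Nagios Core support forum, and it was resolved there.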
http://support.nagios.com/forum/viewtopic.php?f=7&t=32795
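In this particular case, we were missing the configuration items state_retention_file and status_file. However, many different kinds of errors can cause web interface errors that begin with "Whoops!".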
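Since status_file and state_retention_file are main-config directives, a quick way to catch this particular cause is to check nagios.cfg for them. The following is a minimal Python sketch, not a Nagios tool: the function name missing_directives and the sample text are hypothetical, and it assumes the main config format of one name=value directive per line with lines starting with # treated as comments:

```python
# Main-config directives whose absence caused the "Whoops!" errors in this case.
REQUIRED = ("status_file", "state_retention_file")


def missing_directives(cfg_text, required=REQUIRED):
    """Return the required directives that are not set in nagios.cfg-style text."""
    present = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, _value = line.partition("=")
        if sep:
            present.add(name.strip())
    return [d for d in required if d not in present]


# Illustrative sample only; the paths are placeholders, not taken from the post.
sample = """
log_file=/usr/local/nagios/var/nagios.log
#status_file=/usr/local/nagios/var/status.dat
cfg_dir=/usr/local/nagios/etc/objects
"""
print(missing_directives(sample))  # ['status_file', 'state_retention_file']
```

To check a real installation, pass it the contents of the actual file, e.g. missing_directives(open("/usr/local/nagios/etc/nagios.cfg").read()). A commented-out directive counts as missing, which is exactly the situation that is easy to overlook after an upgrade.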