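We have a BIG-IP 6400 LTM device that is killing processes at an alarming rate. CPU utilization has been steady at around 23%, so that is not the problem.
Here is a sample from /var/log/ltm: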
Oct 7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9
Oct 7 08:22:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25587 exited with signal = 9
Oct 7 08:22:34 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25793 exited with signal = 9
Oct 7 08:23:10 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26260 exited with signal = 9
Oct 7 08:23:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26584 exited with signal = 9
Oct 7 08:23:40 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26647 exited with signal = 9
Oct 7 08:23:45 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26699 exited with signal = 9
Oct 7 08:23:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26805 exited with signal = 9
Oct 7 08:25:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 28079 exited with signal = 9
Oct 7 08:27:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29286 exited with signal = 9
Oct 7 08:27:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29307 exited with signal = 9
Oct 7 08:27:56 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29793 exited with signal = 9
Oct 7 08:29:20 local/pri-4600 info bigd[3471]: reap_child: child process PID = 30851 exited with signal = 9
Oct 7 08:33:00 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1122 exited with signal = 9
Oct 7 08:33:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1299 exited with signal = 9
Oct 7 08:34:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2054 exited with signal = 9
Oct 7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2784 exited with signal = 9
Oct 7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2807 exited with signal = 9
Oct 7 08:35:35 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3015 exited with signal = 9
Oct 7 08:36:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3601 exited with signal = 9
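To put a number on "alarming rate", the entries can be tallied with a short script. This is only a rough sketch: the regular expression is written against the sample above and may not match other bigd message variants, the function name `summarize_reaps` is made up for illustration, and the `year` argument is a placeholder because syslog timestamps carry no year.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern derived from the sample log lines; other bigd messages are ignored.
REAP_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+\s+"
    r"bigd\[\d+\]:\s+reap_child:\s+child process PID = (?P<pid>\d+)"
    r"\s+exited with signal = (?P<sig>\d+)"
)

def summarize_reaps(text, year=2000):
    """Return (events per signal number, mean seconds between events).

    `year` is an assumed placeholder; it only matters if the log
    spans a year boundary. Signal 9 is SIGKILL on Linux.
    """
    stamps = []
    per_signal = Counter()
    for m in REAP_RE.finditer(text):
        ts = " ".join(m.group("ts").split())  # collapse syslog padding
        stamps.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
        per_signal[int(m.group("sig"))] += 1
    if len(stamps) < 2:
        return per_signal, None
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return per_signal, sum(gaps) / len(gaps)

if __name__ == "__main__":
    with open("/var/log/ltm") as fh:
        counts, mean_gap = summarize_reaps(fh.read())
    print(counts, mean_gap)
```

Because it scans the whole text with `finditer`, it works whether the entries sit one per line or run together.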
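Is this normal? If not, what could be causing it?
bigd is the monitoring daemon on BIG-IP, so these messages indicate that the monitors you are using are crashing. You should open a support case and upload your qkview to ihealth.f5.com. Here is the solution article related to that error message: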
https://support.f5.com/kb/en-us/solutions/public/17000/000/sol17092.html
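This turned out to be a known bug in the BIG-IP 10.2.4 software we are running.
From F5 Support:
…You are encountering a known issue that is tracked internally as bug ID 539130, "bigd might deadlock when handling SIGCHLD, resulting in bigd heartbeat failure and SIGABRT". Condition: an external monitor runs for a long time and is killed by the next iteration of the monitor. This can cause bigd to crash and dump core, which causes health monitoring to fail temporarily.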
The fix is to update the software with Hotfix-BIGIP-10.2.4-HF12-866.11-ENG.
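Separately from the hotfix, the condition F5 describes (an external monitor still running when its next iteration starts) can be illustrated with a check that limits its own runtime. The following is a minimal sketch in generic Python 3, not something F5 recommended in this thread and not tested on a BIG-IP; the name `run_bounded`, the example command, and the interval values are all hypothetical.

```python
import subprocess
import sys

def run_bounded(cmd, interval_s, margin_s=1.0):
    """Run cmd, but give up before the monitor interval elapses.

    interval_s is the assumed monitor interval; margin_s is slack so
    the check ends before the next iteration would begin.
    Returns True only if cmd exits with status 0 within the budget.
    """
    budget = max(interval_s - margin_s, 0.1)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=budget,  # child is killed and TimeoutExpired raised
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical check command: a Python one-liner that exits at once.
    ok = run_bounded([sys.executable, "-c", "pass"], interval_s=5)
    sys.exit(0 if ok else 1)
```

The point of the design is that the check ends itself, on its own schedule, instead of being killed from outside with signal 9 when the next iteration arrives.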