Possible causes and fixes for ntpd dying unexpectedly

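In a web application that uses S3 for physical file storage, we have run into a problem where ntpd keeps dying. It seems to happen once or twice a day. Very little information is available when it happens, other than that the PID file exists but the service is reported as stopped when checking its status.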

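Can anyone suggest possible causes for ntpd dying? I assume clock drift could be killing it, but I don't know what would cause that. There is plenty of memory and free disk space.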

The last time the service died, this was the output:

Sep 6 06:15:25 vm02 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="988" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Sep 6 06:17:06 vm02 ntpd[10803]: 0.0.0.0 0618 08 no_sys_peer
Sep 6 08:01:10 vm02 ntpd[10803]: 0.0.0.0 0617 07 panic_stop -28101 s; set clock manually within 1000 s.
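To put the panic_stop line above in perspective: -28101 s is roughly 7.8 hours, far beyond the 1000 s limit named in the same message. Here is a minimal Python sketch for pulling such offsets out of a syslog file. The regex is modelled only on the single log line in this post, so treat it as an assumption that other ntpd versions format the message the same way; the names panic_offsets and describe are just illustrative helpers.

```python
import re

# Default panic threshold in seconds, as stated in the log message itself
# ("set clock manually within 1000 s").
PANIC_THRESHOLD_S = 1000

# Matches the offset in "... panic_stop -28101 s; set clock manually ...".
# Assumption: the format seen in this post; other versions may differ.
PANIC_RE = re.compile(r"panic_stop\s+([+-]?\d+(?:\.\d+)?)\s*s")


def panic_offsets(log_text):
    """Return the offsets (in seconds) of every panic_stop message in log_text."""
    return [float(m.group(1)) for m in PANIC_RE.finditer(log_text)]


def describe(offset_s):
    """Format an offset as seconds, hours, and a multiple of the threshold."""
    return (f"{offset_s:+.0f} s ({abs(offset_s) / 3600:.1f} h), "
            f"{abs(offset_s) / PANIC_THRESHOLD_S:.1f}x the "
            f"{PANIC_THRESHOLD_S} s panic threshold")
```

Feeding it the contents of the syslog file and printing describe(off) for each returned offset gives a quick view of how large the jumps are each time ntpd exits.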

I'd say there is no one-minute way to find the exact cause.

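We ran into a similar problem in an ESXi environment. To cut a long story short, we found that the ESXi host's clock was drifting badly, and the guest VMs were syncing time from both the ESXi host and the upstream NTP servers. This confused ntpd on the VMs, so it died frequently.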

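We also found that, in some rare cases, random packet loss can cause ntpd to exit, because the round-trip time between your server and the upstream NTP server is used to calculate the time offset.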
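For reference, the round-trip dependence comes from the standard NTP on-wire calculation (RFC 5905), which assumes the outbound and return paths take equally long. A minimal Python sketch follows; the timestamps are made up for illustration (integer milliseconds), and this shows only the basic formula, not ntpd's filtering and peer selection on top of it.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time   (client clock)
    t2: server receive time    (server clock)
    t3: server transmit time   (server clock)
    t4: client receive time    (client clock)
    Returns (offset, delay) in the same unit as the inputs.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Both clocks agree in these made-up examples (true offset is 0).
# Symmetric path: 10 ms out, 1 ms server processing, 10 ms back.
symmetric = ntp_offset_and_delay(0, 10, 11, 21)

# Asymmetric path: 10 ms out, 1 ms processing, 210 ms back
# (for example, a reply held up by congestion).
asymmetric = ntp_offset_and_delay(0, 10, 11, 221)
```

With the symmetric path the computed offset is 0 ms; with the delayed reply it comes out as -100 ms even though the clocks agree, i.e. half of the asymmetry shows up as apparent offset.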

In both of the cases above, if ntpd sees a large time offset, e.g. more than 1000 seconds, it exits by default. The -g option can help.

  -g Normally, ntpd exits with a message to the system log if the offset exceeds the panic threshold, which is 1000 s by default. This option allows the time to be set to any value without restriction; however, this can happen only once. If the threshold is exceeded after that, ntpd will exit with a message to the system log. This option can be used with the -q and -x options. See the tinker command for other options.

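You can look at the system log; there should be some messages that may give you a hint. You can also monitor the "ntpq -p" output to get a rough idea of how the offset evolves over time.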
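One way to watch the "ntpq -p" output over time is to log the offset column periodically. Below is a rough Python sketch, assuming the usual ten-column peers billboard (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with the offset in milliseconds and a one-character tally code in the first column. The function names are hypothetical, and the layout can differ between ntpq versions, so check it against your own output first.

```python
import subprocess


def parse_ntpq_peers(text):
    """Parse `ntpq -p` output into a list of (tally, remote, offset_ms) tuples."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the ===== separator.
        if not stripped or set(stripped) == {"="}:
            continue
        if line.split()[:2] == ["remote", "refid"]:
            continue
        # First character is the tally code ('*', '+', '-', 'x', ' ', ...).
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue  # not a peer row in the assumed layout
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue
        peers.append((tally, fields[0], offset_ms))
    return peers


def current_peers():
    """Run `ntpq -p` on this machine and parse its output."""
    result = subprocess.run(["ntpq", "-p"], capture_output=True,
                            text=True, check=True)
    return parse_ntpq_peers(result.stdout)
```

Calling current_peers() from cron every few minutes and appending the result to a file would show whether the offset creeps up gradually or jumps suddenly before ntpd exits.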

The log message clearly shows that clock drift is the cause of the exit. Possible solutions:

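Start ntpd with the -g flag; however, this does not address the root cause of the clock skew.
Run ntpdate before starting ntpd; presumably the same caveat applies.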

Add more time sources; NTP needs 4-6 sources to maintain good accuracy. A simple way to do this is to include repeated references to [0-3].YOURREGION.pool.ntp.org in your configuration, e.g.

 server 0.au.pool.ntp.org iburst
 server 1.au.pool.ntp.org iburst
 server 2.au.pool.ntp.org iburst
 server 3.au.pool.ntp.org iburst
 server 0.au.pool.ntp.org iburst
 server 1.au.pool.ntp.org iburst
 server 2.au.pool.ntp.org iburst
 server 3.au.pool.ntp.org iburst

Another option you could try is chrony. In our testing it ran more stably than ntpd and coped better with the time skew experienced in virtual environments.

http://chrony.tuxfamily.org/