NTP stops working after running for a few hours

My NTP server works for a few hours and then stops working, showing a reach of 0 for every host, like this:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     64-250-105-227. .PPS.            1 u   9h 1024    0   66.644    5.476   0.000
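For readers unfamiliar with the reach column: ntpq prints it in octal, and it is an 8-bit shift register in which each bit records whether one of the last eight polls produced a usable reply. A value of 377 means all eight succeeded; 0 means none did, which at poll 1024 covers more than two hours. The helper below is only a minimal sketch for reading that column; the function name `decode_reach` is made up for illustration and is not part of any NTP tool.

```python
from __future__ import annotations


def decode_reach(reach_octal: str) -> list[bool]:
    """Decode ntpq's octal reach value into 8 poll results, newest first.

    Hypothetical helper for illustration; not part of ntpq or ntpd.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits (octal 0..377)")
    # Bit 0 is the most recent poll, bit 7 the oldest of the last eight.
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    for sample in ("377", "17", "0"):
        history = decode_reach(sample)
        # Y = reply accepted, n = no usable reply; leftmost is the newest poll.
        print(sample.rjust(3), "".join("Y" if ok else "n" for ok in history))
```

A reach of 0 alongside a large when value, as in the output above, means ntpd has not accepted a reply from that host for at least eight consecutive polls.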

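If I restart ntpd, they work again for about 8 hours, but eventually end up back in this state. tcpdump shows that packets are still being sent and received (the routing is a bit odd because our ISP blocks NTP traffic, but we have another way out, using a bit of policy-based routing and a client running OpenVPN):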

    12:05:43.513183 IP (tos 0xc0, ttl 64, id 57760, offset 0, flags [DF], proto UDP (17), length 76)
        pvelocalhost.ntp > 64-250-105-227.ethoplex.com.ntp: [bad udp cksum 0x40e6 -> 0x6cec!] NTPv4, length 48
            Client, Leap indicator: (0), Stratum 2 (secondary reference), poll 10 (1024s), precision -23
            Root Delay: 0.066635, Root dispersion: 0.601440, Reference-ID: 64-250-105-227.ethoplex.com
              Reference Timestamp:  3696656842.987997412 (2017/02/21 03:07:22)
              Originator Timestamp: 3696656843.552259385 (2017/02/21 03:07:23)
              Receive Timestamp:    3696656843.580105364 (2017/02/21 03:07:23)
              Transmit Timestamp:   3696689143.513155341 (2017/02/21 12:05:43)
                Originator - Receive Timestamp:  +0.027845976
                Originator - Transmit Timestamp: +32299.960896015
    12:05:43.513708 IP (tos 0xc0, ttl 63, id 57760, offset 0, flags [DF], proto UDP (17), length 76)
        gateway.example.com.ntp > 64-250-105-227.ethoplex.com.ntp: [udp sum ok] NTPv4, length 48
            Client, Leap indicator: (0), Stratum 2 (secondary reference), poll 10 (1024s), precision -23
            Root Delay: 0.066635, Root dispersion: 0.601440, Reference-ID: 64-250-105-227.ethoplex.com
              Reference Timestamp:  3696656842.987997412 (2017/02/21 03:07:22)
              Originator Timestamp: 3696656843.552259385 (2017/02/21 03:07:23)
              Receive Timestamp:    3696656843.580105364 (2017/02/21 03:07:23)
              Transmit Timestamp:   3696689143.513155341 (2017/02/21 12:05:43)
                Originator - Receive Timestamp:  +0.027845976
                Originator - Transmit Timestamp: +32299.960896015
    12:05:43.573035 IP (tos 0x8, ttl 52, id 38657, offset 0, flags [DF], proto UDP (17), length 76)
        64-250-105-227.ethoplex.com.ntp > gateway.example.com.ntp: [udp sum ok] NTPv4, length 48
            Server, Leap indicator: (0), Stratum 1 (primary reference), poll 10 (1024s), precision -18
            Root Delay: 0.000000, Root dispersion: 0.001205, Reference-ID: PPS^@
              Reference Timestamp:  3696689128.863678634 (2017/02/21 12:05:28)
              Originator Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
              Receive Timestamp:    3696689143.547838270 (2017/02/21 12:05:43)
              Transmit Timestamp:   3696689143.548149943 (2017/02/21 12:05:43)
                Originator - Receive Timestamp:  +0.034682918
                Originator - Transmit Timestamp: +0.034994553
    12:05:43.573264 IP (tos 0x8, ttl 51, id 38657, offset 0, flags [DF], proto UDP (17), length 76)
        64-250-105-227.ethoplex.com.ntp > pvelocalhost.ntp: [udp sum ok] NTPv4, length 48
            Server, Leap indicator: (0), Stratum 1 (primary reference), poll 10 (1024s), precision -18
            Root Delay: 0.000000, Root dispersion: 0.001205, Reference-ID: PPS^@
              Reference Timestamp:  3696689128.863678634 (2017/02/21 12:05:28)
              Originator Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
              Receive Timestamp:    3696689143.547838270 (2017/02/21 12:05:43)
              Transmit Timestamp:   3696689143.548149943 (2017/02/21 12:05:43)
                Originator - Receive Timestamp:  +0.034682918
                Originator - Transmit Timestamp: +0.034994553

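Long story short, you can see the packets going out to 64-250-105-227.ethoplex.com.ntp, and you can see that we get a response back from the same host. The first UDP checksum is bad, probably because of TOE (checksum offloading), but after the gateway masquerades the source IP and recalculates the checksum for these packets, they all seem to work fine.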
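One more detail can be read out of the capture. As I understand RFC 5905, the Receive Timestamp in a client request is the local time at which the client last took in a reply from that server. The sketch below just does that subtraction with the values copied from the first packet above; the variable and function names are illustrative only, and the interpretation is my assumption rather than a confirmed diagnosis.

```python
from decimal import Decimal

# Values copied from the client request in the capture (NTP-era seconds).
LAST_REPLY_RECEIVED = Decimal("3696656843.580105364")  # Receive Timestamp
REQUEST_SENT = Decimal("3696689143.513155341")         # Transmit Timestamp


def hours_since_last_reply(received: Decimal, sent: Decimal) -> float:
    """Return the gap in hours between the last processed reply and this request.

    Hypothetical helper for illustration; it only subtracts two timestamps.
    """
    return float((sent - received) / Decimal(3600))


if __name__ == "__main__":
    gap = hours_since_last_reply(LAST_REPLY_RECEIVED, REQUEST_SENT)
    print(f"hours since ntpd last processed a reply: {gap:.2f}")
```

The gap comes out at just under 9 hours, which lines up with the when value of 9h in the ntpq output. So the replies are visible on the wire, but ntpd does not appear to have taken one in since about 03:07.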

What is going on here? Apart from setting up a cron job to restart NTP every few hours, what other options do I have?