Processes on Wheezy die periodically

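I have a very strange problem. Some (but not all) of the processes on my mail server die periodically, about once a month. Some of the processes that die are: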

  • SSH
  • Dovecot
  • Postfix

Processes that do not die are:

  • Apache2

My system is running Debian Wheezy:

    $ uname -a
    Linux hostname 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux

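I have gone through the files in /var/log, but everything in them seems to be after the fact. The incident always happens at 6:25 AM.

At first I thought it had something to do with the daily ntpdate cron job, so I removed it and replaced it with ntpd, which does not need cron. Did it help? No.

Then I thought it had something to do with syslogd. It seems that the processes that die all use syslog for logging. I searched around, but I did not find anyone else with the same problem. When your logging mechanism is not working, it is really hard to find out what is wrong!

Here are all the log files that were modified at the time of the incident (6:25). There are no log entries after that; all logging activity stops! Please take a look and see whether anything in them could cause the processes to die or the logging to stop.

/var/log/syslog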

    Feb 16 06:25:01 hostname /USR/SBIN/CRON[32606]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
    Feb 16 06:25:01 hostname /USR/SBIN/CRON[32607]: (getmail) CMD (/usr/local/bin/run-getmail.sh > /dev/null 2>> /dev/null)
    Feb 16 06:25:01 hostname /USR/SBIN/CRON[32608]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
    Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]

/var/log/php5-fpm.log

    [09-Feb-2014 06:25:07] NOTICE: error log file re-opened
    [16-Feb-2014 06:25:06] NOTICE: Terminating ...
    [16-Feb-2014 06:25:07] NOTICE: exiting, bye-bye!

/var/log/mail.log

    Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]

/var/log/mail.info

    Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
    Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
    Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]

/var/log/fail2ban.log

    2014-02-16 06:25:06,899 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:07,271 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
    2014-02-16 06:25:07,275 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:07,279 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:07,281 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:07,283 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
    2014-02-16 06:25:07,269 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
    2014-02-16 06:25:07,287 fail2ban.server : INFO Stopping all jails
    2014-02-16 06:25:07,719 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:08,461 fail2ban.jail : INFO Jail 'php-url-fopen' stopped
    2014-02-16 06:25:08,595 fail2ban.actions: WARNING [apache-w00tw00t] Unban 178.32.243.78
    2014-02-16 06:25:08,702 fail2ban.actions: WARNING [apache-w00tw00t] Unban 83.212.122.172
    2014-02-16 06:25:09,270 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:09,283 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:09,285 fail2ban.jail : INFO Jail 'apache-w00tw00t' stopped
    2014-02-16 06:25:09,298 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
    2014-02-16 06:25:10,325 fail2ban.jail : INFO Jail 'apache-noscript' stopped
    2014-02-16 06:25:11,361 fail2ban.jail : INFO Jail 'pam-generic' stopped
    2014-02-16 06:25:12,330 fail2ban.jail : INFO Jail 'apache-badbots' stopped
    2014-02-16 06:25:13,294 fail2ban.jail : INFO Jail 'apache-nohome' stopped
    2014-02-16 06:25:14,326 fail2ban.jail : INFO Jail 'ssh-ddos' stopped
    2014-02-16 06:25:14,827 fail2ban.jail : INFO Jail 'exim' stopped
    2014-02-16 06:25:15,393 fail2ban.jail : INFO Jail 'webmin' stopped
    2014-02-16 06:25:16,330 fail2ban.jail : INFO Jail 'apache' stopped
    2014-02-16 06:25:17,296 fail2ban.jail : INFO Jail 'ssh' stopped
    2014-02-16 06:25:18,285 fail2ban.jail : INFO Jail 'apache-overflows' stopped
    2014-02-16 06:25:18,504 fail2ban.jail : INFO Jail 'dovecot' stopped
    2014-02-16 06:25:19,333 fail2ban.jail : INFO Jail 'squirrelmail' stopped
    2014-02-16 06:25:20,335 fail2ban.jail : INFO Jail 'apache-myadmin' stopped
    2014-02-16 06:25:20,336 fail2ban.server : INFO Exiting Fail2ban

/var/log/auth.log

    Feb 16 06:25:01 hostname CRON[32604]: pam_unix(cron:session): session opened for user root by (uid=0)
    Feb 16 06:25:01 hostname CRON[32605]: pam_unix(cron:session): session opened for user getmail by (uid=0)
    Feb 16 06:25:01 hostname CRON[32603]: pam_unix(cron:session): session opened for user root by (uid=0)
    Feb 16 06:25:01 hostname CRON[32605]: pam_unix(cron:session): session closed for user getmail
    Feb 16 06:25:02 hostname CRON[32604]: pam_unix(cron:session): session closed for user root

First of all, your machine is doing something strange at 6:25 AM every few months. I would take a look at all of the cron jobs.
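Your syslog excerpt shows cron starting `run-parts --report /etc/cron.daily` at 06:25:01, so the daily jobs are a natural place to begin. As a rough sketch (the function name `list_cron_jobs` is my own, not a standard tool), something like this prints the active entries of the system crontab and cron.d files, plus the names of the scripts in cron.daily:

```sh
#!/bin/sh
# list_cron_jobs [ROOT]
# Print the non-comment, non-blank lines of ROOT/crontab and of every file in
# ROOT/cron.d, followed by the names of the scripts in ROOT/cron.daily.
# ROOT defaults to /etc. The function name is hypothetical.
list_cron_jobs() {
    root="${1:-/etc}"
    for f in "$root/crontab" "$root"/cron.d/*; do
        [ -f "$f" ] || continue
        echo "== $f"
        # Drop comments and blank lines so only real entries remain.
        grep -Ev '^[[:space:]]*(#|$)' "$f" || true
    done
    echo "== $root/cron.daily"
    for f in "$root"/cron.daily/*; do
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}
```

Calling `list_cron_jobs` with no argument reads from /etc. Per-user crontabs are not covered here; `crontab -l -u USER`, run as root, shows those one user at a time.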

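Then, if nothing there looks suspicious, try to correlate your problem with the kernel log. Run dmesg and look for out-of-memory conditions; in that case the kernel kills processes to avoid a situation that could lead to a panic.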
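To make that search less tedious, here is a small filter, offered only as a sketch. The helper name `find_oom_events` is made up, and the pattern covers the phrases the OOM killer usually logs ("invoked oom-killer", "Out of memory", "Killed process"); the exact wording can differ between kernel versions.

```sh
#!/bin/sh
# find_oom_events: read kernel log text on stdin and print only the lines
# that usually accompany an OOM kill. The function name is hypothetical and
# the pattern is a best guess at the usual wording, not an exhaustive list.
find_oom_events() {
    grep -Ei 'out of memory|oom-killer|killed process'
}
```

Typical use would be `dmesg | find_oom_events`. Because the dmesg buffer does not survive a reboot, feeding it /var/log/kern.log (and its rotated copies) the same way may turn up older events.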

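Also, take a close look at /var/log/ispconfig/cron.log.

If you suspect any unauthorized access, inspect /usr/local/ispconfig/server/server.sh.

PS: I would also try to find out when this problem first occurred, and then look carefully for modifications made shortly before that time.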
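One way to look for such modifications, assuming GNU find (its `-newermt` test compares a file's modification time against a date string), is sketched below. The function name `modified_between` is hypothetical.

```sh
#!/bin/sh
# modified_between DIR START END
# List regular files under DIR whose modification time is later than START
# and not later than END. Assumes a find that supports -newermt (GNU find
# does). The function name is hypothetical.
modified_between() {
    find "$1" -type f -newermt "$2" ! -newermt "$3" -print
}
```

For example, `modified_between /etc "2014-01-01" "2014-01-15"` would list configuration files changed in that window; the dates are placeholders for whatever period precedes your first incident.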

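Edit:

Following up on your last comment: it would be very useful to write a simple shell script that records memory usage around the time these jobs run.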

    #!/bin/sh
    # Append a timestamp and the current memory usage (in MiB) to a log file.
    somefile="/your/file/path"
    date >> "$somefile"
    free -m >> "$somefile"

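Edit your cron jobs so that this script runs a few seconds before your memory-hungry jobs and again a few seconds after them, then compare the results. This should help you decide whether to add memory, change the software configuration, and so on.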
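Since cron only schedules to the minute, a wrapper that takes the snapshots itself may be easier than timing separate entries. This is just one possible sketch; `snapshot` and `with_mem_snapshots` are names I made up.

```sh
#!/bin/sh
# snapshot LABEL LOGFILE
# Append a labelled timestamp and the output of `free -m` to LOGFILE.
# Both function names in this file are hypothetical.
snapshot() {
    echo "--- $1: $(date)" >> "$2"
    free -m >> "$2" 2>&1 || echo "(free -m unavailable)" >> "$2"
}

# with_mem_snapshots LOGFILE COMMAND [ARGS...]
# Record memory usage before and after COMMAND, then return COMMAND's
# exit status so cron still sees failures.
with_mem_snapshots() {
    log="$1"
    shift
    snapshot "before" "$log"
    rc=0
    "$@" || rc=$?
    snapshot "after (exit $rc)" "$log"
    return "$rc"
}
```

A job would then be started as `with_mem_snapshots /your/file/path some-job`, where `some-job` stands for the command cron currently runs. The log ends up with one "before" and one "after" block per run, which makes the comparison straightforward.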

PS: As you can see, this is a very basic script, but it is usable as a starting point. You can improve it further.