What is causing these imapd processes to crash?

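We have a Mac OS X 10.5 Leopard Server mail server where, this past weekend, IMAP mailboxes began showing "invalid format" problems. It turned out that there were a number of bad blocks on the volume holding the IMAP data; after the volume and the affected mailboxes were repaired, that problem has not recurred. However, a lingering new problem is that imaps processes crash frequently, along with a steadily increasing "lockers" error such as this: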

 Apr 13 17:01:12 host lmtpunix[31509]: DBERROR db4: 1134 lockers 
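To see whether the locker count really keeps climbing, the DBERROR lines can be pulled out of the log and listed in order. This is only a minimal sketch that assumes the syslog line format shown above; the function name locker_counts is made up for illustration.

```python
import re

# Matches lines like:
#   Apr 13 17:01:12 host lmtpunix[31509]: DBERROR db4: 1134 lockers
# The pattern is derived only from the single example line above.
LOCKERS_RE = re.compile(
    r"^\s*(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ (\S+?)\[\d+\]: "
    r"DBERROR db4: (\d+) lockers"
)


def locker_counts(lines):
    """Return (timestamp, process, count) for every 'lockers' error line."""
    results = []
    for line in lines:
        match = LOCKERS_RE.search(line)
        if match:
            timestamp, process, count = match.groups()
            results.append((timestamp, process, int(count)))
    return results
```

Feeding it the lines of the relevant log file gives a list whose last column can be checked for growth over time.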

The errors in /var/log/system.log for a crashing imaps process look like this:

 Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
 Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
 Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.XXX.XXX] user3 CRAM-MD5+TLS User logged in
 Apr 12 13:43:15 host ReportCrash[14362]: Formulating crash report for process imapd[11792]
 Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
 Apr 12 13:43:15 host ReportCrash[14362]: Saved crashreport to /Library/Logs/CrashReporter/imapd_2011-04-12-134315_host.crash using uid: 0 gid: 0, euid: 0 egid: 0

And from /var/log/mailaccess.log:

 Apr 12 13:43:10 host imaps[11792]: accepted connection
 Apr 12 13:43:10 host imaps[11792]: mydelete: starting txn 2147495107
 Apr 12 13:43:10 host imaps[11792]: mydelete: committing txn 2147495107
 Apr 12 13:43:10 host imaps[11792]: mystore: starting txn 2147495108
 Apr 12 13:43:10 host imaps[11792]: mystore: committing txn 2147495108
 Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
 Apr 12 13:43:12 host imaps[11792]: accepted connection
 Apr 12 13:43:12 host imaps[11792]: mydelete: starting txn 2147495112
 Apr 12 13:43:12 host imaps[11792]: mydelete: committing txn 2147495112
 Apr 12 13:43:12 host imaps[11792]: mystore: starting txn 2147495113
 Apr 12 13:43:12 host imaps[11792]: mystore: committing txn 2147495113
 Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
 Apr 12 13:43:12 host imaps[11792]: AOD: user options: no lookup required for: user3
 Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.149.161] user3 CRAM-MD5+TLS User logged in
 Apr 12 13:43:13 host imaps[11792]: quota set to "unlimited" for mailbox user.user3
 Apr 12 13:43:13 host imaps[11792]: open: user user3 opened Other Users/listmaster
 Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
 Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally
 Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
 Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally

The crash reports look like this:

 Process:         imapd [39069]
 Path:            /usr/bin/cyrus/bin/imapd
 Identifier:      imapd
 Version:         ??? (???)
 Code Type:       X86 (Native)
 Parent Process:  master [38605]
 Date/Time:       2011-04-13 18:25:24.068 -0400
 OS Version:      Mac OS X Server 10.5.7 (9J61)
 Report Version:  6
 Anonymous UUID:  223C4DD1-2AE2-4381-8A28-DEB9082281A8
 Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
 Exception Codes: KERN_INVALID_ADDRESS at 0x0000000077a0ca64
 Crashed Thread:  0
 Thread 0 Crashed:
 0   imapd   0x0003090c process_records + 588
 1   imapd   0x00031362 mailbox_expunge + 2146
 2   imapd   0x00006fde cmd_close + 179
 3   imapd   0x00018cf8 cmdloop + 2940
 4   imapd   0x0001c1b7 service_main + 1498
 5   imapd   0x00002e73 main + 3502
 6   imapd   0x00002006 start + 54
 Thread 0 crashed with X86 Thread State (32-bit):
   eax: 0x61766970  ebx: 0x000306cb  ecx: 0x00000008  edx: 0x77a0ca64
   edi: 0x00bfffa4  esi: 0x162a5fa4  ebp: 0xbfffad48  esp: 0xbfffac90
    ss: 0x0000001f  efl: 0x00010202  eip: 0x0003090c   cs: 0x00000017
    ds: 0x0000001f   es: 0x0000001f   fs: 0x00000000   gs: 0x00000037
   cr2: 0x77a0ca64

And yes, they all crash in process_records, called from mailbox_expunge.
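For anyone wanting to verify the same thing across a whole directory of crash reports, here is a rough sketch that tallies the top frames of each backtrace. It assumes the report layout shown above (frame number, binary name, address, function, offset), and the names backtrace_functions and crash_sites are made up for illustration.

```python
import re
from collections import Counter

# One backtrace frame, e.g. "0 imapd 0x0003090c process_records + 588".
FRAME_RE = re.compile(r"\b(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\w+) \+ (\d+)")


def backtrace_functions(report_text):
    """Return the function names of the first backtrace in a crash report."""
    functions = []
    for match in FRAME_RE.finditer(report_text):
        # Frames are numbered 0, 1, 2, ...; stop when the numbering restarts,
        # which would indicate the start of another thread's backtrace.
        if int(match.group(1)) != len(functions):
            break
        functions.append(match.group(4))
    return functions


def crash_sites(reports, depth=2):
    """Count how often each chain of the top `depth` frames occurs."""
    return Counter(
        " <- ".join(backtrace_functions(text)[:depth]) for text in reports
    )
```

Passing the contents of the imapd_*.crash files under /Library/Logs/CrashReporter to crash_sites should yield a single dominant entry if every crash really happens in the same place.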

I don't see any other errors in the logs, at least none that look related to the crashing processes or that are anything other than harmless, such as "SQUAT failed to open index file" or "IOERROR: fstating sieve script /usr/sieve/u/user/defaultbc: No such file or directory".

I must admit that I have not yet reconstructed the Other Users/listmaster mailbox. It is not always the same user, either.

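We do have some users who have found that sent messages are not being saved to their "Sent Messages" mailbox, and have not been since the problem began. Reconstructing their mailbox (currently with sudo mailbfr -m username, because it fixes some additional permissions on top of the sudo /usr/bin/cyrus/bin/reconstruct -r user/username I usually run) seems to allow newly sent email to be saved there again, but I have not been able to find a correlation between this problem and the crashes (or any other errors in the logs).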

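Any suggestions would be appreciated. Is it really the attempt to expunge messages that crashes? Should I reconstruct every user's mailbox individually? I would really rather not rebuild the entire Cyrus database and lose the flags and read status of all messages.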

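I believe the bad blocks left incorrect entries in the database indexes, which cause the crash when new data is stored. Apart from rebuilding the database, there is not much you can do. You could back up the users' seen files and try to reuse them, but test that idea on a test user first. To be honest, I think the hard drive with the bad blocks should be removed from the server as soon as possible anyway.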

I solved this problem a long time ago.

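I don't remember the exact commands, but I found a reasonable way to correlate a specific crash with a specific user, after which I could run mailbfr -m to reconstruct that user's mailbox. Eventually I was able to reconstruct all of the problem mailboxes and rid the server of the problem.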
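Since the exact commands are lost, the following is only a sketch of one way such a correlation could be done, not the method that was actually used. It assumes the /var/log/mailaccess.log line format quoted in the question, and the function name crashed_sessions is made up for illustration. It remembers the last login and the last opened mailbox for each imaps PID, and reports them when master logs that PID as killed by signal 11.

```python
import re

# Patterns are based only on the mailaccess.log excerpt quoted in the question.
ACCEPT_RE = re.compile(r"imaps\[(\d+)\]: accepted connection")
LOGIN_RE = re.compile(r"imaps\[(\d+)\]: login: \S+ \[[^\]]*\] (\S+) ")
OPEN_RE = re.compile(r"imaps\[(\d+)\]: open: user \S+ opened (.+?)\s*$")
CRASH_RE = re.compile(
    r"master\[\d+\]: process (\d+) exited, signaled to death by 11\b"
)


def crashed_sessions(lines):
    """Return (pid, user, last_mailbox) for each imaps process that segfaulted.

    Crashes of PIDs with no recorded login or mailbox are skipped; this also
    drops the duplicated "exited" lines that master writes in the excerpt.
    """
    user_by_pid = {}
    mailbox_by_pid = {}
    crashes = []
    for line in lines:
        match = ACCEPT_RE.search(line)
        if match:
            # An imaps process serves several connections; start fresh.
            user_by_pid.pop(match.group(1), None)
            mailbox_by_pid.pop(match.group(1), None)
            continue
        match = LOGIN_RE.search(line)
        if match:
            user_by_pid[match.group(1)] = match.group(2)
            continue
        match = OPEN_RE.search(line)
        if match:
            mailbox_by_pid[match.group(1)] = match.group(2)
            continue
        match = CRASH_RE.search(line)
        if match:
            pid = match.group(1)
            user = user_by_pid.pop(pid, None)
            mailbox = mailbox_by_pid.pop(pid, None)
            if user is not None or mailbox is not None:
                crashes.append((pid, user, mailbox))
    return crashes


# Example use (run with enough privileges to read the log):
#   with open("/var/log/mailaccess.log") as log:
#       for pid, user, mailbox in crashed_sessions(log):
#           print(pid, user, mailbox)
```

Each reported user is then a candidate for sudo mailbfr -m username. Note that the mailbox being expunged may belong to a different user than the one logged in, as with Other Users/listmaster in the excerpt, so the mailbox column deserves a look as well.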