Rsyslog 8在退出时挂起

在Ubuntu 16.04.1上运行:

rsyslogd 8.16.0, compiled with: PLATFORM: x86_64-pc-linux-gnu PLATFORM (lsb_release -d): FEATURE_REGEXP: Yes GSSAPI Kerberos 5 support: Yes FEATURE_DEBUG (debug build, slow code): No 32bit Atomic operations supported: Yes 64bit Atomic operations supported: Yes memory allocator: system default Runtime Instrumentation (slow code): No uuid support: Yes Number of Bits in RainerScript integers: 64 

有:

 #rsyslogd -N1 rsyslogd: version 8.16.0, config validation run (level 1), master config /etc/rsyslog.conf rsyslogd: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ] 

挂起服务停止,如果下面的configuration是有效的:

 $ModLoad imfile ########## # MySQL slow query log (standard log is autosent to syslog): $InputFileName /var/log/mysql/mysql-slow.log $InputFileTag mysqld_slow: $InputFileStateFile state-mysqld_slow $InputFileSeverity info $InputRunFileMonitor ########## # Apache access: ### $InputFileName /var/log/virtualmin/mysite.com_access_log $InputFileTag www-access-mysite.com: $InputFileStateFile state-www-access-mysite.com $InputFileSeverity info $InputRunFileMonitor ########## # Apache Error: ### $InputFileName /var/log/virtualmin/mysite.com_error_log $InputFileTag www-errors-mysite.org: $InputFileStateFile state-www-errors-mysite.com $InputFileSeverity error $InputRunFileMonitor ### Main Error Log $InputFileName /var/log/apache2/error.log $InputFileTag www-errors-main: $InputFileStateFile state-www-errors $InputFileSeverity error $InputRunFileMonitor # Setup disk assisted queues $ActionQueueFileName fwdRule1 # unique name prefix for spool files $ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible) $ActionQueueSaveOnShutdown on # save messages to disk on shutdown $ActionQueueType LinkedList # run asynchronously $ActionResumeRetryCount -1 # infinite retries if host is down #RsyslogGnuTLS $DefaultNetstreamDriverCAFile /etc/rsyslog.d/keys/ca.d/certs.crt ### ###First BLOCK ### template(name="access" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% [id@40000 tag=\"access\"] %msg%\n") template(name="errors" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% [id@40000 tag=\"errors\"] %msg%\n") # Send messages over TCP using the template. :syslogtag, startswith, "www-access" action(type="omfwd" protocol="tcp" target="logs.com" port="6000" template="access" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.logs.com") :syslogtag, startswith, "www-error" action(type="omfwd" protocol="tcp" target="logs.com" port="6000" template="errors" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.logs.com") ### ###Second BLOCK ### template(name="standard" type="string" string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% Host1 %app-name% %procid% %msgid% %msg%\n") # Send messages over TCP using the template. :syslogtag, startswith, "www-access" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com") :syslogtag, startswith, "www-errors" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com") :syslogtag, startswith, "mysqld" action(type="omfwd" protocol="tcp" target="logs.other.com" port="38549" template="standard" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.other.com") 

如果通过rsyslogd -dn运行,你可以看到:

 9744.300605710:main thread : main Q: EnqueueMsg advised worker start 9744.300724026:main thread : Terminating main queue... 9744.300734161:main thread : main Q: initiating worker thread shutdown sequence 9744.300739666:main thread : main Q: trying shutdown of regular workers 9744.300766918:main thread : sent SIGTTIN to worker thread 0x9b57a700 9744.300776037:main Q:Reg/w0 : wti 0x16756a0: worker awoke from idle processing 9744.300782296:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 0 objects and enqueued 0 objects 9744.300786960:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new sizes: log 0, phys 0 9744.300791987:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 sz phys 0 9744.300797344:main Q:Reg/w0 : wti 0x16756a0: terminating worker terminateRet=5, bInactivityTOOccured=0 9744.300803075:main Q:Reg/w0 : DDDD: wti 0x16756a0: worker cleanup action instances 9744.300807699:main Q:Reg/w0 : wti 0x16756a0, action 0, ptr 0x7f1394004210 9744.300825960:main Q:Reg/w0 : wti 0x16756a0, action 1, ptr 0x7f1394038f80 9744.300835440:main Q:Reg/w0 : wti 0x16756a0, action 2, ptr 0x7f13940022a0 9744.300875784:main Q:Reg/w0 : wti 0x16756a0, action 3, ptr 0x7f1394004470 9744.300885088:main Q:Reg/w0 : wti 0x16756a0, action 4, ptr (nil) 9744.300890048:main Q:Reg/w0 : wti 0x16756a0, action 5, ptr (nil) 9744.300894505:main Q:Reg/w0 : wti 0x16756a0, action 6, ptr (nil) 9744.300898883:main Q:Reg/w0 : wti 0x16756a0, action 7, ptr (nil) 9744.300903665:main Q:Reg/w0 : wti 0x16756a0, action 8, ptr (nil) 9744.300908515:main Q:Reg/w0 : wti 0x16756a0, action 9, ptr (nil) 9744.300912910:main Q:Reg/w0 : wti 0x16756a0, action 10, ptr 0x7f13940023f0 9744.300918720:main Q:Reg/w0 : wti 0x16756a0, action 11, ptr 0x7f13940068f0 9744.300941837:main Q:Reg/w0 : wti 0x16756a0, action 12, ptr 0x7f13940504b0 9744.300951981:main Q:Reg/w0 : wti 0x16756a0, action 13, ptr 0x7f1394462e10 9744.301055305:main thread : main Q:Reg: waiting 1499ms on worker thread termination, 1 still running 9745.800919600:main thread : main Q:Reg: timeout waiting on worker thread termination 9745.800976996:main thread : sent SIGTTIN to worker thread 0x9b57a700 9745.801024045:main thread : main Q: regular shutdown timed out on primary queue (this is OK) 9745.801030518:main thread : main Q: checking to see if we need to cancel any worker threads of the primary queue 9745.801035688:main thread : sent SIGTTIN to worker thread 0x9b57a700, giving it a chance to terminate 9745.811204331:main thread : cooperative worker termination failed, using cancellation... 9745.811246502:main thread : wti 0x16756a0: canceling worker thread 

我已经试过用防火墙启动/停止,根据在这个线程在支持论坛 (我没有得到任何帮助)的build议,

strace的尾巴显示

 6700 14:13:02 write(6, "Oct 6 14:13:02 hostname rsyslogd: [origin software=\"rsyslogd\" swVersion=\"8.16.0\" x-pid=\"6696\" x-info=\"http://www.rsyslog.com\"] exiting on signal 15.\n", 146) = 146 6700 14:13:02 futex(0x1a3516c, FUTEX_WAIT_PRIVATE, 99, NULL <unfinished ...> 6696 14:13:02 <... futex resumed> ) = 1 6696 14:13:02 futex(0x1a3516c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1a35168, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...> 6700 14:13:02 <... futex resumed> ) = 0 6700 14:13:02 futex(0x1a34f80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...> 6696 14:13:02 <... futex resumed> ) = 1 6696 14:13:02 tgkill(6696, 6700, SIGTTIN <unfinished ...> 6700 14:13:02 <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 6700 14:13:02 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} --- 6700 14:13:02 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call) 6700 14:13:02 futex(0x1a34f80, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...> 6696 14:13:02 <... tgkill resumed> ) = 0 6696 14:13:02 futex(0x1a34f80, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 6700 14:13:02 <... futex resumed> ) = 0 6700 14:13:02 futex(0x1a34f80, FUTEX_WAKE_PRIVATE, 1) = 0 6700 14:13:02 writev(14, [{"\25\3\3\0\32\0\0\0\0\0\0\1qv\373\273}\275\276\252\207\\~\7\343p\237[\343'\363", 31}], 1) = 31 6700 14:13:02 recvfrom(14, <unfinished ...> 6696 14:13:02 <... futex resumed> ) = 1 6696 14:13:02 futex(0x1a35014, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1475763183, 729652779}, ffffffff <unfinished ...> 6700 14:13:02 <... recvfrom resumed> "\25\3\3\0\32", 5, 0, NULL, NULL) = 5 6700 14:13:02 mprotect(0x7fd3145d8000, 16384, PROT_READ|PROT_WRITE) = 0 6700 14:13:02 recvfrom(14, "w_\345'\310\3473\3\\q\32 \37\276\237A\270a\260\200\24\34p\262G\277", 26, 0, NULL, NULL) = 26 6700 14:13:02 close(14) = 0 6700 14:13:02 writev(25, [{"\25\3\3\0\32\0\0\0\0\0\0\1\20\3535\fh\5\324bZ\211U\203AX.\37\205`m", 31}], 1) = 31 6700 14:13:02 recvfrom(25, "\25\3\3\0\32", 5, 0, NULL, NULL) = 5 6700 14:13:02 recvfrom(25, "\307y\343\27z#\31\373\232\333\371\263\275R\203\31\332?)>\201\245\301z\335;", 26, 0, NULL, NULL) = 26 6700 14:13:02 close(25) = 0 6700 14:13:02 writev(24, [{"\25\3\3\0000\26f\310\217\23\247d?\4\3\301\252\277+\207\250q[9L\306\274\2414}\306\237\253X\221\35\27:\376\206\203\36PP\177\363\362I\321\223\372\203W", 53}], 1) = 53 6700 14:13:02 recvfrom(24, <unfinished ...> 6696 14:13:03 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 6696 14:13:03 tgkill(6696, 6700, SIGTTIN) = 0 6700 14:13:03 <... recvfrom resumed> 0x7fd31449c17b, 5, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 6700 14:13:03 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} --- 6700 14:13:03 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call) 6700 14:13:03 recvfrom(24, <unfinished ...> 6696 14:13:03 futex(0x1a34fe8, FUTEX_WAKE_PRIVATE, 1) = 0 6696 14:13:03 tgkill(6696, 6700, SIGTTIN) = 0 6700 14:13:03 <... recvfrom resumed> 0x7fd31449c17b, 5, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 6700 14:13:03 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_TKILL, si_pid=6696, si_uid=104} --- 6700 14:13:03 rt_sigreturn({mask=~[KILL STOP TTIN RTMIN RT_1]}) = -1 EINTR (Interrupted system call) 6700 14:13:03 recvfrom(24, <unfinished ...> 6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.) 6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.) 6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.) 6696 14:13:03 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout.) . . . and it repeats until 6696 14:14:32 select(0, NULL, NULL, NULL, 0x7ffe5e77b200 <ptrace(SYSCALL):No such process> 6700 14:14:32 +++ killed by SIGKILL +++ 6696 14:14:32 +++ killed by SIGKILL +++ 

事实certificate,其中一个日志logging服务有证书问题。 有意思的是,在启动或操作期间,只有在退出时才会对证书进行投诉。

确切的问题是一个或多个这些:

  • 与另一个服务的证书在同一个文件中
  • 当它最初是一个.PEM文件时被重命名为.CRT文件。
  • 它不喜欢这个文件里相关证书的顺序

所以我不确定是否可以在同一个文件中拥有多个证书,尽pipe这个答案中说了什么。

总而言之,如果您要运送到多个服务,我build议使用多个configuration文件。