Monitoring system resources with Pacemaker and Corosync: cloned resource monitor returns "not running"

Setup: OS: CentOS 7, latest versions of Corosync, Pacemaker & PCS – two-node active/active cluster with a virtual IP – Exim runs on both nodes for remote mail (SMTP), nothing special in the configuration – when Exim fails on one of the nodes, that node should no longer serve the virtual IP until Exim is up and running again.
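For completeness, the cluster itself was set up the usual way, roughly like this (the property settings at the end are the typical two-node-test defaults, not necessarily exactly what I used):

    # Authenticate the two nodes and create/start the cluster
    pcs cluster auth testvm101 testvm102
    pcs cluster setup --name smtp_cluster testvm101 testvm102
    pcs cluster start --all

    # Typical two-node test settings (assumed): no fencing, ignore quorum loss
    pcs property set stonith-enabled=false
    pcs property set no-quorum-policy=ignore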

What I am trying to get working: – a cloned ocf:heartbeat:IPaddr2 resource for the virtual IP – a cloned systemd:exim resource that watches Exim, with the on-fail="standby" option.
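The resources were created roughly like this (the 30s monitor interval matches the exim:0_monitor_30000 operation in the logs below; the IP address is a placeholder, and the clone options are reconstructed from the "(unique)" flag in the pcs status output):

    # Cloned virtual IP resource; 192.168.1.100 is a placeholder address
    pcs resource create virtual_ip ocf:heartbeat:IPaddr2 \
        ip=192.168.1.100 cidr_netmask=24 \
        op monitor interval=30s
    pcs resource clone virtual_ip globally-unique=true clone-max=2 clone-node-max=2

    # Cloned Exim resource; a failed monitor should put the node in standby
    pcs resource create exim systemd:exim \
        op monitor interval=30s on-fail=standby
    pcs resource clone exim globally-unique=true clone-max=2 clone-node-max=2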

The problem: initially everything works as it should. When one of the nodes fails to run Exim, it is stopped correctly and the node no longer serves the virtual IP. The problem is that after stopping and starting one of the nodes, Exim starts up again (as it should), but the monitor returns "not running". When the Exim resource is configured without on-fail="standby", everything works as designed and I can start/stop Exim and either of the nodes as much as I like.

Messages in the log:

    Jan 28 16:17:30 testvm101 crmd[14183]: notice: process_lrm_event: LRM operation exim:0_monitor_30000 (call=141, rc=7, cib-update=211, confirmed=false) not running
    Jan 28 16:17:30 testvm101 crmd[14183]: warning: status_from_rc: Action 20 (exim:0_monitor_30000) on testvm101 failed (target: 0 vs. rc: 7): Error
    Jan 28 16:17:30 testvm101 crmd[14183]: warning: update_failcount: Updating failcount for exim:0 on testvm101 after failed monitor: rc=7 (update=value++, time=1422458250)

Output of pcs status:

    [root@testvm101 ~]# pcs status
    Cluster name: smtp_cluster
    Last updated: Wed Jan 28 16:31:44 2015
    Last change: Wed Jan 28 16:17:13 2015 via cibadmin on testvm101
    Stack: corosync
    Current DC: testvm101 (1) - partition with quorum
    Version: 1.1.10-32.el7_0.1-368c726
    2 Nodes configured
    4 Resources configured

    Node testvm101 (1): standby (on-fail)
    Online: [ testvm102 ]

    Full list of resources:

     Clone Set: virtual_ip-clone [virtual_ip] (unique)
         virtual_ip:0  (ocf::heartbeat:IPaddr2):  Started testvm102
         virtual_ip:1  (ocf::heartbeat:IPaddr2):  Started testvm102
     Clone Set: exim-clone [exim] (unique)
         exim:0  (systemd:exim):  Started testvm102
         exim:1  (systemd:exim):  Started testvm102

    Failed actions:
        exim:0_monitor_30000 on testvm101 'not running' (7): call=141, status=complete, last-rc-change='Wed Jan 28 16:17:30 2015', queued=6ms, exec=15002ms

As far as I can tell, Exim was running and working according to systemd at the time of these messages. I already tried specifying a start-delay option in the hope that it would make a difference (it did not); see below.
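For reference, this is roughly what I checked and what I tried (the 15s delay is just an example value):

    # systemd itself reports the service as active at the time of the failure
    systemctl status exim.service

    # Re-declare the monitor operation with a start-delay (made no difference)
    pcs resource update exim op monitor interval=30s on-fail=standby start-delay=15s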

Running pcs resource cleanup exim-clone clears the failcount, and everything works fine until the monitor action fires for the first time; then the node marked standby is swapped for the other one…

Example: status after the Exim monitor failed on node testvm102:

    [root@testvm101 ~]# pcs status
    ...
    Node testvm102 (2): standby (on-fail)
    Online: [ testvm101 ]

    Full list of resources:

     Clone Set: virtual_ip-clone [virtual_ip] (unique)
         virtual_ip:0  (ocf::heartbeat:IPaddr2):  Started testvm101
         virtual_ip:1  (ocf::heartbeat:IPaddr2):  Started testvm101
     Clone Set: exim-clone [exim] (unique)
         exim:0  (systemd:exim):  Started testvm101
         exim:1  (systemd:exim):  Started testvm101

    Failed actions:
        exim:0_monitor_30000 on testvm102 'not running' (7): call=150, status=complete, last-rc-change='Wed Jan 28 16:33:59 2015', queued=5ms, exec=15004ms

I run a resource cleanup for the exim resource to reset the failcount:

    [root@testvm101 ~]# pcs resource cleanup exim-clone
    Resource: exim-clone successfully cleaned up

After a while the status looks fine again (and actually is fine):

    [root@testvm101 ~]# pcs status
    ...
    Online: [ testvm101 testvm102 ]

    Full list of resources:

     Clone Set: virtual_ip-clone [virtual_ip] (unique)
         virtual_ip:0  (ocf::heartbeat:IPaddr2):  Started testvm101
         virtual_ip:1  (ocf::heartbeat:IPaddr2):  Started testvm102
     Clone Set: exim-clone [exim] (unique)
         exim:0  (systemd:exim):  Started testvm101
         exim:1  (systemd:exim):  Started testvm102

The next time the monitor action runs, the check fails on the other node:

    [root@testvm101 ~]# pcs status
    ...
    Node testvm101 (1): standby (on-fail)
    Online: [ testvm102 ]

    Full list of resources:

     Clone Set: virtual_ip-clone [virtual_ip] (unique)
         virtual_ip:0  (ocf::heartbeat:IPaddr2):  Started testvm102
         virtual_ip:1  (ocf::heartbeat:IPaddr2):  Started testvm102
     Clone Set: exim-clone [exim] (unique)
         exim:0  (systemd:exim):  Started testvm102
         exim:1  (systemd:exim):  Started testvm102

    Failed actions:
        exim:0_monitor_30000 on testvm101 'not running' (7): call=176, status=complete, last-rc-change='Wed Jan 28 16:37:10 2015', queued=0ms, exec=0ms

Maybe there is something I have overlooked?

Thanks for any help