Unable to start a PostgreSQL replication resource with Corosync / Pacemaker

I am setting up PostgreSQL replication with HA through Corosync / Pacemaker on two servers (CentOS 6.5).

My software versions:

    postgresql91-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-server-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-libs-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-contrib-9.1.19-1PGDG.rhel6.x86_64
    postgresql91-devel-9.1.19-1PGDG.rhel6.x86_64
    corosynclib-1.4.7-2.el6.x86_64
    corosync-1.4.7-2.el6.x86_64
    pacemaker-cli-1.1.12-8.el6_7.2.x86_64
    pacemaker-1.1.12-8.el6_7.2.x86_64
    pacemaker-cluster-libs-1.1.12-8.el6_7.2.x86_64
    pacemaker-libs-1.1.12-8.el6_7.2.x86_64
    resource-agents-3.9.5-24.el6_7.1.x86_64

Replication is working; from the master I can see the slave server connected:

    -bash-4.1$ psql -c "select client_addr,sync_state from pg_stat_replication;"
     client_addr | sync_state
    -------------+------------
     172.16.1.10 | async
    (1 row)

I have also confirmed that data created on the master is replicated to the slave.

Here is my crm configure show:

    node master
    node slave
    primitive PSQL pgsql \
        params restart_on_promote=true pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="master slave" repuser=rep rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=172.16.1.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r"
    primitive RepIP IPaddr2 \
        params ip=172.16.1.100 nic=eth2 cidr_netmask=24 \
        op monitor interval=30s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 cidr_netmask=24 \
        op monitor interval=30s
    group psql-ha VirtualIP RepIP \
        meta target-role=Started
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore

But the PSQL resource fails to start. My crm status:

    Last updated: Sat Nov 28 13:09:47 2015
    Last change: Sat Nov 28 12:50:21 2015
    Stack: classic openais (with plugin)
    Current DC: master - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ master slave ]

     Resource Group: psql-ha
         VirtualIP (ocf::heartbeat:IPaddr2): Started master
         RepIP (ocf::heartbeat:IPaddr2): Started master

    Failed actions:
        PSQL_start_0 on slave 'not configured' (6): call=60, status=complete, last-rc-change='Sat Nov 28 12:50:21 2015', queued=0ms, exec=53ms

/var/log/messages contains this error:

    Nov 28 12:50:21 slave pgsql(PSQL)[3387]: ERROR: Replication(rep_mode=async or sync) requires Master/Slave configuration.

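Can anyone explain why I get this error?

Thanks.

Update:

(Host names changed to node1 / node2.)

The problem was solved with @gf_'s configuration.

Note: disregard my old configuration; in this deployment I use only one virtual IP.

Current status: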

    [root@node1 ~]# crm_mon -Af -1
    Last updated: Wed Dec 2 05:13:56 2015
    Last change: Wed Dec 2 05:10:06 2015
    Stack: classic openais (with plugin)
    Current DC: node2 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured, 2 expected votes
    3 Resources configured

    Online: [ node1 node2 ]

    VirtualIP (ocf::heartbeat:IPaddr2): Started node2
     Master/Slave Set: msPSQL [PSQL]
         Masters: [ node2 ]
         Slaves: [ node1 ]

    Node Attributes:
    * Node node1:
        + PSQL-data-status : STREAMING|SYNC
        + PSQL-status : HS:sync
        + master-PSQL : 100
    * Node node2:
        + PSQL-data-status : LATEST
        + PSQL-master-baseline : 000000000E000078
        + PSQL-status : PRI
        + master-PSQL : 1000

    Migration summary:
    * Node node1:
    * Node node2:

Working configuration:

    node node1 \
        attributes PSQL-data-status="STREAMING|SYNC"
    node node2 \
        attributes PSQL-data-status=LATEST
    primitive PSQL pgsql \
        params restart_on_promote=false pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data" node_list="node1 node2" repuser=replicate rep_mode=sync restore_command="cp /var/lib/pgsql/pg_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=10.0.0.100 archive_cleanup_command="/usr/pgsql-9.1/bin/pg_archivecleanup /var/lib/pgsql/pg_archive/ %r" \
        op start timeout=60s interval=0s on-fail=restart \
        op monitor timeout=60s interval=4s on-fail=restart \
        op monitor timeout=60s interval=3s on-fail=restart role=Master \
        op promote timeout=60s interval=0s on-fail=restart \
        op demote timeout=60s interval=0s on-fail=stop \
        op stop timeout=60s interval=0s on-fail=block \
        op notify timeout=60s interval=0s
    primitive VirtualIP IPaddr2 \
        params ip=10.0.0.100 nic=eth1 cidr_netmask=24 \
        op monitor interval=30s
    ms msPSQL PSQL \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started notify=true
    colocation rsc_colocation-1 inf: VirtualIP msPSQL:Master
    order rsc_order-1 0: msPSQL:promote VirtualIP:start symmetrical=false
    order rsc_order-2 0: msPSQL:promote VirtualIP:stop symmetrical=false
    property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        stonith-enabled=false \
        last-lrm-refresh=1449033003
    rsc_defaults rsc-options: \
        resource-stickiness=100

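  • PSQL is supposed to run on both of your nodes, master and slave, at the same time. (Please note: I am not sure these terms are a good choice for node names in your setup.)

  • So you have to reflect this in your configuration. The error you get is quite clear and describes what is missing: you have to configure PSQL as a clone (it should run on multiple nodes at the same time), multi-state (it should run in a master/slave setup) resource. If you don't know what this means, now is the time to look at the documentation, especially "Clones - Resources That Get Active on Multiple Hosts" and "Multi-state - Resources That Have Multiple Modes".

  • So your extended configuration could look like this: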

     ms msPSQL PSQL \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

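  • Furthermore, you have to specify on which node your VirtualIP and RepIP should run, and you have to make sure the resources are stopped/started in the right order: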

     colocation rsc_colocation-1 inf: psql-ha msPSQL:Master
     order rsc_order-1 0: msPSQL:promote psql-ha:start symmetrical=false
     order rsc_order-2 0: msPSQL:demote psql-ha:stop symmetrical=false
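
To illustrate why a plain primitive ends up as 'not configured' (6), here is a minimal sketch of the kind of check involved. It is not the pgsql agent's actual source: the function names is_ms and validate_rep_mode are made up for illustration, and it assumes Pacemaker hands the ms meta attribute master-max to the agent as the environment variable OCF_RESKEY_CRM_meta_master_max.

```shell
#!/bin/sh
# Simplified illustration only, NOT the real pgsql resource agent code.
# Assumption: for a resource wrapped in `ms`, Pacemaker exports the meta
# attribute master-max as OCF_RESKEY_CRM_meta_master_max.

OCF_ERR_CONFIGURED=6   # matches the 'not configured' (6) seen in crm status

# is_ms (hypothetical name): succeeds only if the resource is master/slave,
# i.e. master-max is set and greater than zero.
is_ms() {
    [ -n "${OCF_RESKEY_CRM_meta_master_max:-}" ] &&
        [ "${OCF_RESKEY_CRM_meta_master_max:-0}" -gt 0 ]
}

# validate_rep_mode REP_MODE (hypothetical name): replication modes other
# than "none" are rejected unless the resource is master/slave.
validate_rep_mode() {
    rep_mode="$1"
    if [ "$rep_mode" != "none" ] && ! is_ms; then
        echo "ERROR: Replication(rep_mode=async or sync) requires Master/Slave configuration." >&2
        return "$OCF_ERR_CONFIGURED"
    fi
    return 0
}
```

With only `primitive PSQL pgsql ... rep_mode=sync` there is no master-max, so a check like this fails with code 6. Once PSQL is wrapped in `ms msPSQL PSQL` with master-max="1", it passes. After changing the configuration you will probably also need `crm resource cleanup PSQL` to clear the recorded failed start.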