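ZooKeeper node can no longer load its database on a production server

I am running a three-node ZooKeeper ensemble on three separate Ubuntu 14.04 machines. The setup used to work fine, but I have now noticed that zk1 is not operational. It also will not restart: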

    :/home/www$ sudo /etc/init.d/zookeeper status
     * zookeeper is not running
    :/home/www$ ps -ef | grep zoo
    zookeep+ 11465 1 0 16:15 ? 00:00:00 /usr/bin/java -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
    merlin 11492 25021 0 16:15 pts/1 00:00:00 grep --color=auto zoo
    :/home/www$ echo stat | nc zk1 2181
    :/home/www$ echo stat | nc zk2 2181
    Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
    Clients:
     /10.0.0.103:42841[1](queued=0,recved=33936,sent=33950)
     /10.0.0.101:38370[0](queued=0,recved=1,sent=0)

    Latency min/avg/max: 0/0/14
    Received: 37987
    Sent: 38069
    Connections: 2
    Outstanding: 0
    Zxid: 0x1600000983
    Mode: follower
    Node count: 202

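The init script neither restarts ZooKeeper nor stops it, and `status` claims it is not running. On zk2 and zk3 ZooKeeper is actually running, yet their init scripts also report that it is not.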

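This is a production server. So far Solr is still operational, but I would like to resolve this as soon as possible. Any help is much appreciated.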

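Here is the log file, which keeps repeating the same error over and over. Something seems to be wrong with the database. I had to delete Solr log entries, which might be the cause.

ZooKeeper log file: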

    2016-01-15 16:30:02,618 - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
    2016-01-15 16:30:02,620 - INFO [main:QuorumPeerConfig@334] - Defaulting to majority quorums
    2016-01-15 16:30:02,622 - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    2016-01-15 16:30:02,622 - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
    2016-01-15 16:30:02,623 - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
    2016-01-15 16:30:02,628 - INFO [main:QuorumPeerMain@127] - Starting quorum peer
    2016-01-15 16:30:02,635 - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
    2016-01-15 16:30:02,644 - INFO [main:QuorumPeer@913] - tickTime set to 2000
    2016-01-15 16:30:02,644 - INFO [main:QuorumPeer@933] - minSessionTimeout set to -1
    2016-01-15 16:30:02,644 - INFO [main:QuorumPeer@944] - maxSessionTimeout set to -1
    2016-01-15 16:30:02,644 - INFO [main:QuorumPeer@959] - initLimit set to 10
    2016-01-15 16:30:02,663 - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.1400009ce6
    2016-01-15 16:30:02,677 - ERROR [main:Util@239] - Last transaction was partial.
    2016-01-15 16:30:02,677 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
    java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    2016-01-15 16:30:02,678 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        ... 4 more
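The EOFException in the trace is raised from FileHeader.deserialize, which suggests that one of the log.* transaction log files ends before its header can be read, for example a zero-length file left behind by a failed write. The following is only a rough Python sketch for spotting such files. It assumes the 3.4.x on-disk layout in which a transaction log starts with a 16-byte big-endian header (4-byte magic "ZKLG", 4-byte version, 8-byte dbid); the function name `find_suspect_txnlogs` is made up for illustration and is not part of ZooKeeper.

```python
import struct
from pathlib import Path

# Assumed header layout: int magic, int version, long dbid (big-endian).
HEADER = struct.Struct(">iiq")
MAGIC = int.from_bytes(b"ZKLG", "big")


def find_suspect_txnlogs(data_dir):
    """Return names of log.* files whose header is truncated or has the wrong magic."""
    suspects = []
    for path in sorted(Path(data_dir).glob("log.*")):
        with path.open("rb") as fh:
            raw = fh.read(HEADER.size)
        if len(raw) < HEADER.size:
            # File too short to hold a header: reading it would hit EOF.
            suspects.append(path.name)
            continue
        magic, _version, _dbid = HEADER.unpack(raw)
        if magic != MAGIC:
            suspects.append(path.name)
    return suspects
```

Calling `find_suspect_txnlogs("/var/lib/zookeeper/version-2")` (the directory named in the log above) lists candidates only; it does not modify anything, and snapshot.* files are not inspected.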

I have now deleted the snapshot and can restart ZooKeeper on this node, but it does not connect to the ensemble; instead it shows only a single node:

    echo stat | nc zk1 2181
    Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
    Clients:
     /10.0.0.101:57508[0](queued=0,recved=1,sent=0)

    Latency min/avg/max: 0/0/0
    Received: 1
    Sent: 0
    Connections: 1
    Outstanding: 0
    Zxid: 0x16000009a4
    Mode: follower
    Node count: 202
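To compare the nodes side by side, the same `stat` probe can be scripted. This is a minimal Python sketch, not an official client: it sends the four-letter command over a plain TCP socket, much as `echo stat | nc` does, and pulls the `Key: value` lines out of the reply. The helper names `four_letter_word` and `parse_stat` are my own, and the host names are the ones from the question.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"stat", timeout=3.0):
    """Send a four-letter command and return the reply as text ('' if none)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract the top-level 'Key: value' lines of a stat reply into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Indented lines are per-client entries, not summary fields.
        if sep and not line.startswith(" "):
            fields[key.strip()] = value.strip()
    return fields


# Example use (requires network access to the ensemble):
# for host in ("zk1", "zk2", "zk3"):
#     info = parse_stat(four_letter_word(host))
#     print(host, info.get("Mode"), info.get("Zxid"))
```

Looking at `Mode` and `Zxid` for zk1, zk2 and zk3 together is one way to judge whether the restarted node has rejoined: one would expect a single leader and the remaining nodes as followers with Zxid values in the same range.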

    Last transaction was partial.
    2016-01-15 16:30:02,677 - ERROR [main:QuorumPeer@453] - Unable to load database on disk

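I had the same problem. It mostly happens when the disk is full or ZooKeeper is otherwise unable to write to disk. One way to fix it is to clear the ZooKeeper logs ( /<base_path>/zookeeper/version-2/ ) on the affected node while ZooKeeper is stopped there. As long as the other nodes still form a quorum, the node should resync its data from them when it starts again.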
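If you would rather not delete anything outright, the cleanup can be done by moving the files aside instead. Below is a hedged Python sketch under these assumptions: ZooKeeper is stopped on the affected node, the other two nodes still hold quorum, and the `myid` file lives one level above `version-2` so it is left untouched. The function names and the backup location are hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_aside(version_dir, backup_root):
    """Move every file in version_dir into a new timestamped backup directory."""
    src = Path(version_dir)
    dest = Path(backup_root) / time.strftime("zk-backup-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True)
    moved = []
    for path in sorted(src.iterdir()):
        if path.is_file():  # snapshot.* and log.* files
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return dest, moved


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Checking `free_fraction` on the data directory first is worthwhile, since a full disk is the likely trigger: if the disk is still full, the problem will probably come back after the restart. The backup can be removed once the node has rejoined and caught up.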