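I have an architecture with three Redis instances (one master and two slaves) and three Sentinel instances, with HAProxy in front of it. Everything works fine until the master Redis instance goes down. Sentinel correctly elects a new master. However, the old master (which is now down) is not reconfigured as a slave. As a result, when that instance comes back up, I have two masters for a short period (about 11 seconds). After that, the instance that was brought back up is correctly demoted to a slave.
Shouldn't it work so that when the master goes down, it is demoted to a slave right away? That way, when it comes back up, it would be a slave immediately. I know that (since Redis 2.8) there is the CONFIG REWRITE feature, so the configuration cannot be modified while the Redis instance is down.
Having two masters for some time is a problem for me, because during that short period HAProxy does not send requests to a single master Redis but load-balances between the two masters.
Is there any way to demote the failed master to a slave immediately?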
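To make the two-master window concrete, here is a minimal Python sketch of one possible way a health check could avoid it: treat an instance as writable only if it reports itself as master and a quorum of Sentinels also names it as the master. The function names `agreed_master` and `writable_backends` are hypothetical, and the inputs are plain data. In a real check they would presumably come from `INFO replication` on each Redis instance and `SENTINEL get-master-addr-by-name redis-ha` on each Sentinel. The default quorum of 2 is taken from the `#quorum 3/2` entry in the Sentinel log.

```python
from collections import Counter


def agreed_master(sentinel_views, quorum):
    """Return the (host, port) named as master by at least `quorum` Sentinels.

    `sentinel_views` holds one entry per Sentinel: a (host, port) tuple, or
    None if that Sentinel could not be reached. Returns None without quorum.
    """
    votes = Counter(view for view in sentinel_views if view is not None)
    if not votes:
        return None
    addr, count = votes.most_common(1)[0]
    return addr if count >= quorum else None


def writable_backends(reported_roles, sentinel_views, quorum=2):
    """Keep only instances that claim the master role AND match the Sentinels.

    `reported_roles` maps (host, port) to the role each instance reports
    about itself ("master" or "slave").
    """
    master = agreed_master(sentinel_views, quorum)
    return [
        addr
        for addr, role in reported_roles.items()
        if role == "master" and addr == master
    ]


if __name__ == "__main__":
    # The window from the question: 6379 is back up and still believes it is
    # the master, while the Sentinels have already switched to 6381.
    roles = {
        ("127.0.0.1", 6379): "master",
        ("127.0.0.1", 6380): "slave",
        ("127.0.0.1", 6381): "master",
    }
    views = [("127.0.0.1", 6381)] * 3
    print(writable_backends(roles, views))  # [('127.0.0.1', 6381)]
```

This only filters which backend receives writes. It does not demote the old master any faster, and it assumes the quorum is a majority of the Sentinels, so that two addresses can never both reach it.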
Obviously, I changed the Sentinel timeouts.
Here are some logs from the Sentinel and Redis instances after the master goes down:
Sentinel
81358:X 23 Jan 22:12:03.088 # +sdown master redis-ha 127.0.0.1 63797.0.0.1 26381 @ redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:03.149 # +new-epoch 1
81358:X 23 Jan 22:12:03.149 # +vote-for-leader 6b5b5882443a1d738ab6849ecf4bc6b9b32ec142 1
81358:X 23 Jan 22:12:03.174 # +odown master redis-ha 127.0.0.1 6379 #quorum 3/2
81358:X 23 Jan 22:12:03.174 # Next failover delay: I will not start a failover before Sat Jan 23 22:12:09 2016
81358:X 23 Jan 22:12:04.265 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:04.265 # +switch-master redis-ha 127.0.0.1 6379 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:06.297 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ redis-ha 127.0.0.1 6381
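For anyone who wants to pull the failover moment out of a Sentinel log like the one above, the following is a rough Python sketch that extracts `+switch-master` events (master name, old address, new address). The regular expression and the helper name `find_switches` are my own assumptions based on the line format shown in this log, not an official parser.

```python
import re

# Matches entries such as:
#   22:12:04.265 # +switch-master redis-ha 127.0.0.1 6379 127.0.0.1 6381
SWITCH_MASTER = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) # \+switch-master "
    r"(?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switches(log_text):
    """Return one dict per +switch-master event found in Sentinel log text."""
    events = []
    for match in SWITCH_MASTER.finditer(log_text):
        events.append(
            {
                "time": match.group("time"),
                "name": match.group("name"),
                "old": (match.group("old_ip"), int(match.group("old_port"))),
                "new": (match.group("new_ip"), int(match.group("new_port"))),
            }
        )
    return events


if __name__ == "__main__":
    sample = (
        "81358:X 23 Jan 22:12:04.265 # +switch-master "
        "redis-ha 127.0.0.1 6379 127.0.0.1 6381"
    )
    for event in find_switches(sample):
        print(event["time"], event["old"], "->", event["new"])
```

Because it scans with `finditer`, it works whether the log entries are on separate lines or run together in one line.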
Redis
81354:S 23 Jan 22:12:03.341 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:03.341 # Error condition on socket for SYNC: Connection refused
81354:S 23 Jan 22:12:04.265 * Discarding previously cached master state.
81354:S 23 Jan 22:12:04.265 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=7 addr=127.0.0.1:57784 fd=10 name=sentinel-6b5b5882-cmd age=425 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')
81354:S 23 Jan 22:12:04.265 # CONFIG REWRITE executed with success.
81354:S 23 Jan 22:12:04.371 * Connecting to MASTER 127.0.0.1:6381
81354:S 23 Jan 22:12:04.371 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:04.371 * Non blocking connect for SYNC fired the event.
81354:S 23 Jan 22:12:04.371 * Master replied to PING, replication can continue...
81354:S 23 Jan 22:12:04.371 * Partial resynchronization not possible (no cached master)
81354:S 23 Jan 22:12:04.372 * Full resync from master: 07b3c8f64bbb9076d7e97799a53b8b290ecf470b:1
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: receiving 860 bytes from master
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Flushing old data
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Loading DB in memory
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Finished with success