修复MySQL group_replication applier模块启动失败

TL / DR:这个问题不是关于主从复制。 这是关于多主群组复制的新function(可从v5.7.17开始)。 我们有解决方法,但由于大量数据需要导入两次,所以需要花费数小时,所以问题在于find正确的方法来解决这个问题。

我们已经使用group replication设置了生产MySQL multi-master集群(仅提供5.7.17和更高版本)。

但是,当向此群集添加新节点时,出现ERROR 3094 (HY000): The START GROUP_REPLICATION command failed as the applier module failed to start.

尝试使用在设置传统主从复制时使用的类似错误的解决scheme[1] [2] 不起作用 。 我发现唯一的工作解决scheme是运行reset + source命令TWICE。 这需要花费数小时才能完成(数据库非常大)。 见下面的细节。

我们添加新节点的过程如下:

 On existing node: # mysqldump --all-databases --triggers --routines --events -u root -p > /home/user/dump.sql Copy dump to new node Setup new node completely clean with group replication config (see config example below) mysql> SET SQL_LOG_BIN=0; mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'xxxxxxxx' REQUIRE SSL; mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'; mysql> FLUSH PRIVILEGES; mysql> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='xxxxxxxx' FOR CHANNEL 'group_replication_recovery'; mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so'; mysql> SET SQL_LOG_BIN=1; mysql> RESET MASTER; #This is required because the server on first start will create new GTID which does not exist on other nodes mysql> SOURCE /home/user/dump.sql; mysql> START GROUP_REPLICATION; 

在错误日志中,我可以看到:

 2017-07-19T04:00:24.452539Z 7 [ERROR] Failed to open the relay log './hostname1-relay-bin-group_replication_applier.000001' (relay_log_pos 4). 2017-07-19T04:00:24.452549Z 7 [ERROR] Could not find target log file mentioned in relay log info in the index file './hostname1-relay-bin-group_replication_applier.index' during relay log initialization. 2017-07-19T04:00:24.454935Z 7 [ERROR] Plugin group_replication reported: 'Failed to setup the group replication applier thread.' 2017-07-19T04:00:24.454957Z 7 [Note] Plugin group_replication reported: 'The group replication applier thread was killed' 2017-07-19T04:00:24.454986Z 4 [ERROR] Plugin group_replication reported: 'Unable to initialize the Group Replication applier module.' 2017-07-19T04:00:24.455347Z 4 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member' 2017-07-19T04:00:24.455361Z 4 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.' 2017-07-19T04:00:24.455372Z 0 [Note] Plugin group_replication reported: 'Destroying SSL' 2017-07-19T04:00:24.455380Z 0 [Note] Plugin group_replication reported: 'Success destroying SSL' 

configuration示例:

 [mysqld] socket = /var/lib/mysql/mysql.sock pid-file = /var/run/mysqld/mysqld.pid log-error = /var/log/mysqld/error.log symbolic-links = 0 skip_name_resolve ############# GROUP REPLICATION ############### # General replication settings gtid_mode = ON enforce_gtid_consistency = ON master_info_repository = TABLE relay_log_info_repository = TABLE binlog_checksum = NONE log_slave_updates = ON log_bin = mysql-bin relay_log = relay-bin binlog_format = ROW transaction_write_set_extraction = XXHASH64 loose-group_replication_bootstrap_group = OFF loose-group_replication_start_on_boot = ON loose-group_replication_ssl_mode = REQUIRED loose-group_replication_recovery_use_ssl = 1 loose-group_replication_single_primary_mode = OFF loose-group_replication_enforce_update_everywhere_checks = ON group_replication_auto_increment_increment = 3 auto_increment_increment = 3 # Shared replication group configuration loose-group_replication_group_name = "84c75469-8959-4d4a-beb5-7753684a5161" loose-group_replication_ip_whitelist = "192.168.25.0/24,127.0.0.0/8" loose-group_replication_group_seeds = "192.168.25.2:13306,192.168.25.3:13306,192.168.25.4:13306" # Host specific replication configuration server_id = 10 bind-address = * report_host = "192.168.25.5" loose-group_replication_local_address = "192.168.25.5:13306" ########## END GROUP REPLICATION ##########