MySQL Group Replication Got fatal error 1236

问题产生的场景

原先配置了MGR集群,从节点退出集群(stop group_replication),隔了很长一段时间想要加进来,这段时间内,主节点进行很多操作,并且之前的binlog被purged掉

问题产生原因分析

MGR主节点的binary log被删除掉(purge或者定期删除掉,人为地在硬盘上执行rm删除不在此考虑范围), MGR从节点无法读取到这些binary log, 尝试多次recovery(找donor进行recovery)后失败,从replication group中退出,从节点的报错信息大致如下所示:

1
2
3
2017-03-13T10:00:26.358681+08:00 355 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2017-03-13T10:00:26.358720+08:00 355 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
...

解决思路

从MGR主节点利用mysqldump dump出数据,包括gtid_purged,在从节点上应用

解决步骤

1
mysqldump --all-databases --set-gtid-purged=ON --single-transaction -uroot -S"/mysql_data/data/mgrtest1/mysql_mgrtest1.sock" -P24801 -p > /mysql_data/data/mgrtest1_alldb.sql

输入密码回车后,提示:

1
2
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
mysqldump: Couldn't execute 'SAVEPOINT sp': The MySQL server is running with the --transaction-write-set-extraction!=OFF option so it cannot execute this statement (1290)

关联到mysql bug: https://bugs.mysql.com/bug.php?id=81494

根据提示信息,当节点运行在group replication模式下,不支持savepoint,而mysqldump中需要dump出这条语句,然后dump出set @@gitd_purged=XXX 这个语句

根据mysql bug链接提供的解决步骤,在主节点上执行:

1
2
mysql> set global transaction_write_set_extraction=OFF;
ERROR 3093 (HY000): The write set algorithm cannot be changed when Group replication is running.

主节点需要关闭group_replication(此时如果有其他ONLINE从节点,选择其他ONLINE节点进行mysqldump应该是更好的选择)

若stop的是主节点,请确保其他ONLINE从节点是否需要提前从集群中解除,如果需要,先stop所有从节点,再stop主节点

1
2
3
4
5
mysql> stop group_replication;
Query OK, 0 rows affected (8.74 sec)
mysql> set global transaction_write_set_extraction=OFF;
Query OK, 0 rows affected (0.00 sec)

再次执行mysqldump命令即可成功dump出数据

将dump出来的文件拷贝到从节点所在机器,登陆从节点,执行:

1
2
reset master;
source ${your_sql_file}

恢复主节点:

1
2
3
4
5
6
7
8
9
10
11
mysql> set global transaction_write_set_extraction=XXHASH64;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.01 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.06 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

最后,在从节点上执行:

1
start group_replication;

到这里从节点应该能够正常加入了

参考

引用 http://lefred.be/content/mysql-group-replication-limitations-savepoints/,评论部分如下描述

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Savepoints are also used when executing mysqldump with –single-transaction option.
http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_single-transaction
That means if you want to do a consistent non blocking mysqldump on a node of the group you need to put the node out of the cluster first.
e.g.
mysql> — Put the node out of the Group Replication cluster
mysql> STOP group_replication;
mysql> SET GLOBAL transaction_write_set_extraction=OFF;
$ # Dump the entire node (instance)
$ mysqldump –all-databases –triggers –routines –events –single-transaction > /mysqldump/dump_YYYYMMDD.sql
mysql> — Bring back the node into the cluster
mysql> SET GLOBAL transaction_write_set_extraction=XXHASH64;
mysql> START group_replication;
坚持原创技术分享,您的支持将鼓励我继续创作!