MySQL replication out of sync: How to do a full database restore and solve error 1032 (could not execute update_rows)

Written by - 0 comments

Published on April 23rd 2021 - Listed in MySQL Database DB


After data corruption happened on a MySQL replication replica (slave) server, the databases needed to be fully synced again. In order to do this, the following steps are required:

  1. Lock the source (master) database server (applications writing to this MySQL server will fail during the lock)
  2. Retrieve the current source (master) status information
  3. Dump database(s) on (master) database server
  4. Unlock source (master) database server
  5. Transfer dump(s) to replica (slave) server
  6. Stop slave process on replica (slave) server
  7. Import database dump(s) on replica (slave) server
  8. Change replication settings by using the information retrieved in step 2 on replica (slave) server
  9. Start slave process on replica (slave) server

Note: The old replication terms were "master-slave replication", consisting of a master and one or more slave servers. The new terminology uses "source-replica replication", consisting of a source and one or more replica servers.

Monitoring replication status

The information, that the replica was out of sync, was retrieved by our monitoring, using the moniotring plugin check_mysql_slavestatus in the background:

***** Icinga  *****

 Notification Type: PROBLEM

 Service: MySQL Replication Status
 Host: mysql02
 Address: 10.10.50.49
 State: CRITICAL

 Date/Time: 2021-04-22 07:54:34 +0200

 Additional Info: CRITICAL: -h 10.10.50.49:-P 3306 Slave_SQL_Running: No

 Comment: []

After an analysis what happened, it was decided to "clear" the replica and do a full data-sync from the source.

Full replication (re-)sync

Note: A very good guide can be found on Remi Bergsma's article fixing an out-of-sync or corrupt MySQL slave.

Steps to do on the source server

On the source server, the replication state is reset and the tables must be locked to get a consistent state.

mysql> RESET MASTER;
Query OK, 0 rows affected (0.52 sec)

mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

Save the information retrieved from the SHOW MASTER STATUS command. You'll need this later.

Then the database(s) can be saved using mysqldump:

root@mysql01:/backup# mysqldump --max_allowed_packet=512M --all-databases > mysqldump.sql

If a database contains large data fields, the --max_allowed_packet parameter should be used.

The dump should contain the CREATE DATABASE command:

root@mysql01:/backup# more mysqldump.sql
-- MySQL dump 10.13  Distrib 5.7.31, for Linux (x86_64)
--
-- Host: localhost    Database:
-- ------------------------------------------------------
-- Server version       5.7.31-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Current Database: `Jira`
--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `Jira` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin */;

Now that the dump was successfully written, the tables can be unlocked:

mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

Now transfer the dump file to the replica server. If the dump is big, you should consider zipping it first:

root@mysql01:/backup# gzip mysqldump.sql

Steps to do on the replica server

Now with the dump transferred to the replica server, the slave process needs to be stopped and reset:

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> RESET SLAVE;
Query OK, 0 rows affected (0.00 sec)

To get a clean state and get rid of still remaining data corruption, I recommend to DROP all databases (except mysql internal databases information_schema, mysql, performance_schema and sys).

mysql> DROP DATABASE Jira;
Query OK, 0 rows affected (0.00 sec)

The database dump can now be loaded into the MySQL server:

root@mysql02:/backup# gunzip < mysqldump.sql.gz | mysql --max_allowed_packet=512M -u root

Once the dump was restored (without error), the replication settings need to be adjusted. Use the information retrieved from the master earlier:

mysql> CHANGE MASTER TO master_host='mysql01', master_user='repl', master_password='secret', master_log_file='mysql-bin.000001', master_log_pos=154;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

In most cases, the replication should catch up since the master's database was unlocked.

Error 1032 Could not execute Update_rows event on table

In some cases the replication can have a couple of hiccups. In this case an error 1032 showed up in the SHOW SLAVE STATUS command:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: mysql01
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 40254923
               Relay_Log_File: mysql02-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table monitoring.monitoring; Can't find record in 'monitoring', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 441

                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 40255304
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table monitoring.monitoring; Can't find record in 'monitoring', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 441
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: e1e2a203-eb81-11ea-8f82-0050568d9dca
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 210423 07:27:58
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

If you search for that particular error, most results will simply tell you to skip the error and start the slave again. Although this is technically correct, you should first make sure that you understand the SQL error. In this case the error is caused by yet another monitoring plugin check_mysql_write, which updates a record in the monitoring table in the monitoring database every 10 seconds. 

As this is only monitoring related and temporary data, this error can be skipped and the slave process started again:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

But as I mentioned, check_mysql_write updates the table every 10 seconds, so there are a ton of these errors to skip. This can be solved doing a loop, by making sure the error is related to the monitoring database:

root@mysql02:~# while true; do if [[ $(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_SQL_Error:" | grep -c "monitoring") -gt 0 ]]; then mysql -e "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"; else break; fi; sleep 1; done

This took several seconds until finally all errors related to the monitoring database were skipped and finally the replication was running again.

***** Icinga  *****

Notification Type: RECOVERY

Service: MySQL Replication Status
Host: mysql02
Address: 10.10.50.49
State: OK

Date/Time: 2021-04-23 07:37:48 +0200

Additional Info: OK: Slave SQL running: Yes Slave IO running: Yes / master: mysql01 / slave is 0 seconds behind master

Comment: [] 


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.