MySQL replication not working - but in SHOW SLAVE STATUS everything is OK
Friday - Jan 16th 2015

A strange problem hit me recently: a MySQL replication running on Solaris zones failed, and the slave no longer received any binary log updates from the master.

The slave is of course being monitored (with the Nagios plugin check_mysql_slavestatus.sh) but everything was always OK... until it suddenly became CRITICAL because of the following error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

What happened? It seems that the replication had been failing silently for a couple of days: master and slave no longer communicated correctly with each other. While the master kept writing to its binary log files, the slave did not retrieve the new events from the master. However, the SHOW SLAVE STATUS output did not indicate any error:

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.17.20.100
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000408
          Read_Master_Log_Pos: 24547311
               Relay_Log_File: relay-log.000330
                Relay_Log_Pos: 4
        Relay_Master_Log_File: bin.000408
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 24547311
              Relay_Log_Space: 120
[...]
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
[...]
1 row in set (0.00 sec)

check_mysql_slavestatus reads all these values, and because everything seems to be OK according to the 'show slave status' output, the plugin reported no issues.
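
To illustrate why such a check stays green, here is a minimal sketch of a thread-based check. This is not the code of check_mysql_slavestatus.sh; the function names slave_field and check_threads are made up for this example, and the sketch only assumes the 'show slave status\G' text format shown above.

```shell
#!/bin/sh
# Sketch only: slave_field and check_threads are hypothetical helper names,
# not functions taken from check_mysql_slavestatus.sh.

# Print the value of one field from "SHOW SLAVE STATUS\G" text read on stdin.
slave_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop the right-alignment padding
            prefix = key ": "
            if (index(line, prefix) == 1) {   # field name must start the line
                print substr(line, length(prefix) + 1)
                exit
            }
        }'
}

# Thread-based check: OK only if both replication threads report "Yes".
# Prints a Nagios-style line and returns 0 (OK) or 2 (CRITICAL).
check_threads() {
    status=$(cat)
    io=$(printf '%s\n' "$status" | slave_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK - IO thread: $io, SQL thread: $sql"
        return 0
    fi
    echo "CRITICAL - IO thread: ${io:-unknown}, SQL thread: ${sql:-unknown}"
    return 2
}
```

Fed with the output above (for example by piping mysql -e 'show slave status\G' into check_threads), this returns OK, because both threads report Yes even though the slave is no longer receiving anything.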

But the broken synchronisation is easy to verify: do a simple write operation on the master and check the result on the slave. Here I create a new database on the master and then check whether it appears on the slave:

#On MASTER:
mysql> create database claudiotest;
Query OK, 1 row affected (0.02 sec)

mysql> show master status;
+------------+----------+--------------+------------------+-------------------+
| File       | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------+----------+--------------+------------------+-------------------+
| bin.000408 | 47461189 |              |                  |                   |
+------------+----------+--------------+------------------+-------------------+

#On SLAVE nothing was done
[root@slave ~]# ll /var/lib/mysql/ | grep claudio
[root@slave ~]# mysql -e "show databases" | grep claudio

#... and nothing moved either!
mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.17.20.100
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin.000408
          Read_Master_Log_Pos: 24547311
               Relay_Log_File: relay-log.000331
                Relay_Log_Pos: 4
        Relay_Master_Log_File: bin.000408
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]

So although everything seems to be in order according to the slave status output, nothing was actually replicated. The slave didn't even learn from the master that the master's log file position had changed.
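
The mismatch can be made explicit by comparing the coordinates from 'show master status' with Master_Log_File and Read_Master_Log_Pos on the slave. The following is only a sketch; binlog_gap is a hypothetical helper name, and the values have to be collected from both servers by whatever means are available.

```shell
#!/bin/sh
# Sketch only: binlog_gap is a hypothetical helper, not part of any plugin.
# Arguments: master file, master position (from SHOW MASTER STATUS),
#            slave Master_Log_File, slave Read_Master_Log_Pos.
# Prints "in-sync", "behind:<bytes>" (same file) or "behind:file" (other file).
binlog_gap() {
    m_file=$1; m_pos=$2; s_file=$3; s_pos=$4
    if [ "$m_file" != "$s_file" ]; then
        echo "behind:file"
    elif [ "$m_pos" -gt "$s_pos" ]; then
        echo "behind:$((m_pos - s_pos))"
    else
        echo "in-sync"
    fi
}
```

With the values from this case, binlog_gap bin.000408 47461189 bin.000408 24547311 prints behind:22913878. A gap on its own can be normal on a busy master; it only becomes suspicious when the slave's read position does not move at all between two samples.
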
This particular MySQL (5.6) replication runs on two virtual Solaris servers (zones), each with two virtual NICs. The replication happens over the secondary interface (backend). I strongly suspect a networking issue or bug in the operating system, although telnet and ping show working communication between master and slave. A restart of the MySQL server on the slave didn't help either.

I finally got the replication working again by switching it to the primary network interface of the zone.
To catch such replication/connectivity issues, I have modified check_mysql_slavestatus with a new check type. The change will be published soon.
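
One possible way to detect this kind of silent stall is to take two samples some time apart and alert when the master has moved but the slave's read position has not. The sketch below is an assumption on my side about how such a check could look; it is not the new check type of the plugin, and check_stall and its "file:pos" argument format are made up for this example.

```shell
#!/bin/sh
# Sketch only: check_stall is hypothetical and NOT the plugin's new check type.
# Arguments are "file:pos" coordinates from two samples taken some time apart:
#   $1 master at t1   $2 slave (Master_Log_File:Read_Master_Log_Pos) at t1
#   $3 master at t2   $4 slave at t2
# Returns Nagios-style codes: 0 OK, 1 WARNING, 2 CRITICAL.
check_stall() {
    m1=$1; s1=$2; m2=$3; s2=$4
    if [ "$m2" = "$s2" ]; then
        echo "OK - slave has read up to master position $m2"
        return 0
    fi
    if [ "$m1" != "$m2" ] && [ "$s1" = "$s2" ]; then
        echo "CRITICAL - master moved from $m1 to $m2 but slave is stuck at $s2"
        return 2
    fi
    echo "WARNING - slave at $s2 is behind master at $m2"
    return 1
}
```

The design choice here is to compare positions over time rather than to trust the thread state, since in the case above both threads reported Yes during the whole outage.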

 
