Galera cluster unable to sync SST: xtrabackup checkpoints missing, failed innobackupex on donor

Written by - 0 comments

Published on September 23rd 2019 - Listed in MariaDB DB Databases


In a post a couple of weeks ago, we wrote about using mariabackup as SST method and compared it to the rsync method. Since the initial tests were all successful, we rolled out mariabackup as sst method on all Galera nodes across all environments. 

Uh, oh, Galera node can't join!

But today a problem in a 2 node Galera (test-) cluster was discovered where one node (node02) failed and was unable to sync SST from the donor (node01):

Sep 23 06:52:45 node02 systemd[1]: Starting MariaDB 10.2.26 database server...
Sep 23 06:52:47 node02 mysqld[30553]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Sep 23 06:52:47 node02 mysqld[30811]: 2019-09-23  6:52:47 139873711629632 [Note] /usr/sbin/mysqld (mysqld 10.2.26-MariaDB-1:10.2.26+maria~bionic-log) starting as process 30811 ...
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Streaming with xbstream
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Using socat as streamer
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Stale sst_in_progress file: /var/lib/mysql//sst_in_progress
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 06:52:49 node02 -wsrep-sst-joiner: WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Proceeding with SST
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Cleaning the existing datadir and innodb-data/log directories
Sep 23 06:52:49 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ib_logfile1'
Sep 23 06:52:49 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log_control'
Sep 23 06:52:49 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log.00000001'
Sep 23 06:52:49 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ibdata1'
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Cleaning the binlog directory /var/log/mysql as well
Sep 23 06:52:49 node02 -wsrep-sst-joiner: removed '/var/log/mysql/mariadb-bin.000001'
Sep 23 06:52:49 node02 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Sep 23 06:52:59 node02 -wsrep-sst-joiner: xtrabackup_checkpoints missing, failed innobackupex/SST on donor
Sep 23 06:52:59 node02 -wsrep-sst-joiner: Cleanup after exit with status:2
Sep 23 06:53:05 node02 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Sep 23 06:53:05 node02 systemd[1]: mariadb.service: Failed with result 'signal'.
Sep 23 06:53:05 node02 systemd[1]: Failed to start MariaDB 10.2.26 database server.
Sep 23 06:53:10 node02 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Sep 23 06:53:10 node02 systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 66.
Sep 23 06:53:10 node02 systemd[1]: Stopped MariaDB 10.2.26 database server.
Sep 23 06:53:10 node02 systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted
Sep 23 06:53:10 node02 systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted
Sep 23 06:53:10 node02 systemd[1]: Starting MariaDB 10.2.26 database server...
Sep 23 06:53:12 node02 mysqld[31447]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Sep 23 06:53:12 node02 mysqld[31707]: 2019-09-23  6:53:12 140307056854336 [Note] /usr/sbin/mysqld (mysqld 10.2.26-MariaDB-1:10.2.26+maria~bionic-log) starting as process 31707 ...
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Streaming with xbstream
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Using socat as streamer
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Stale sst_in_progress file: /var/lib/mysql//sst_in_progress
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 06:53:13 node02 -wsrep-sst-joiner: WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Proceeding with SST
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Cleaning the existing datadir and innodb-data/log directories
Sep 23 06:53:13 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ib_logfile1'
Sep 23 06:53:13 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log_control'
Sep 23 06:53:13 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log.00000001'
Sep 23 06:53:13 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ibdata1'
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Cleaning the binlog directory /var/log/mysql as well
Sep 23 06:53:13 node02 -wsrep-sst-joiner: removed '/var/log/mysql/mariadb-bin.000001'
Sep 23 06:53:13 node02 -wsrep-sst-joiner: Waiting for SST streaming to complete!
Sep 23 06:53:23 node02 -wsrep-sst-joiner: xtrabackup_checkpoints missing, failed innobackupex/SST on donor
Sep 23 06:53:23 node02 -wsrep-sst-joiner: Cleanup after exit with status:2
Sep 23 06:53:29 node02 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Sep 23 06:53:29 node02 systemd[1]: mariadb.service: Failed with result 'signal'.
Sep 23 06:53:29 node02 systemd[1]: Failed to start MariaDB 10.2.26 database server.

Unfortunately from these logs the reason for the failure cannot be found. However one line blames the donor:

Sep 23 06:53:23 node02 -wsrep-sst-joiner: xtrabackup_checkpoints missing, failed innobackupex/SST on donor

Not enough privileges on the donor

Checking the logs on the donor does indeed give more information:

Sep 23 06:53:18 node01 -wsrep-sst-donor: Streaming with xbstream
Sep 23 06:53:18 node01 -wsrep-sst-donor: Using socat as streamer
Sep 23 06:53:18 node01 -wsrep-sst-donor: Using /tmp/tmp.Kmg5CUC46B as innobackupex temporary directory
Sep 23 06:53:18 node01 -wsrep-sst-donor: Streaming GTID file before SST
Sep 23 06:53:18 node01 -wsrep-sst-donor: Evaluating mbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.253.82:4444; RC=( ${PIPESTATUS[@]} )
Sep 23 06:53:18 node01 -wsrep-sst-donor: Sleeping before data transfer for SST
Sep 23 06:53:27 node01 [audit admck/21139 as root/21309 on pts/4/] /var/lib/mysql: tail -f /var/log/syslog
Sep 23 06:53:28 node01 -wsrep-sst-donor: Streaming the backup to joiner at 192.168.253.82 4444
Sep 23 06:53:28 node01 -wsrep-sst-donor: Evaluating /usr//bin/mariabackup --innobackupex      $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> >(logger -p daemon.err -t -innobackupex-backup) | socat -u stdio TCP:192.168.253.82:4444; RC=( ${PIPESTATUS[@]} )
Sep 23 06:53:28 node01 -innobackupex-backup: 190923 06:53:28 innobackupex: Starting the backup operation
Sep 23 06:53:28 node01 -innobackupex-backup:
Sep 23 06:53:28 node01 -innobackupex-backup: IMPORTANT: Please check that the backup run completes successfully.
Sep 23 06:53:28 node01 -innobackupex-backup:            At the end of a successful backup run innobackupex
Sep 23 06:53:28 node01 -innobackupex-backup:            prints "completed OK!".
Sep 23 06:53:28 node01 -innobackupex-backup:
Sep 23 06:53:28 node01 -innobackupex-backup: [00] 2019-09-23 06:53:28 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
Sep 23 06:53:28 node01 -innobackupex-backup: [00] 2019-09-23 06:53:28 Using server version 10.2.26-MariaDB-1:10.2.26+maria~bionic-log
Sep 23 06:53:28 node01 -innobackupex-backup: [00] 2019-09-23 06:53:28 Error: missing required privilege PROCESS on *.*
Sep 23 06:53:28 node01 -innobackupex-backup: [00] 2019-09-23 06:53:28 Current privileges, as reported by 'SHOW GRANTS':
Sep 23 06:53:28 node01 -innobackupex-backup: [00] 2019-09-23 06:53:28   1.GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY PASSWORD 'hidden'
Sep 23 06:53:28 node01 -innobackupex-backup: [00] FATAL ERROR: 2019-09-23 06:53:28 Insufficient privileges
Sep 23 06:53:28 node01 -wsrep-sst-donor: /usr//bin/mariabackup finished with error: 1.  Check syslog or /var/lib/mysql//mariabackup.backup.log for details
Sep 23 06:53:28 node01 -wsrep-sst-donor: Cleanup after exit with status:22
Sep 23 06:53:28 node01 -wsrep-sst-donor: Cleaning up temporary directories

The error points it out quite nicely: There are not enough privileges granted to the "sstuser" database user. The "PROCESS" command needs to be granted, too.

MariaDB [(none)]> show grants for 'sstuser'@'localhost';
+---------------------------------------------------------------------------------------------------------------+
| Grants for sstuser@localhost                                                                                  |
+---------------------------------------------------------------------------------------------------------------+
| GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY PASSWORD 'hidden' |
+---------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


MariaDB [(none)]> GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY PASSWORD 'hidden';
Query OK, 0 rows affected (0.03 sec)



MariaDB [(none)]> show grants for 'sstuser'@'localhost';
+------------------------------------------------------------------------------------------------------------------------+
| Grants for sstuser@localhost                                                                                           |
+------------------------------------------------------------------------------------------------------------------------+
| GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY PASSWORD 'hidden' |
+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Now that the privileges are corrected, the sync was initiated again by starting MariaDB on node02.

SST sync working again

Node02 started MariaDB and the log entries looked much better:

Sep 23 07:23:59 node02 systemd[1]: Starting MariaDB 10.2.26 database server...
Sep 23 07:24:01 node02 mysqld[4912]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Sep 23 07:24:01 node02 mysqld[5182]: 2019-09-23  7:24:01 140351486481728 [Note] /usr/sbin/mysqld (mysqld 10.2.26-MariaDB-1:10.2.26+maria~bionic-log) starting as process 5182 ...
Sep 23 07:24:02 node02 -wsrep-sst-joiner: Streaming with xbstream
Sep 23 07:24:02 node02 -wsrep-sst-joiner: Using socat as streamer
Sep 23 07:24:02 node02 -wsrep-sst-joiner: Stale sst_in_progress file: /var/lib/mysql//sst_in_progress
Sep 23 07:24:02 node02 -wsrep-sst-joiner: Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 07:24:03 node02 -wsrep-sst-joiner: WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
Sep 23 07:24:03 node02 -wsrep-sst-joiner: Proceeding with SST
Sep 23 07:24:03 node02 -wsrep-sst-joiner: Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} )
Sep 23 07:24:03 node02 -wsrep-sst-joiner: Cleaning the existing datadir and innodb-data/log directories
Sep 23 07:24:03 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ib_logfile1'
Sep 23 07:24:03 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log_control'
Sep 23 07:24:03 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/aria_log.00000001'
Sep 23 07:24:03 node02 -wsrep-sst-joiner: removed '/var/lib/mysql/ibdata1'
Sep 23 07:24:03 node02 -wsrep-sst-joiner: Cleaning the binlog directory /var/log/mysql as well
Sep 23 07:24:03 node02 -wsrep-sst-joiner: removed '/var/log/mysql/mariadb-bin.000001'
Sep 23 07:24:03 node02 -wsrep-sst-joiner: Waiting for SST streaming to complete!

At the same time, the logs on node01 did not show any privilege errors anymore:

Sep 23 07:23:43 node01 -wsrep-sst-donor: Streaming with xbstream
Sep 23 07:23:43 node01 -wsrep-sst-donor: Using socat as streamer
Sep 23 07:23:43 node01 -wsrep-sst-donor: Using /tmp/tmp.pEa1nmYMp7 as innobackupex temporary directory
Sep 23 07:23:43 node01 -wsrep-sst-donor: Streaming GTID file before SST
Sep 23 07:23:43 node01 -wsrep-sst-donor: Evaluating mbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.253.82:4444; RC=( ${PIPESTATUS[@]} )
Sep 23 07:23:43 node01 -wsrep-sst-donor: Sleeping before data transfer for SST
Sep 23 07:23:53 node01 -wsrep-sst-donor: Streaming the backup to joiner at 192.168.253.82 4444
Sep 23 07:23:53 node01 -wsrep-sst-donor: Evaluating /usr//bin/mariabackup --innobackupex      $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> >(logger -p daemon.err -t -innobackupex-backup) | socat -u stdio TCP:192.168.253.82:4444; RC=( ${PIPESTATUS[@]} )
Sep 23 07:23:53 node01 -innobackupex-backup: 190923 07:23:53 innobackupex: Starting the backup operation
Sep 23 07:23:53 node01 -innobackupex-backup:
Sep 23 07:23:53 node01 -innobackupex-backup: IMPORTANT: Please check that the backup run completes successfully.
Sep 23 07:23:53 node01 -innobackupex-backup:            At the end of a successful backup run innobackupex
Sep 23 07:23:53 node01 -innobackupex-backup:            prints "completed OK!".
Sep 23 07:23:53 node01 -innobackupex-backup:
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 Using server version 10.2.26-MariaDB-1:10.2.26+maria~bionic-log
Sep 23 07:23:53 node01 -innobackupex-backup: --innobackupex based on MariaDB server 10.2.26-MariaDB debian-linux-gnu (x86_64)
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 uses posix_fadvise().
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 cd to /var/lib/mysql/
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 open files limit requested 0, set to 1048576
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 mariabackup: using the following InnoDB configuration:
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 innodb_data_home_dir =
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 innodb_data_file_path = ibdata1:12M:autoextend
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 innodb_log_group_home_dir = ./
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 InnoDB: Using Linux native AIO
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 using O_DIRECT
Sep 23 07:23:53 node01 -innobackupex-backup: 2019-09-23  7:23:53 139846969603392 [Note] InnoDB: Number of pools: 1
Sep 23 07:23:53 node01 -innobackupex-backup: [00] 2019-09-23 07:23:53 mariabackup: Generating a list of tablespaces
[...]

Why was PROCESS missing?

Because mariabackup is (at the end) a fork of Percona's xtrabackup method, we simply kept the same privileges for the local database user required for the SST transfer using xtrabackup. But xtrabackup is quite old and meanwhile EOL (replaced by xtrabackup-v2 and forked by mariabackup). At the begin when xtrabackup was used as SST method, the PROCESS privilege was not needed back then, hence the missing PROCESS privilege.

The official documentation of mariabackup clearly states PROCESS as required privilege:

Mariabackup needs to authenticate with the database server when it performs a backup operation (i.e. when the --backup option is specified). For most use cases, the user account that performs the backup needs to have the RELOAD , PROCESS, LOCK TABLES and REPLICATION CLIENT global privileges on the database server. 



Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.