Galera node unable to sync from donor: Requesting state transfer failed: -113(No route to host)

Written by - 2 comments

Published on - last updated on December 22nd 2020 - Listed in MySQL Database Galera


On a new Galera setup with MariaDB 10.4, the joining node (node 2) was unable to sync an initial state transfer (IST) from the primary node (node 1).

The error messages logged:

2020-04-20 18:32:41 0 [Note] WSREP: Loading provider /usr/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
wsrep loader: [INFO] wsrep_load(): Galera 26.4.3(r4535) by Codership Oy <info@codership.com> loaded successfully.
2020-04-20 18:32:41 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2020-04-20 18:32:41 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2020-04-20 18:32:41 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: b1ad1d21-831f-11ea-b0ff-b63ef682e2e4
Seqno: -1 - -1
Offset: -1
Synced: 0
2020-04-20 18:32:41 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: b1ad1d21-831f-11ea-b0ff-b63ef682e2e4, offset: -1
2020-04-20 18:32:41 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (         0/2147483672 bytes) complete.
2020-04-20 18:32:42 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (2147483672/2147483672 bytes) complete.
2020-04-20 18:32:42 0 [Note] WSREP: Recovering GCache ring buffer: didn't recover any events.
2020-04-20 18:32:42 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.5.102; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 300M; gcache.recover = yes; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = f
2020-04-20 18:32:42 0 [Note] WSREP: Start replication
2020-04-20 18:32:42 0 [Note] WSREP: Connecting with bootstrap option: 0
2020-04-20 18:32:42 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2020-04-20 18:32:42 0 [Note] WSREP: protonet asio version 0
2020-04-20 18:32:42 0 [Note] WSREP: Using CRC-32C for message checksums.
2020-04-20 18:32:42 0 [Note] WSREP: backend: asio
2020-04-20 18:32:42 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2020-04-20 18:32:42 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2020-04-20 18:32:42 0 [Note] WSREP: restore pc from disk failed
2020-04-20 18:32:42 0 [Note] WSREP: GMCast version 0
2020-04-20 18:32:42 0 [Note] WSREP: (8ee671b5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2020-04-20 18:32:42 0 [Note] WSREP: (8ee671b5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2020-04-20 18:32:42 0 [Note] WSREP: EVS version 1
2020-04-20 18:32:42 0 [Note] WSREP: gcomm: connecting to group 'GALERA-CLUSTER', peer '192.168.5.101:,192.168.5.102:,192.168.5.103:'
2020-04-20 18:32:42 0 [Note] WSREP: (8ee671b5, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://192.168.5.102:4567
2020-04-20 18:32:42 0 [Note] WSREP: (8ee671b5, 'tcp://0.0.0.0:4567') connection established to 926fdb44 tcp://192.168.5.101:4567
2020-04-20 18:32:42 0 [Note] WSREP: (8ee671b5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2020-04-20 18:32:43 0 [Note] WSREP: EVS version upgrade 0 -> 1
2020-04-20 18:32:43 0 [Note] WSREP: declaring 926fdb44 at tcp://192.168.5.101:4567 stable
2020-04-20 18:32:43 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2020-04-20 18:32:43 0 [Note] WSREP: Node 926fdb44 state prim
2020-04-20 18:32:43 0 [Note] WSREP: view(view_id(PRIM,8ee671b5,68) memb {
    8ee671b5,0
    926fdb44,0
} joined {
} left {
} partitioned {
})
2020-04-20 18:32:43 0 [Note] WSREP: save pc into disk
2020-04-20 18:32:43 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.5.103:4567
2020-04-20 18:32:43 0 [Note] WSREP: gcomm: connected
2020-04-20 18:32:43 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2020-04-20 18:32:43 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2020-04-20 18:32:43 0 [Note] WSREP: Opened channel 'GALERA-CLUSTER'
2020-04-20 18:32:43 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2020-04-20 18:32:43 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8f7f5ce7-8324-11ea-a51f-c2eb444a2d00
2020-04-20 18:32:43 1 [Note] WSREP: Starting rollbacker thread 1
2020-04-20 18:32:43 2 [Note] WSREP: Starting applier thread 2
2020-04-20 18:32:43 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 8f7f5ce7-8324-11ea-a51f-c2eb444a2d00
2020-04-20 18:32:43 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8f7f5ce7-8324-11ea-a51f-c2eb444a2d00 from 0 (db02)
2020-04-20 18:32:43 0 [Note] WSREP: STATE EXCHANGE: got state msg: 8f7f5ce7-8324-11ea-a51f-c2eb444a2d00 from 1 (db01)
2020-04-20 18:32:43 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 67,
    members    = 1/2 (joined/total),
    act_id     = 69,
    last_appl. = 2,
    protocols  = 2/10/4 (gcs/repl/appl),
    vote policy= 0,
    group UUID = b1ad1d21-831f-11ea-b0ff-b63ef682e2e4
2020-04-20 18:32:43 0 [Note] WSREP: Flow-control interval: [23, 23]
2020-04-20 18:32:43 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 70)
2020-04-20 18:32:43 2 [Note] WSREP: ####### processing CC 70, local, ordered
2020-04-20 18:32:43 2 [Note] WSREP: REPL Protocols: 10 (5, 3)
2020-04-20 18:32:43 2 [Note] WSREP: ####### My UUID: 8ee671b5-8324-11ea-9b4f-a7f1adddc84b
2020-04-20 18:32:43 2 [Note] WSREP: Server db02 connected to cluster at position b1ad1d21-831f-11ea-b0ff-b63ef682e2e4:70 with ID 8ee671b5-8324-11ea-9b4f-a7f1adddc84b
2020-04-20 18:32:43 2 [Note] WSREP: Server status change disconnected -> connected
2020-04-20 18:32:43 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-04-20 18:32:43 2 [Note] WSREP: State transfer required:
    Group state: b1ad1d21-831f-11ea-b0ff-b63ef682e2e4:70
    Local state: 00000000-0000-0000-0000-000000000000:-1
2020-04-20 18:32:43 2 [Note] WSREP: Server status change connected -> joiner
2020-04-20 18:32:43 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-04-20 18:32:43 0 [Note] WSREP: Joiner monitor thread started to monitor
2020-04-20 18:32:43 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.5.102' --datadir '/var/lib/mysql/' --parent '22080' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1'
WSREP_SST: [INFO] Streaming with xbstream (20200420 18:32:44.016)
WSREP_SST: [INFO] Using socat as streamer (20200420 18:32:44.019)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20200420 18:32:44.024)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20200420 18:32:44.068)
2020-04-20 18:32:44 2 [Note] WSREP: Prepared SST request: mariabackup|192.168.5.102:4444/xtrabackup_sst//1
2020-04-20 18:32:44 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2020-04-20 18:32:44 0 [Note] WSREP: Service thread queue flushed.
2020-04-20 18:32:44 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
2020-04-20 18:32:44 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 70, STRv: 3
2020-04-20 18:32:44 2 [Note] WSREP: IST receiver addr using tcp://192.168.5.102:4568
2020-04-20 18:32:44 2 [Note] WSREP: Prepared IST receiver for 0-70, listening at: tcp://192.168.5.102:4568
2020-04-20 18:32:44 0 [Warning] WSREP: Member 0.0 (db02) requested state transfer from '192.168.5.101', but it is impossible to select State Transfer donor: No route to host
2020-04-20 18:32:44 2 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2020-04-20 18:32:44 2 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2020-04-20 18:32:44 2 [Note] WSREP: ReplicatorSMM::abort()
2020-04-20 18:32:44 2 [Note] WSREP: Closing send monitor...
2020-04-20 18:32:44 2 [Note] WSREP: Closed send monitor.
2020-04-20 18:32:44 2 [Note] WSREP: gcomm: terminating thread
2020-04-20 18:32:44 2 [Note] WSREP: gcomm: joining thread
2020-04-20 18:32:44 2 [Note] WSREP: gcomm: closing backend
2020-04-20 18:32:45 2 [Note] WSREP: view(view_id(NON_PRIM,8ee671b5,68) memb {
    8ee671b5,0
} joined {
} left {
} partitioned {
    926fdb44,0
})
2020-04-20 18:32:45 2 [Note] WSREP: PC protocol downgrade 1 -> 0
2020-04-20 18:32:45 2 [Note] WSREP: view((empty))
2020-04-20 18:32:45 2 [Note] WSREP: gcomm: closed
2020-04-20 18:32:45 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-04-20 18:32:45 0 [Note] WSREP: Flow-control interval: [16, 16]
2020-04-20 18:32:45 0 [Note] WSREP: Received NON-PRIMARY.
2020-04-20 18:32:45 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 70)
2020-04-20 18:32:45 0 [Note] WSREP: New SELF-LEAVE.
2020-04-20 18:32:45 0 [Note] WSREP: Flow-control interval: [0, 0]
2020-04-20 18:32:45 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2020-04-20 18:32:45 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: -1)
2020-04-20 18:32:45 0 [Note] WSREP: RECV thread exiting 0: Success
2020-04-20 18:32:45 2 [Note] WSREP: recv_thread() joined.
2020-04-20 18:32:45 2 [Note] WSREP: Closing replication queue.
2020-04-20 18:32:45 2 [Note] WSREP: Closing slave action queue.
2020-04-20 18:32:45 2 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Terminated

The following lines in particular looked intriguing:

2020-04-20 18:32:44 0 [Warning] WSREP: Member 0.0 (db02) requested state transfer from '192.168.5.101', but it is impossible to select State Transfer donor: No route to host
2020-04-20 18:32:44 2 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2020-04-20 18:32:44 2 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.

Why would this Galera node db02 be unable to connect to the donor when they're both in the same network?

A quick verification of the Galera settings on node 2 revealed that wsrep_sst_donor was set to the IP address of the primary node db01:

# Galera Sync Method
wsrep_cluster_address="gcomm://192.168.5.101,192.168.5.102,192.168.5.103"
wsrep_sst_method=mariabackup
wsrep_sst_auth="sst:secret"
wsrep_slave_threads=16
wsrep_sst_donor=192.168.5.101
wsrep_node_address="192.168.5.102"
wsrep_node_name="db02"

But this actually causes problems, because wsrep_sst_donor should be a node name, not the IP address of a node.

Adjusting wsrep_sst_donor to the node name of db01 did the trick:

wsrep_sst_donor="db01,"

Note: the ',' after the donor node name tells Galera to look for other donors in the cluster if db01 is not available.

db02 was finally able to sync itself with the donor db01:

2020-04-20 18:36:33 2 [Note] WSREP: Prepared SST request: mariabackup|192.168.5.102:4444/xtrabackup_sst//1
2020-04-20 18:36:33 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2020-04-20 18:36:33 0 [Note] WSREP: Service thread queue flushed.
2020-04-20 18:36:33 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
2020-04-20 18:36:33 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 92, STRv: 3
2020-04-20 18:36:33 2 [Note] WSREP: IST receiver addr using tcp://192.168.5.102:4568
2020-04-20 18:36:33 2 [Note] WSREP: Prepared IST receiver for 0-92, listening at: tcp://192.168.5.102:4568
2020-04-20 18:36:33 0 [Note] WSREP: Member 0.0 (db02) requested state transfer from 'db01'. Selected 1.0 (db01)(SYNCED) as donor.
2020-04-20 18:36:33 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 92)
2020-04-20 18:36:33 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2020-04-20 18:36:33 2 [Note] WSREP: Resetting GCache seqno map due to different histories.

Need help in Galera troubleshooting?

Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.


Add a comment

Show form to leave a comment

Comments (newest first)

Martijn from wrote on Mar 5th, 2024:

Thank for this, I was pulling my hair out here, I was trying to use IP as well :)


Jeff from wrote on Jan 30th, 2024:

What an absolute piece of crap, but thank you, i always use DNS, i tried to use IP over a VPC Peering connection to ensure no extra hops...and that completely F'd me. Thanks for pointing me back to what i wanted to do in the first place, i'll never forget this.


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder