Galera Arbitrator garbd not starting: Exception in creating receive loop


Published on March 29th 2017 - last updated on December 22nd 2020 - Listed in MySQL DB Database MariaDB Galera


For a new setup requiring an HA database I decided to create a Galera cluster, having already installed a couple of Galera clusters so far (see MySQL Galera cluster not starting (failed to open channel)). But this time I decided to create a two-node cluster with an Arbitrator service to handle split-brain situations.

The Galera Arbitrator service is a daemon process (garbd) which simply connects to the Galera cluster and is from then on part of the cluster. However, no databases are synced to its disk - it's a pure voting member, not a data node. That works great for this scenario: we have two data centers anyway, and I don't need to keep the same data a third time across them.

I created a config file for garbd, according to the Galera Arbitrator documentation:

root@garb:~# cat /etc/garbd.conf
# arbitrator.config
group = ATLDB
address = gcomm://10.161.206.45,10.161.206.46

But when I tried to start garbd, it failed with an "Exception in creating receive loop" error:

root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:47:01.480  INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:47:01.480  INFO: Read config:
    daemon:  0
    name:    garb
    address: gcomm://10.161.206.45,10.161.206.46
    group:   ATLDB
    sst:     trivial
    donor:  
    options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
    cfg:     /etc/garbd.conf
    log:  

I came across a GitHub issue stating that explicit port numbers are required:

"garbd" consistently failed to start unless the configuration [...] explicitly provided the port number.

It's important to note that we're talking about the Galera replication port here, not the MySQL/MariaDB port (3306).
The default Galera port is 4567, which can be verified on one of the Galera data nodes:

root@galera-node1:~# netstat -lntup | grep mysql
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      2971/mysqld    
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      2971/mysqld 
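Before wiring the port into the config, it can also be worth verifying that the arbitrator host can actually reach 4567 on both data nodes (a firewall in between is a classic culprit). A minimal sketch using bash's built-in /dev/tcp; check_peer is a made-up helper, and the IPs are the data nodes from this setup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: test whether a host accepts TCP connections on
# the given port (default: the Galera replication port 4567).
check_peer() {
  local host="$1" port="${2:-4567}"
  if timeout 2 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable"
  fi
}

check_peer 10.161.206.45 4567
check_peer 10.161.206.46 4567
```

If a node shows up as NOT reachable here, garbd will not be able to join no matter how the config looks.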

Using port 4567, I adapted /etc/garbd.conf:

root@garb:~# cat /etc/garbd.conf
# arbitrator.config
group = ATLDB
address = gcomm://10.161.206.45:4567,10.161.206.46:4567
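Since forgetting a port in the gcomm address is an easy mistake, a tiny helper that appends the default port 4567 to every address lacking one can be handy when generating such configs. A sketch in bash; the function name is made up, and it deliberately does not handle IPv6 literals (which contain ':' themselves):

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the default Galera port (4567) to every
# host in a gcomm:// address that does not already carry a port.
# Note: IPv4 addresses and hostnames only - no IPv6 literals.
add_default_port() {
  local url="$1" default_port="${2:-4567}" h
  local hosts="${url#gcomm://}"
  local -a out=() parts
  IFS=',' read -ra parts <<< "$hosts"
  for h in "${parts[@]}"; do
    [[ "$h" == *:* ]] || h="${h}:${default_port}"
    out+=("$h")
  done
  echo "gcomm://$(IFS=','; echo "${out[*]}")"
}

add_default_port "gcomm://10.161.206.45,10.161.206.46"
# -> gcomm://10.161.206.45:4567,10.161.206.46:4567
```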

Start test:

root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:48:31.289  INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:48:31.289  INFO: Read config:
    daemon:  0
    name:    garb
    address: gcomm://10.161.206.45:4567,10.161.206.46:4567
    group:   ATLDB
    sst:     trivial
    donor:  
    options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
    cfg:     /etc/garbd.conf
    log:    

2017-03-29 15:48:31.290  INFO: protonet asio version 0
2017-03-29 15:48:31.290  INFO: Using CRC-32C for message checksums.
2017-03-29 15:48:31.290  INFO: backend: asio
2017-03-29 15:48:31.290  INFO: gcomm thread scheduling priority set to other:0
2017-03-29 15:48:31.290  WARN: access file(./gvwstate.dat) failed(No such file or directory)
2017-03-29 15:48:31.290  INFO: restore pc from disk failed
2017-03-29 15:48:31.291  INFO: GMCast version 0
2017-03-29 15:48:31.291  INFO: (6520b85a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-03-29 15:48:31.291  INFO: (6520b85a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-03-29 15:48:31.291  INFO: EVS version 0
2017-03-29 15:48:31.291  INFO: gcomm: connecting to group 'ATLDB', peer '10.161.206.45:4567,10.161.206.46:4567'
2017-03-29 15:48:31.293  INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 6a1ea4ef tcp://10.161.206.45:4567
2017-03-29 15:48:31.293  INFO: (6520b85a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-03-29 15:48:31.296  INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 5d311f46 tcp://10.161.206.46:4567
2017-03-29 15:48:31.585  INFO: declaring 5d311f46 at tcp://10.161.206.46:4567 stable
2017-03-29 15:48:31.585  INFO: declaring 6a1ea4ef at tcp://10.161.206.45:4567 stable
2017-03-29 15:48:31.586  INFO: Node 5d311f46 state prim
2017-03-29 15:48:31.587  INFO: view(view_id(PRIM,5d311f46,5) memb {
    5d311f46,0
    6520b85a,0
    6a1ea4ef,0
} joined {
} left {
} partitioned {
})
2017-03-29 15:48:31.587  INFO: save pc into disk
2017-03-29 15:48:31.792  INFO: gcomm: connected
2017-03-29 15:48:31.792  INFO: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-29 15:48:31.792  INFO: Shifting CLOSED -> OPEN (TO: 0)
2017-03-29 15:48:31.792  INFO: Opened channel 'ATLDB'
2017-03-29 15:48:31.792  INFO: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: Waiting for state UUID.
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: sent state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 0 (inf-atldb02-p)
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 2 (inf-atldb01-p)
2017-03-29 15:48:31.793  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 1 (garb)
2017-03-29 15:48:31.793  INFO: Quorum results:
    version    = 4,
    component  = PRIMARY,
    conf_id    = 4,
    members    = 2/3 (joined/total),
    act_id     = 0,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 6a1f102a-13a3-11e7-b710-b2876418a643
2017-03-29 15:48:31.793  INFO: Flow-control interval: [9999999, 9999999]
2017-03-29 15:48:31.793  INFO: Shifting OPEN -> PRIMARY (TO: 0)
2017-03-29 15:48:31.793  INFO: Sending state transfer request: 'trivial', size: 7
2017-03-29 15:48:31.795  INFO: Member 1.0 (garb) requested state transfer from '*any*'. Selected 0.0 (inf-atldb02-p)(SYNCED) as donor.
2017-03-29 15:48:31.795  INFO: Shifting PRIMARY -> JOINER (TO: 0)
2017-03-29 15:48:31.796  INFO: 0.0 (inf-atldb02-p): State transfer to 1.0 (garb) complete.
2017-03-29 15:48:31.796  INFO: 1.0 (garb): State transfer from 0.0 (inf-atldb02-p) complete.
2017-03-29 15:48:31.796  INFO: Shifting JOINER -> JOINED (TO: 0)
2017-03-29 15:48:31.797  INFO: Member 0.0 (inf-atldb02-p) synced with group.
2017-03-29 15:48:31.797  INFO: Member 1.0 (garb) synced with group.
2017-03-29 15:48:31.797  INFO: Shifting JOINED -> SYNCED (TO: 0)

It does indeed look better now! A verification on data node 1 confirmed that the cluster size increased from 2 to 3:

root@galera-node1:~# mysql -B -e "SHOW STATUS WHERE variable_name ='wsrep_local_state_comment' \
OR variable_name ='wsrep_cluster_size' \
OR variable_name ='wsrep_incoming_addresses' \
OR variable_name ='wsrep_cluster_status' \
OR variable_name ='wsrep_connected' \
OR variable_name ='wsrep_ready' \
OR variable_name ='wsrep_local_state_uuid' \
OR variable_name ='wsrep_cluster_state_uuid';"

Variable_name    Value
wsrep_cluster_size    3
wsrep_cluster_state_uuid    6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_cluster_status    Primary
wsrep_connected    ON
wsrep_incoming_addresses    ,10.161.206.46:3306,10.161.206.45:3306
wsrep_local_state_comment    Synced
wsrep_local_state_uuid    6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_ready    ON

Note that the garbd machine doesn't show up in wsrep_incoming_addresses; its entry is simply empty (note the leading comma). That makes sense, because no MySQL server is running on the Arbitrator machine, so there is no listener on port 3306.
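With the arbitrator confirmed as a cluster member, garbd can be switched from the foreground test to background operation. The parsed "Read config" output above lists daemon and log keys, so a production config might look like this - a sketch only, verify the exact key names and values against your garbd version:

```ini
# /etc/garbd.conf - sketch; key names taken from garbd's own
# "Read config" output, values are assumptions for this setup
group   = ATLDB
address = gcomm://10.161.206.45:4567,10.161.206.46:4567
daemon  = 1
log     = /var/log/garbd.log
```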

Need help in Galera troubleshooting?

Problems in Galera Clusters are not always easy to spot. Need help troubleshooting a Galera cluster? Contact us on Infiniroot.com.
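One way to catch such problems early is a cron or monitoring check that alerts as soon as wsrep_cluster_size drops below the expected member count (two data nodes plus the arbitrator, i.e. 3 in this setup). A minimal sketch in bash; check_cluster_size is a made-up helper that parses the output of the SHOW STATUS query shown above:

```shell
#!/usr/bin/env bash
# Hypothetical check: compare wsrep_cluster_size against the expected
# number of cluster members (data nodes + arbitrator).
check_cluster_size() {
  local expected="$1" status_output="$2" size
  size=$(awk '$1 == "wsrep_cluster_size" {print $2}' <<< "$status_output")
  if [ "${size:-0}" -ge "$expected" ]; then
    echo "OK - cluster size ${size}"
  else
    echo "CRITICAL - cluster size ${size:-unknown}, expected ${expected}"
  fi
}

# In practice you would feed it live output, e.g.:
#   check_cluster_size 3 "$(mysql -B -e "SHOW STATUS LIKE 'wsrep_cluster_size';")"
check_cluster_size 3 "wsrep_cluster_size 3"
```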

