MariaDB Galera Cluster: When the data is in sync but the query results differ

Written by - 0 comments

Published on - Listed in MySQL MariaDB Database Galera


On Friday's there's always a bit more time for the curious cases which require some deeper research.

A developer contacted me with the suspicion that the data inside a clustered MariaDB (using Galera) was not in sync. Sometimes a certain query resulted X results, sometimes Y results.

Technically this sounded plausible because the application connects to 127.0.0.1:3306 which is a HAProxy listening on 3306. The requests are then balanced across multiple Galera nodes in round-robin style. So when the query lands on node1, X results are shown. But when the query lands on node2, Y results are shown. That could indeed be a data sync problem.

But it wasn't. The data across the cluster was exactly the same on all nodes. The same amount of rows inside the tables, the same data.

On node1:

MariaDB [app]> select count(*) from docs;
+----------+
| count(*) |
+----------+
|     3916 |
+----------+
1 row in set (0.00 sec)

MariaDB [app]> select count(*) from pages;
+----------+
| count(*) |
+----------+
|      446 |
+----------+
1 row in set (0.01 sec)

On node2:

MariaDB [app]> select count(*) from docs;
+----------+
| count(*) |
+----------+
|     3916 |
+----------+
1 row in set (0.00 sec)

MariaDB [app]> select count(*) from pages;
+----------+
| count(*) |
+----------+
|      446 |
+----------+
1 row in set (0.01 sec)

Even a dump created on both nodes showed the exact same size:

root@node1:/tmp# mysqldump app > /tmp/app.sql
root@node1:/tmp# du -ks /tmp/app.sql
3416    /tmp/app.sql

root@node2:/tmp# mysqldump app > /tmp/app.sql
root@node2:/tmp# du -ks /tmp/app.sql
3416    /tmp/app.sql

Could the two nodes interpret the query differently? The query itself is indeed a bit special, joining two tables together and using several internal MySQL functions:

SELECT DISTINCT d.*, p.page, p.pagestatus, DATE_FORMAT(creationdate,"%d.%m.%Y %H:%i:%s") as dcreationdate, DATE_FORMAT(updatedate,"%d.%m.%Y %H:%i:%s") as dupdatedate, DATE_FORMAT(lastpubdate,"%d.%m.%Y %H:%i:%s") as dlastpubdate, DATE_FORMAT(modified,"%d.%m.%Y %H:%i:%s") as dmodified FROM docs d LEFT JOIN (SELECT * FROM pages WHERE id IN (SELECT MAX(id) FROM pages GROUP BY wwid)) p ON (p.wwid = d.wwid) WHERE d.id in (SELECT MAX(id) FROM docs WHERE channelid = 2 GROUP BY ldid) AND d.printpublicationdate = '2018-06-22' ORDER BY updatedate DESC;

Agreed, not your typical select query... And this query did indeed show two different result sets from node1 and node2:

node1: 62 rows in set (0.02 sec)
node2: 82 rows in set (0.01 sec)

I checked for differences on the two nodes and indeed found a different patch level of MariaDB: node1 had 10.0.33 installed, node2 was running a newer version 10.0.36. Could this really be the source of the difference?

As this problem was identified on a staging server, I updated all cluster nodes to the current MariaDB patch level 10.0.37 (I wouldn't have done that on a Friday afternoon on PROD, no way!).
And by "magic", both nodes reported the same results once they ran the same version:

node1: 82 rows in set (0.02 sec)
node2: 82 rows in set (0.01 sec)

There were two important lessons learned here:

1) The data was still in sync and therefore not corrupted.

2) Always make sure a Galera cluster uses the same version across all nodes.

I'm not sure which exact version (10.0.34, 35 or 36) fixed this, but it could be related to bug MDEV-15035 which was fixed in 10.0.35.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Icingaweb   Icingaweb2   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder   


Update cookies preferences