Visitors of this blog surely have noticed the so-called "Blog Tags" on the right side of this website. If you're reading on a mobile device, these tags are shown below a blog post.
These tags are there for categorizing a blog post. It helps visitors to quickly find other blog posts related to a specific topic.
In the background, my self-coded CMS saves the tags in a column "tags" in the MariaDB database:
MariaDB > SELECT title,tags FROM blog WHERE id = 1488;
+-----------------------------------------------+---------------+
| title | tags |
+-----------------------------------------------+---------------+
| Keepalived VIP not responding on RHEL 9 / EL9 | Linux Network |
+-----------------------------------------------+---------------+
1 row in set (0.000 sec)
As you can see, multiple tags are space separated entries in the same (varchar) column.
Until recently I used the LIKE string function to search for blog posts matching a specific tag:
mysqli_query($dbh, "SELECT * FROM blog WHERE tags LIKE '%$tag%' ORDER BY timeanddate DESC");
This query worked fine - in most situations.
The used LIKE function in the SQL query can be understood like a grep command (without addtional parameters). For easily identifiable tags, such as Linux or Monitoring this works fine.
A problem however are tags which "contain" the same word. For example: The string "DB" shows up in "MariaDB" and in "MongoDB". The current SQL query using LIKE finds all these blog articles, similar to what a grep would do on the command line:
MariaDB > SELECT title,tags FROM blog WHERE tags LIKE '%DB%' ORDER BY timeanddate DESC LIMIT 0,5;
+----------------------------------------------------------------------------------------+-------------------------------------------+
| title | tags |
+----------------------------------------------------------------------------------------+-------------------------------------------+
| How to change Icingaweb2 password for icingaadmin user in backend database | Icinga Monitoring Databases MySQL MariaDB |
| Travis CI builds fail with expired key error on MongoDB APT repository | Cloud Coding Linux MongoDB |
| How to reset MariaDB root login to use unix socket authentication | MariaDB Linux Databases |
| How to list and backup only specific MySQL or MariaDB databases using Ansible Playbook | Ansible Linux MySQL MariaDB |
| Analyzing and fixing Internal Server Error in PowerDNS API and Opera DNS UI | DNS PowerDNS MySQL MariaDB Linux |
+----------------------------------------------------------------------------------------+-------------------------------------------+
5 rows in set (0.007 sec)
I could argue and say that it's not relevant for the "DB" string. But for the "Proxy" string it is. I wanted to clearly distinguish topics between "Proxy" and "HAProxy" - yet both posts showed up under the "Proxy" tag:
It works - but it's not 100% accurate.
While I was debugging a Wordpress issue for a customer, I came across some SQL queries which used the REGEXP string function. I personally wasn't aware that MySQL/MariaDB could do this so I had to look it up. Using grep and sed and other commands on the command line I'm familiar with regular expressions. That could actually be a great improvement in my "blog tag" issue!
My first attempt failed, as I wanted to use a query with whitespaces (using \s) as separator between the different words in the string. But after more research I found out that REGEXP supports special characters to indicate "word start" and "word end" - or better said "word boundaries":
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
Perfect, that's even better than trying to use whitespaces in the regular expression.
I adjusted the SQL query to use REGEXP (with the word boundaries) instead of LIKE:
MariaDB > SELECT title,tags FROM blog WHERE tags REGEXP '[[:<:]]Proxy[[:>:]]' ORDER BY timeanddate DESC LIMIT 0,5;
+---------------------------------------------------------------+-----------------------------------------+
| title | tags |
+---------------------------------------------------------------+-----------------------------------------+
| How to verify and monitor Tinyproxy as a HTTP forward proxy | Monitoring Proxy Linux Internet Network |
| Follow-up on using NTLM on Windows 7 for Proxy Authentication | Windows Proxy Network |
| Using NTLM on Windows 7 | Windows Proxy |
| Hot times, summer in the city | Personal Bluecoat Proxy |
+---------------------------------------------------------------+-----------------------------------------+
4 rows in set (0.008 sec)
Only the Proxy tags now show up in the results (no more HAProxy posts)!
After adjusting the query in the PHP code to use REGEXP instead of LIKE:
mysqli_query($dbh, "SELECT * FROM blog WHERE tags REGEXP '[[:<:]]${tag}[[:>:]]' ORDER BY timeanddate DESC");
And now, finally, my blog shows only the posts with the distinguished blog tag:
I had been thinking of improving this for quite a while. Luckily I came across these REGEXP queries during one of my troubleshooting sessions. :-)
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Observability Office OpenSearch PHP Perl Personal PostgreSQL PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder Linux