Unique sorting of multiple words in a string in PHP


Published on March 3rd 2019 - Listed in PHP


I am currently working on a new responsive design for my blog (yes, after 11 years it's about time!) and while doing this I am also re-programming certain parts.

In the new version I will add a dynamically generated list of article categories/tags, sometimes also known as a word cloud. But this was easier said than done, because I struggled to get a unique list of tags in PHP.

Let's first take a look at how the tags are currently set up. Every blog article has a column "tags" where one or more strings can be defined to tag the article. For example:

# mysql -e "select title, tags from website.blog limit 0,5;"
+----------------------------------+---------------------------------+
| title                            | tags                            |
+----------------------------------+---------------------------------+
| Welcome to claudiokuenzler.com ! | Personal Internet               |
| UEFA Euro, I'm coming!!          | Personal                        |
| New How To's                     | VMware PHP Linux Shell Internet |
| One day left and the game starts | Personal                        |
| After the lost game...           | Personal                        |
+----------------------------------+---------------------------------+

Now from all these tags, I wanted to create a unique list of tags. Each tag should only appear once. In Bash I would have had the result in a couple of seconds, depending on how fast I'm hammering on the keyboard:

# mysql --skip-column-names -Be "select tags from website.blog limit 0,5;" | sed "s/ /\n/g" | sort -u

Internet
Linux
Personal
PHP
Shell
VMware

Note: This just made me realize I had a trailing white space in the "New How To's" tags ;-)

Yeah, pretty fast. But I wanted to solve this in PHP. Which took me much longer than I expected.

I prepared the PHP code to go through each blog article and get the list of all tags.
I wanted to collect the list in different ways: as a single value ($tags) and as arrays ($taglist, $taglist2, $taglist3):

$taglist = array();
$taglist2 = array();
$taglist3 = array();
$tags = '';

For each result (the tags of one article), the value is appended to the single string $tags, and the arrays $taglist and $taglist2 each receive a new numeric index holding that value. $taglist3 is a little different: here I tried another approach, appending everything to a single array index (0):

$i = 0;
$taglist3[0] = ''; // Initialize index 0 so that .= below does not raise a warning
$anfrage = mysqli_query($dbh, "SELECT DISTINCT tags FROM blog");
while ($zeile = mysqli_fetch_assoc($anfrage)) {
        $tags .= " $zeile[tags]"; // Creates a (huge) string value
        $taglist[$i] = $zeile['tags']; // Adds values to an array, different indexes
        array_push($taglist2, $zeile['tags']); // Another way to add values to an array, different indexes
        $taglist3[0] .= " $zeile[tags]"; // Adds values to array but all into the same array index (0)
        $i++;
}

After this I added some outputs to see what PHP did in each case:

echo "First up: Single value tags\n";
echo $tags;
echo "\n";
echo "Next up: taglist\n";
print_r($taglist);
echo "\n";
echo "Next up: taglist2\n";
print_r($taglist2);
echo "\n";
echo "Finally: taglist3\n";
print_r($taglist3);

Let's take a look at the different outputs:

First up: Single value tags
Personal Internet Personal VMware PHP Linux Shell Internet Linux PHP Personal Personal Bluecoat Proxy Windows Internet Personal Internet Hardware VMware Virtualization Windows PHP Linux Shell Nagios Shell MySQL DB Monitoring Linux MySQL DB Windows Hardware Internet Mail Internet Personal Android Hardware PHP DB MySQL PHP Internet Windows Network Personal Internet Windows Personal Hardware Internet Linux Mail Internet Linux Mail Nagios Hardware Monitoring Windows Proxy Hardware [...]

Next up: taglist
Array ( [0] => Personal Internet [1] => Personal [2] => VMware PHP Linux Shell Internet [3] => Linux PHP Personal [4] => Personal Bluecoat Proxy [5] => Windows Internet [6] => Personal Internet Hardware [7] => VMware Virtualization [8] => Windows [9] => PHP Linux Shell [10] => Nagios Shell MySQL DB Monitoring [11] => Linux MySQL DB [12] => Windows Hardware [13] => Internet Mail [14] => Internet Personal [15] => Android [16] => Hardware [17] => PHP DB MySQL [18] => PHP [19] => Internet Windows Network [20] => Personal Internet Windows [21] => Personal Hardware [22] => Internet Linux Mail [23] => Internet [24] => Linux Mail [25] => Nagios Hardware Monitoring [...]

Next up: taglist2
Array ( [0] => Personal Internet [1] => Personal [2] => VMware PHP Linux Shell Internet [3] => Linux PHP Personal [4] => Personal Bluecoat Proxy [5] => Windows Internet [6] => Personal Internet Hardware [7] => VMware Virtualization [8] => Windows [9] => PHP Linux Shell [10] => Nagios Shell MySQL DB Monitoring [11] => Linux MySQL DB [12] => Windows Hardware [13] => Internet Mail [14] => Internet Personal [15] => Android [16] => Hardware [17] => PHP DB MySQL [18] => PHP [19] => Internet Windows Network [20] => Personal Internet Windows [21] => Personal Hardware [22] => Internet Linux Mail [23] => Internet [24] => Linux Mail [25] => Nagios Hardware Monitoring [...]

Finally: taglist3
Array ( [0] => Personal Internet Personal VMware PHP Linux Shell Internet Linux PHP Personal Personal Bluecoat Proxy Windows Internet Personal Internet Hardware VMware Virtualization Windows PHP Linux Shell Nagios Shell MySQL DB Monitoring Linux MySQL DB Windows Hardware Internet Mail Internet Personal Android Hardware PHP DB MySQL PHP Internet Windows Network Personal Internet Windows Personal Hardware Internet Linux Mail Internet Linux Mail Nagios Hardware Monitoring Windows Proxy Hardware [...]

The single value $tags is one huge string containing a lot of words. It is also very similar to $taglist3 with the difference that $taglist3 is an array, but basically holds the same value (a single huge string).
$taglist and $taglist2 are exactly the same. That's no surprise as I simply tested two ways of adding values to an array.
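
To illustrate why $taglist and $taglist2 end up identical, here is a minimal sketch that uses a few hard-coded sample rows in place of the database query. The $rows array and the name $taglist4 are made up for this example. It also shows the short $array[] = value syntax, a third way to append:

```php
<?php
// Hypothetical sample rows standing in for the mysqli result set
$rows = array('Personal Internet', 'Personal', 'VMware PHP Linux Shell Internet');

$taglist = array();
$taglist2 = array();
$taglist4 = array(); // hypothetical name, not part of the original code

$i = 0;
foreach ($rows as $row) {
    $taglist[$i] = $row;          // explicit numeric index
    array_push($taglist2, $row);  // array_push appends at the next free index
    $taglist4[] = $row;           // short append syntax, same effect
    $i++;
}

var_dump($taglist === $taglist2); // bool(true)
var_dump($taglist === $taglist4); // bool(true)
```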

Now to the interesting part: I needed to output unique words/strings from all these tags. And this is where the difficulty started.
I was unable to find a PHP function to sort and output unique words of a string (everything pointed towards needing an array). Speaking of arrays, here I found the function array_unique which looked promising so I tried it with the defined arrays (here $taglist2):

$unique2 = array_unique($taglist2);
echo "\nAchieve the goal with array\n";
echo implode(" ", $unique2); // echo on an array itself would only print "Array"

The output was not at all what I expected:

Achieve the goal with array
Personal Internet Personal VMware PHP Linux Shell Internet Linux PHP Personal Personal Bluecoat Proxy Windows Internet Personal Internet Hardware VMware Virtualization Windows PHP Linux Shell Nagios Shell MySQL DB Monitoring Linux MySQL DB Windows Hardware Internet Mail Internet Personal Android Hardware PHP DB MySQL PHP Internet Windows Network Personal Internet Windows Personal Hardware Internet Linux Mail Internet Linux Mail Nagios Hardware Monitoring Windows Proxy Hardware Internet PHP Linux VMware Linux Virtualization Linux Internet Linux Shell Linux Nagios Linux Internet Monitoring Android Linux Mail Windows Proxy Network Hardware VMware Virtualization Hardware VMware Nagios VMware Virtualization Monitoring Hardware Wyse Virtualization Internet Android Network Hardware Internet Personal Android Nagios Monitoring Windows Mail Nagios VMware Hardware Virtualization Monitoring Windows Personal Internet Linux PHP DB MySQL VMware Hardware Linux Virtualization Hardware Personal Network Windows Wyse Virtualization Hardware Network Nagios Linux Monitoring Nagios Windows Monitoring Internet Linux PHP Hacks Nagios Hardware Virtualization Monitoring Hardware Virtualization Hardware Personal Windows VMware Virtualization Hardware Linux Nagios Hardware Network Monitoring Hardware Virtualization Wyse Nagios VMware Internet Hardware Monitoring Linux Internet DB MySQL Nagios Internet Hardware Monitoring Wyse Windows Virtualization VMware Nagios Hardware Virtualization Monitoring [...]

What the hell... I basically got one huge list again!

Why? Because of the way the tags are stored in the database. Each array value is treated as a single string, no matter how many tags appear in it. So "Personal Internet" is one string and "Personal" is another. To array_unique these are two different strings, so both are kept.

This means I had to find another way, and during my research I came across a forum post which pointed me to the solution. It may sound strange at first but it actually makes sense: to deduplicate the tags, PHP must first see each tag as a string of its own. This can be done by taking one huge string (which I already have in $tags!) and converting it into an array with one value per tag. This array can then be used with the function array_unique! Let's do this!
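
Before moving on to the fix, the whole-string comparison described above can be reproduced with a few sample values. This is only a sketch; the $rows array is made up from the first rows shown earlier:

```php
<?php
// array_unique compares whole array values, not the words inside them
$rows = array('Personal Internet', 'Personal', 'Personal', 'Personal Internet');

$unique = array_unique($rows);
print_r($unique);
// Two entries remain: "Personal Internet" (key 0) and "Personal" (key 1).
// The word "Personal" therefore still shows up twice in the printed list.
```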

$unique1 = explode(" ", $tags);

This creates a new array $unique1 from the huge single value $tags created before. Here a whitespace (" ") is used to separate the values. Meaning: The single huge string is split into many small strings. One string for each tag.
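
One detail worth knowing about explode: it returns an empty string for every leading, trailing or doubled separator. Since $tags was built by prepending a space to each row, the first element of the resulting array is empty. The following sketch, with a made-up sample string, shows the effect and one possible way around it using preg_split with the PREG_SPLIT_NO_EMPTY flag:

```php
<?php
// Made-up sample: a leading space and a double space, as stray
// whitespace in the tags column would cause
$tags = ' Personal Internet  Personal';

$parts = explode(' ', $tags);
// $parts is array('', 'Personal', 'Internet', '', 'Personal')

// Split on any run of whitespace and drop empty pieces
$clean = preg_split('/\s+/', $tags, -1, PREG_SPLIT_NO_EMPTY);
// $clean is array('Personal', 'Internet', 'Personal')
```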

$unique1 = array_unique($unique1);

Now the function array_unique can be run on that array. Let's see the output:

echo "\nAchieve the goal with single value as a base\n";
echo implode(" ", $unique1); // echo on an array itself would only print "Array"

Achieve the goal with single value as a base
Personal Internet VMware PHP Linux Shell Bluecoat Proxy Windows Hardware Virtualization Nagios MySQL DB Monitoring Mail Android Network Wyse Hacks Tomcat Postgres Apple Mac Backup BSD ZFS Solaris SmartOS Unix Multimedia Perl Database MongoDB CMS OTRS Harddware FreeBSD Wordpress LXC Nginx Proxmox DNS Graphics GlusterFS Security Chef HAProxy Icinga nginx Ansible HTML MariaDB Docker AWS ELK Kibana Logstash Filebeat Rancher Varnish PGSQL PostgreSQL ElasticSearch CouchDB Bash Macintosh Container Minio Grafana InfluxDB Databases NFS OSSEC Containers SystemD Java Zoneminder Surveillance Elasticsearch SSL TLS Icingaweb2 Cloud Wireless Kubernetes Ubuntu

Eureka! This worked!
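
Putting the steps together, and adding the sorting that the Bash one-liner did with sort -u, could look like the following sketch. The function name unique_sorted_tags is hypothetical and the input array stands in for the database rows. Note that array_unique itself only removes duplicates and does not sort.

```php
<?php
/**
 * Hypothetical helper: turns rows of space-separated tags into
 * a sorted list of unique tags.
 */
function unique_sorted_tags(array $rows)
{
    // Join all rows into one string, then split it into single words,
    // skipping empty pieces caused by stray whitespace
    $words = preg_split('/\s+/', implode(' ', $rows), -1, PREG_SPLIT_NO_EMPTY);

    // Remove duplicates (the comparison is case-sensitive)
    $unique = array_unique($words);

    // Sort alphabetically, ignoring case; sort() also reindexes the array
    sort($unique, SORT_STRING | SORT_FLAG_CASE);

    return $unique;
}

// Sample rows, including the trailing whitespace mentioned earlier
$rows = array('Personal Internet', 'Personal', 'VMware PHP Linux Shell Internet ');
echo implode(' ', unique_sorted_tags($rows));
// Internet Linux Personal PHP Shell VMware
```

Because array_unique compares case-sensitively, tags such as Nginx and nginx in the output above would still be listed separately.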

