Postfix process (and listener) not running after service restart: The mystery of the master.lock file

Written by - 0 comments

Published on - Listed in Mail Linux


After rolling out global mail queue monitoring, one Ubuntu 20.04 server returned a cryptic error message, instead of the size of the mail queue:

postqueue: fatal: Queue report unavailable - mail system is down
CRITICAL: Error code 69 returned from /usr/bin/mailq

After logging into that server it became clear that Postfix is not running and no SMTP listener was up:

root@ubuntu:~# netstat -lntup|grep 25
# no output

Restart Postfix service. Easy, right? That's what I thought, too.

Postfix mail system is already running

But a restart did not give me the results I was looking for, as a quick look at the mail error log showed:

root@ubuntu:~# systemctl restart postfix
root@ubuntu:~# tail /var/log/mail.err
Dec  6 18:35:02 ubuntu postfix/postqueue[738031]: fatal: Queue report unavailable - mail system is down
Dec  6 18:36:00 ubuntu postfix/postqueue[738087]: fatal: Queue report unavailable - mail system is down
Dec  6 18:36:53 ubuntu postfix/postqueue[738218]: fatal: Queue report unavailable - mail system is down
Dec  6 18:36:58 ubuntu postfix/postqueue[738225]: fatal: Queue report unavailable - mail system is down
Dec  6 18:37:28 ubuntu postfix/postfix-script[738316]: fatal: the Postfix mail system is already running

At short moment after the Postfix service was restarted, the postfix-script logged that Postfix is already running. But no process nor a listener was running. There was a PID file from Postfix's master process though:

root@ubuntu:~# ll /var/spool/postfix/pid/master.pid
-rw------- 2 postfix postfix 33 Apr  3  2019 /var/spool/postfix/pid/master.pid

Curious is certainly the old age of that pid file. Could it be that Postfix has been dead on that server since April 2019?

But even if a pid file exists, a restart of the service should be possible. Let's take a closer look what is causing the error logged in mail.err.

The Postfix master command

By looking for that logged string, we can identify the postfix-script command as source of the log entries (besides that it also logged itself in the log event):

root@ubuntu:~# grep -rni "the Postfix mail system is already running" /usr/lib/postfix/*
/usr/lib/postfix/sbin/postfix-script:135:               $FATAL the Postfix mail system is already running
/usr/lib/postfix/sbin/postfix-script:149:        $FATAL the Postfix mail system is already running

Note: The same file (postfix-script) can also be found in /etc/postfix/.

By looking at one of the mentioned lines (135) shows the following code snippet:

quick-start)

       $daemon_directory/master -t 2>/dev/null || {
               $FATAL the Postfix mail system is already running
               exit 1
       }

Here we are inside the "quick-start" action. It only does one thing: Launch $daemon_directory/master -t and if the return code is not true (or 0 for that matter), return "$FATAL the Postfix mail system is already running".

To run this command manually, we first need to find where this master command is located. For this we need to find the definition of the $daemon_directory variable. Luckily a grep through /etc/postfix/ quickly reveals it:

root@ubuntu:~# grep daemon_directory /etc/postfix/* -rni
/etc/postfix/main.cf.proto:57:# The daemon_directory parameter specifies the location of all Postfix
/etc/postfix/main.cf.proto:61:daemon_directory = /usr/lib/postfix/sbin
[...]

Alright! Now we know daemon_directory is /usr/lib/postfix/sbin. Let's launch the master command from there:

root@ubuntu:/usr/lib/postfix/sbin# ./master -t 2
root@ubuntu:/usr/lib/postfix/sbin# echo $?
1

The return code of that command is definitely not 0. But what does that master command actually do? We can find this information from Postfix's man page on master:

       -t     Test  mode.  Return  a zero exit status when the master.pid lock
              file does not exist
or when that file is not  locked.   This  is
              evidence that the master(8) daemon is not running.

The master -t command is therefore used to check whether or not a lock file exists. Only if no lock file is detected, the command will return 0 - which we need in order to complete the Postfix restart through Systemd.

The master.lock file

Some additional research led to a blog entry from Theresa Arzadon-Labajo revealing:

Another log showed:
postfix/master[1490]: fatal: open lock file /var/lib/postfix/master.lock: unable to set exclusive lock: Resource temporarily unavailable

Let's take a look into /var/lib/postfix:

root@ubuntu:~# ls -la /var/lib/postfix/
total 32
drwxr-xr-x  2 postfix postfix 4096 Mar 14  2019 ./
drwxr-xr-x 50 root    root    4096 Nov 23 12:03 ../
-rw-------  2 postfix postfix   33 Apr  3  2019 master.lock
-rw-------  1 postfix postfix 1024 Apr 25  2019 prng_exch
-rw-------  1 postfix postfix 8192 Apr  3  2019 smtp_tls_session_cache.db
-rw-------  1 postfix postfix 8192 Apr  3  2019 smtpd_scache.db

And there it is: The master.lock file. Conveniently with the exact same date as the previously discovered PID file.

Let's remove the master.lock file and restart Postfix once more:

root@ubuntu:~# rm /var/lib/postfix/master.lock

root@ubuntu:~# ls -la /var/lib/postfix/master.lock
ls: cannot access '/var/lib/postfix/master.lock': No such file or directory

root@ubuntu:~# systemctl restart postfix && tail /var/log/mail.log
Dec  6 19:28:33 ubuntu postfix/postfix-script[743306]: warning: /var/spool/postfix/etc/services and /etc/services differ
Dec  6 19:28:33 ubuntu postfix/postfix-script[743310]: warning: /var/spool/postfix/etc/resolv.conf and /etc/resolv.conf differ
Dec  6 19:28:33 ubuntu postfix/postfix-script[743316]: warning: /var/spool/postfix/lib/x86_64-linux-gnu/libgcc_s.so.1 and /lib/x86_64-linux-gnu/libgcc_s.so.1 differ
Dec  6 19:28:33 ubuntu postfix/postfix-script[743318]: warning: /var/spool/postfix/lib/x86_64-linux-gnu/libnss_systemd.so.2 and /lib/x86_64-linux-gnu/libnss_systemd.so.2 differ
Dec  6 19:28:33 ubuntu postfix/postfix-script[743324]: starting the Postfix mail system

The Postfix process was (finally) started this time and the SMTP listener on port 25 started up:

root@ubuntu:~# netstat -lntup|grep 25
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      743326/master  

TL;DR

If you're unable to restart Postfix and no master processe or smtp listener show up (even though Systemd's Postfix service says active), make sure there is no master.lock file blocking a start of a new Postfix master process.



Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.

RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder