Handling validity check failed and empty client certificate chain errors in Elasticsearch

Published on - last updated on March 7th 2022 - Listed in Elasticsearch ELK Monitoring TLS SSL Security


Communication between the nodes of an Elasticsearch cluster usually happens over the transport port (tcp/9300). With X-Pack security enabled, TLS certificates can be installed and used to encrypt this node-to-node traffic.

Here's an example of such an SSL/TLS setup with certificates in PEM format:

root@esnode1:~# cat /etc/elasticsearch/elasticsearch.yml
[...]
#xpack settings
xpack.security.enabled:  true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]

Handling an expired certificate

When a certificate expires, the other nodes are unable to communicate with the node on which the certificate has expired. The logs will contain a lot of entries mentioning "validity check failed". Further down in the stack trace you can also spot the reason for it: an expired validity date, reported by java.security.cert.CertificateExpiredException.

[2022-03-01T07:49:23,116][WARN ][o.e.d.PeerFinder         ] [esnode3] address [192.168.22.51:9300], node [null], requesting [false] connection failed: [][192.168.22.51:9300] general node connection failure: handshake failed because connection reset
[2022-03-01T07:49:23,116][WARN ][o.e.t.TcpTransport       ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:44304, remoteAddress=192.168.22.51/192.168.22.51:9300, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertPathValidatorException: validity check failed
[...]
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Mon Jan 17 00:59:59 CET 2022
[...]

The validity of the installed certificate can easily be monitored, for example with the monitoring plugin check_ssl_cert, targeting the transport port (9300):

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: x509 certificate element 1 is expired (was valid until Jan 16 23:59:59 2022 GMT)|days_chain_elem1=-43;20;15;;

The fix is to replace the expired certificate files (and the private key, if a new one was generated) in the "certs" directory.
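
The expiry date can also be checked directly on the node, against the certificate file itself. The following is a minimal sketch, assuming openssl is installed on the node; the function names cert_enddate and cert_expires_within are made up for this example and are not part of Elasticsearch or check_ssl_cert:

```shell
# Hypothetical helpers (names are assumptions, not from any existing tool).

# Print the expiry date (notAfter) of the first certificate in a PEM file.
cert_enddate() {
    openssl x509 -in "$1" -noout -enddate
}

# Return 0 if the first certificate in the file has expired or will expire
# within the given number of days, 1 if it is still valid after that period.
# "openssl x509 -checkend N" exits non-zero when the certificate expires
# within the next N seconds. An unreadable file also counts as a problem.
cert_expires_within() {
    cert_file=$1
    days=$2
    if openssl x509 -in "$cert_file" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

On a node this could be called as: cert_expires_within /etc/elasticsearch/certs/certificate.crt 20 && echo "renew soon". Note that openssl x509 only looks at the first certificate in the file, which should be the server certificate.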

SSLHandshakeException: Empty client certificate chain

A more confusing error message in the logs is the empty client certificate chain error:

[2022-03-01T09:42:57,519][WARN ][o.e.t.TcpTransport       ] [esnode3] exception caught on transport layer [Netty4TcpChannel{localAddress=/192.168.22.53:9300, remoteAddress=/192.168.15.31:47298, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Empty client certificate chain
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.66.Final.jar:4.1.66.Final]
[...]
Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain

Why is this error message confusing? Because one would assume that the problem lies on the client side, yet it is the server certificate chain that is not correctly installed.

This can again be verified by using check_ssl_cert:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode1 -p 9300
SSL_CERT CRITICAL *.example.com: Cannot verify certificate: unable to get local issuer certificate, unable to verify the first certificate|days_chain_elem1=321;20;15;; 

The check only received one certificate (the server certificate) back, without the chain. 
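
To see for yourself how many certificates a node hands out during the handshake, openssl s_client with the -showcerts option can be used. A small sketch, assuming openssl is available on the monitoring host; count_pem_certs is a hypothetical helper name introduced here:

```shell
# Hypothetical helper: count the PEM certificate blocks read from stdin.
# "grep -c" exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_pem_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' || true
}
```

It would be fed with the output of s_client, for example:

ck@monitoring:~$ openssl s_client -connect esnode1:9300 -showcerts </dev/null 2>/dev/null | count_pem_certs

A count of 1 means the node only presents its server certificate; with a full chain installed the count should be higher.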

This is unexpected at first, as the chain is configured through the xpack.security.transport.ssl.certificate_authorities setting:

root@esnode2:~# cat /etc/elasticsearch/elasticsearch.yml | grep certificate
xpack.security.transport.ssl.key: certs/private.key
xpack.security.transport.ssl.certificate: certs/certificate.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/chain.crt" ]

More research leads to Elasticsearch issue #31725 and the following comment:

Indeed the xpack.security.http.ssl.certificate setting should contain a chain.

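Although the comment refers to the http setting, the same applies to xpack.security.transport.ssl.certificate. In this case, the renewed certs/certificate.crt file contained only the server certificate. Appending the chain to certificate.crt creates a full chain: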

root@esnode2:/etc/elasticsearch/certs# cat chain.crt >> certificate.crt

Shortly afterwards, Elasticsearch should detect the changed certificate file automatically (no restart required) and the following message should show up in the logs:

[2022-03-01T12:10:25,276][INFO ][o.e.x.c.s.SSLConfigurationReloader] [inf-elkesi01-p] reloaded [/etc/elasticsearch/certs/certificate.crt] and updated ssl contexts using this file

This fixes the empty client certificate chain errors and monitoring is happy, too:

ck@monitoring:~$ /usr/lib/nagios/plugins/check_ssl_cert -H esnode2 -p 9300
SSL_CERT OK - x509 certificate '*.example.com' from 'Gandi Standard SSL CA 2' valid until Jan 16 23:59:59 2023 GMT (expires in 321 days)|days_chain_elem1=321;20;15;; days_chain_elem2=925;20;15;; days_chain_elem3=5802;20;15;;

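For users like me, who are used to SSL configurations with a dedicated chain/CA file (e.g. on an Apache web server), the certificate_authorities setting is pretty confusing: it only defines which CAs the node trusts when verifying its peers and is not used to build the chain the node itself presents.

TL;DR: xpack.security.transport.ssl.certificate must be a full-chain certificate for a correct installation.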
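
A quick sanity check of the resulting file can catch two typical mistakes: a file that still contains only the server certificate, and a file where "cat >>" glued two certificates together because the first file had no trailing newline. The following is only a sketch working on the PEM markers, it does not validate the certificates themselves; check_fullchain is a hypothetical name:

```shell
# Hypothetical helper: rough structural check of a PEM full-chain file.
# Returns 0 if the file looks like a chain, 1 if not, 2 if unreadable.
check_fullchain() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file"; return 2; }

    # Markers that sit alone on their own line (well-formed PEM).
    begins=$(grep -c '^-----BEGIN CERTIFICATE-----[[:space:]]*$' "$file" || true)
    ends=$(grep -c '^-----END CERTIFICATE-----[[:space:]]*$' "$file" || true)
    # All markers, wherever they appear on a line.
    markers=$(grep -o -e '-----BEGIN CERTIFICATE-----' \
                      -e '-----END CERTIFICATE-----' "$file" | wc -l | tr -d ' ')

    if [ "$((begins + ends))" -ne "$markers" ]; then
        echo "malformed: a BEGIN/END marker shares a line with other text"
        return 1
    fi
    if [ "$begins" -lt 2 ]; then
        echo "only $begins certificate(s) found, chain is missing"
        return 1
    fi
    echo "$begins certificates found"
    return 0
}
```

Called as check_fullchain /etc/elasticsearch/certs/certificate.crt, it should report more than one certificate after the chain was appended. The order inside the file matters as well: the server certificate comes first, followed by the intermediate certificate(s), which is what appending chain.crt to certificate.crt produces.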

Requests from non-cluster nodes: empty client certificate chain

Even after the certificate chain has been fixed, the log events (Caused by: javax.net.ssl.SSLHandshakeException: Empty client certificate chain) can still show up. This happens when hosts that are not part of the cluster connect to the transport port 9300, for example a monitoring server running the check_ssl_cert plugin. As such a request does not include a client certificate, the error message is accurate in this case and can be ignored.

