Thousands of JSON parsing errors after Logstash upgrade to 7.15


Last updated on January 3rd, 2022 - Listed in Elasticsearch ELK Logstash

For the last couple of weeks, I have been chasing a Logstash issue in which thousands of JSON parsing errors were logged after the whole ELK stack was upgraded to 7.15.

This caused the /var partition to be filled up within a very short time:

/var partition filled with error logs from Logstash

Even after clearing (emptying) /var/log/syslog, it took less than 30 minutes until more than 30 GB of errors were written, filling up the partition again.
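Emptying the log in place (rather than deleting it) keeps rsyslogd's open file descriptor valid, so logging continues into the same file. A minimal sketch of that step, demonstrated on a scratch file and assuming GNU truncate; the article does not state which method was actually used:

```shell
# Create a scratch file standing in for /var/log/syslog
LOG=$(mktemp)
printf 'line1\nline2\n' > "$LOG"

# Empty the file in place - the same call works on /var/log/syslog,
# and rsyslogd keeps writing to the (now empty) file afterwards
truncate -s 0 "$LOG"

wc -c < "$LOG"    # 0 bytes - the file was emptied without being removed
rm -f "$LOG"
```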

The local syslog log file (/var/log/syslog) contained thousands of errors, indicating JSON parsing errors from Logstash:

Oct 20 11:17:46 log01 logstash[3628473]: [2021-10-20T11:17:46,964][ERROR][logstash.codecs.json     ][main][6becfe14a36e9ff58e20a97f0820ea946e3122df132f9ddd7c5d61b99c609832] JSON parse error, original data now in message field {:message=>"Unexpected end-of-input: was expecting closing quote for a string value\n at [Source: (String)\"{\"@timestamp\":\"2021-10-20T11:17:46.960783+02:00\",\"@version\":\"1\",\"message\":\" [2021-10-20T11:17:46,960][ERROR][logstash.codecs.json     ][main][6becfe14a36e9ff58e20a97f0820ea946e3122df132f9ddd7c5d61b99c609832] JSON parse error, original data now in message field {:message=>\\\"Unexpected end-of-input in character escape sequence\\\\n at [Source: (String)\\\\\\\"{\\\\\\\"@timestamp\\\\\\\":\\\\\\\"2021-10-20T11:17:46.956975+02:00\\\\\\\",\\\\\\\"@version\\\\\\\":\\\\\\\"1\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\" [2021-10-20T11:17:46,955][ERROR][logs\"[truncated 7596 chars]; line: 1, column: 16193]", :exception=>LogStash::Json::ParserError, :data=>"{\"@timestamp\":\"2021-10-20T11:17:46.960783+02:00\",\"@version\":\"1\",\"message\":\" [2021-10-20T11:17:46,960][ERROR][logstash.codecs.json     ][main][6becfe14a36e9ff58e20a97f0820ea946e3122df132f9ddd7c5d61b99c609832] JSON parse error, original data now in message field {:message=>\\\"Unexpected end-of-input in character escape sequence\\\\n at [Source: (String)\\\\\\\"{\\\\\\\"@timestamp\\\\\\\":\\\\\\\"2021-10-20T11:17:46.956975+02:00\\\\\\\",\\\\\\\"@version\\\\\\\":\\\\\\\"1\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\" [2021-10-20T11:17:46,955][ERROR][logstash.codecs.json     ][main][6becfe14a36e9ff58e20a97f0820ea946e3122df132f9ddd7c5d61b99c609832] JSON parse error, original data now in message field {:message=>\\\\\\\\\\\\\\\"Unexpected end-of-input: was expecting closing quote for a string value\\\\\\\\\\\\\\\\n at [Source: (String)

Help was requested on the public Elastic discussion forum (find source of events causing Logstash errors), with the goal of identifying which application or program was causing these errors - but nobody answered. An additional ticket with Elastic support was also created.

After trying lots of different configuration options, including removing the "codec => json" setting on the Logstash input and disabling JSON parsing in general (which worked, but meant our logs were no longer split into the needed fields), Logstash was finally downgraded to 7.12.1.
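For context, the "codec => json" setting mentioned above sits on the Logstash input. A sketch of what such an input looks like - the input type and port here are assumptions, not our actual configuration:

input {
  tcp {
    port  => 5140     # assumed port receiving forwarded syslog events
    codec => json     # removing this stopped the errors, but events
                      # were no longer split into fields
  }
}

With the codec removed, the whole event lands as plain text in the message field, which is why this was not an acceptable workaround.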

And what a change this was! Since Logstash was downgraded from 7.15.2 to 7.12.1, the massive error logging has stopped, and disk usage on the /var partition has remained stable (within the expected growth):

Not only did the disk usage remain very steady, with only a few occasional Logstash errors being logged, we also saw a sharp decrease in CPU and network usage (towards Elasticsearch):

The graphs all look very good - but are there any downsides? So far we have discovered none at all. We are still getting all the logs (= we are not missing any), the events are correctly JSON-parsed and split into fields, and indexing in Elasticsearch works as it should. Yet there are no more mass errors being logged, and performance is better by a large factor.

This leads to the conclusion that somewhere between Logstash 7.12.1 and 7.15.x (we have seen this mass-error behaviour on Logstash 7.15.0 through 7.15.2) there must be a bug in Logstash itself. What exactly is yet unknown, and we hope Elastic support will find the cause. For now we will stick with the 7.12 version.

Looping logs

Updated December 8th, 2021

Further analysis and discussion with Elastic support indeed showed a different logging behaviour between Logstash 7.12 and 7.15. Logstash's JSON codec changed the log event emitted on parsing errors:

  • 7.12: logger.error("JSON parse failure. Falling back to plain-text" ... )
  • 7.15: logger.error("JSON parse error, original data now in message field" ... )

An initial JSON parsing error (1) is therefore logged by Logstash to syslog and written into /var/log/syslog (2). One particular configuration in our environment is that we use rsyslogd (3) to format syslog events as JSON and then forward them to - you guessed it - Logstash (4). Just by looking at the logged error message (which itself contains partial JSON data), it doesn't take a JSON genius to guess that Logstash will AGAIN throw a JSON parsing error when the forwarded syslog entry reaches the Logstash input (5).
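The rsyslogd side of this setup (3, 4) can be sketched roughly as follows. The template layout, file name, target host, and port are all assumptions for illustration, not our actual configuration:

# /etc/rsyslog.d/60-logstash.conf (hypothetical file name)
# Format each syslog event as a JSON object and forward it to Logstash
template(name="json_lines" type="list") {
  constant(value="{\"@timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"@version\":\"1\",\"message\":\"")
  property(name="msg" format="json")        # JSON-escapes the message text
  constant(value="\",\"sysloghost\":\"")
  property(name="hostname")
  constant(value="\",\"programname\":\"")
  property(name="programname")
  constant(value="\"}\n")
}

*.* action(type="omfwd" target="logstash.example.com" port="5140"
           protocol="tcp" template="json_lines")

Note that the message text is JSON-escaped by rsyslogd, but a Logstash error message that was truncated mid-string still produces invalid JSON once it is embedded and re-parsed - which is exactly what feeds the loop.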

Yet again another error is logged in syslog, which is again picked up by Logstash, which creates another error, which is logged, ... A classic case of a loop. The following drawing should explain this visually:

So right now we are working on a way to prevent Logstash from writing into syslog at all and have it keep its own dedicated log file instead. This would prevent the loop from being triggered, and only the initial JSON parse error would be logged (which would be fine).

Another alternative workaround would be to drop the Logstash syslog lines as soon as they arrive in the LS input:

filter {
  if "log01" in [sysloghost] and "logstash" in [programname] {
    drop { }
  }
}
But of course this costs performance, too.
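A cheaper variant would be to break the loop on the rsyslogd side instead, so Logstash's own messages are never forwarded in the first place. A sketch in rsyslog's RainerScript; the file name is hypothetical:

# /etc/rsyslog.d/05-drop-logstash.conf (hypothetical file name)
# Discard messages produced by the logstash process before any
# forwarding rules run - this breaks the feedback loop at the source
if $programname == 'logstash' then stop

Since this rule runs before the forwarding action, the dropped events never reach Logstash and cost no pipeline resources there.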

Disabling logging to syslog

Updated January 3rd 2022

As described in the previous update, the suspected reason was that errors were logged to syslog and then picked up by Logstash again, creating a loop. To disable logging to syslog, the following log4j appender setting needs to be disabled (commented out):

root@logstash:~# grep "console.ref" /etc/logstash/
#rootLogger.appenderRef.console.ref = ${sys:ls.log.format}_console # Disabled logging to syslog

After another Logstash restart, all Logstash related logs (including errors) are now logged in a dedicated log file under /var/log/logstash/logstash-plain.log.

This did the trick! Syslog doesn't pick up any Logstash logs anymore and the JSON parse errors are gone. This can be nicely seen in the disk usage of the /var partition:

