
Nginx: Serve error pages from reverse proxy, not from upstream server
Thursday - Sep 21st 2017

If you run Nginx as a reverse proxy and you want to serve error pages from the reverse proxy itself, don't forget to set the following proxy setting:

proxy_intercept_errors on;

Without this, Nginx will forward the error page coming from the upstream server to the client.
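
In context, this directive belongs next to the proxy_pass definition (it can be set on http, server or location level). A minimal sketch, assuming the upstream listens on 127.0.0.1:8080:

  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_intercept_errors on;
  }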

If you want certain error pages to still be delivered from the upstream server (for example 404), simply don't define an error_page directive for that status code on the reverse proxy:

  error_page 400 /400.html;
  location /400.html {
    root   /var/www/errorpages;
    internal;
  }

  error_page 500 /500.html;
  location /500.html {
    root   /var/www/errorpages;
    internal;
  }

  error_page 502 /502.html;
  location /502.html {
    root   /var/www/errorpages;
    internal;
  }

  error_page 503 /503.html;
  location /503.html {
    root   /var/www/errorpages;
    internal;
  }

  error_page 504 /504.html;
  location /504.html {
    root   /var/www/errorpages;
    internal;
  }

 

Varnish panic / crash due to low workspace sizing
Monday - Sep 18th 2017

Last week I experienced several Varnish panics/crashes on a high-traffic website. Before this Varnish instance was taken into production, stress tests (with up to 3200 concurrent connections) were all successful. However, as soon as it went into production, it took only around 20 minutes until the first crash happened.
The log entry was unfortunately not very helpful:

Child (6970) Panic at: Tue, 12 Sep 2017 18:23:34 GMT#012Assert error in VEP_Init(), cache/cache_esi_parse.c line 1033:#012  Condition((vep) != 0) not true.#012version = varnish-5.1.3 revision 05c5ac6b9, vrt api = 6.0#012ident = Linux,4.4.0-89-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012now = 3055882.671440 (mono), 1505240579.638393 (real)#012Backtrace:#012  0x438c25: /usr/sbin/varnishd() [0x438c25]#012  0x4256c2: /usr/sbin/varnishd(VEP_Init+0x172) [0x4256c2]#012  0x42324b: /usr/sbin/varnishd() [0x42324b]#012  0x42a2f5: /usr/sbin/varnishd(VFP_Open+0x85) [0x42a2f5]#012  0x427811: /usr/sbin/varnishd() [0x427811]#012  0x4520a2: /usr/sbin/varnishd() [0x4520a2]#012  0x45254b: /usr/sbin/varnishd() [0x45254b]#012  0x7feff213a6ba: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7feff213a6ba]#012  0x7feff1e703dd: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7feff1e703dd]#012thread = (cache-worker)#012thr.req = (nil) {#012},#012thr.busyobj = 0x7fefb779f020 {#012  ws = 0x7fefb779f0a0 {#012    OVERFLOWED id = \"Bo\",#012    {s, f, r, e} = {0x7fefb77a0f60, +57472, (nil), +57496},#012  },#012  retries = 1, failed = 0, flags = {do_esi, is_gzip},#012  http_conn = 0x7fefb77a8b30 {#012    fd = 137 (@0x7fefcd3887f4),#012    doclose = NULL,#012    ws = 0x7fefb779f0a0 {#012      [Already dumped, see above]#012    },#012    {rxbuf_b, rxbuf_e} = {0x7fefb77a8b90, 0x7fefb77a8eb4},#012    {pipeline_b, pipeline_e} = {0x7fefb77a8eb4, 0x7fefb77aeef0},#012    content_length = -1,#012    body_status = chunked,#012    first_byte_timeout = 60.000000,#012    between_bytes_timeout = 60.000000,#012  },#012  filters = ESI_GZIP=0 GUNZIP=0#012  director_req = 0x7fefcf76e2f8 {#012    vcl_name = niobe,#012    type = backend {#012      display_name = reload_20170821_100741.niobe,#012      ipv4 = 127.0.0.1,#012      port = 8080,#012      hosthdr = 127.0.0.1,#012      health = healthy,#012      admin_health = probe, changed = 1505067128.961975,#012      n_conn = 82,#012    },#012  },#012  director_resp = director_req,#012  http[bereq] = 0x7fefb779f670 {#012    ws = 0x7fefb779f0a0 {#012      [Already dumped, see above]
[...]

After a restart, the next crash happened just a few minutes later. Because this was a go-live, there was not much time to find the cause, so I switched the traffic to a standby machine on which I installed Varnish 4.1.
Note: As of this writing in September 2017, Varnish 4.1 is considered the "stable" version while 5.1 is the "latest" version.
But even with Varnish 4.1 there was a crash, although it took much longer:

Child (1826) Last panic at: Tue, 12 Sep 2017 20:41:32 GMT#012"Assert error in http_EstimateWS(), cache/cache_http.c line 793:#012  Condition((fm->hd[u].b) != 0) not true.#012thread = (cache-worker)#012version = varnish-4.1.8 revision d266ac5c6#012ident = Linux,4.4.0-89-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll#012now = 2968968.394245 (mono), 1505248890.997950 (real)#012Backtrace:#012  0x434122: pan_ic+0x182#012  0x42d4f7: http_EstimateWS+0x3e7#012  0x422a81: vbf_beresp2obj+0x51#012  0x4233ff: vbf_fetch_thread+0x37f#012  0x44a5d9: WRK_Thread+0x4c9#012  0x44aa4b: pool_thread+0x2b#012  0x7f8f0038d6ba: libpthread.so.0(+0x76ba) [0x7f8f0038d6ba]#012  0x7f8f000c33dd: libc.so.6(clone+0x6d) [0x7f8f000c33dd]#012busyobj = 0x7f8ee7124020 {#012  ws = 0x7f8ee71240e0 {#012    OVERFLOWED id = \"Bo\",#012    {s,f,r,e} = {0x7f8ee7125f98,+57440,(nil),+57440},#012  },#012  refcnt = 2,#012  retries = 1, failed = 1, state = 1,#012  flags = {do_esi, is_gzip},#012  director_req = 0x7f8eff553ab8 {#012    vcl_name = niobe,#012    type = backend {#012      display_name = boot.niobe,#012      ipv4 = 127.0.0.1,#012      port = 8080,#012      hosthdr = 127.0.0.1,#012      health=healthy, admin_health=probe, changed=1505248796.1,#012      n_conn = 34,#012    },#012  },#012  director_resp = director_req,#012  http[bereq] = 0x7f8ee71246a8 {#012    ws[Bo] = 0x7f8ee71240e0,#012

Here the error message (Assert error in http_EstimateWS) is different than before. But at least Varnish 4.1 behaved much more stably and didn't crash every few minutes like Varnish 5.1 did. This finally gave me the time to troubleshoot.

During my research I came across an old bug from 2013 (Trac ticket 1352, meanwhile migrated to GitHub issue 1352) which wasn't exactly the same error, but it looked quite similar due to the cache_esi_parse file and the vep condition:

Child (22377) Panic message: Assert error in VEP_Init(), cache_esi_parse.c line 1001:
Condition((sp->wrk->vep) != 0) not true.
thread = (cache-worker)

The ticket was closed back then as "invalid" with a comment suggesting to increase sess_workspace:

"This assert is caused by the session workspace running out. When doing ESI deliveries, the session workspace usage will increase, especially when doing nested ESI includes. Increase the sess_workspace runtime parameter.
Regards, Martin Blix Grydeland"

Martin Blix Grydeland is a senior developer at Varnish. So if he pinpoints this error message to the session workspace, I believe him (who else is able to understand the panic message anyway?).

While researching "sess_workspace" and its usage, I came across a helpful article (Varnish "sess_workspace" and why it is important) by Cosimo Streppone. It basically explains what this parameter means and how to set it during Varnish startup (-p sess_workspace=N).
This description also matches the environment of my Varnish setup:

"We found out we had to increase the default (16kb), especially since we're doing quite a bit of HTTP header copying and rewriting around."

Indeed the website running through my panicked Varnish also has a lot more HTTP headers (and rewrites) than a usual website. 

I finally thought this could be the cause; however, the sess_workspace parameter does not exist anymore in Varnish 5.1. According to the upgrade notes of Varnish 4.0:

sess_workspace

In 3.0 it was often necessary to increase sess_workspace if a lot of VMODs, complex header operations or ESI were in use.
This is no longer necessary, because ESI scratch space happens elsewhere in 4.0.
If you are using a lot of VMODs, you may need to increase either workspace_backend and workspace_client based on where your VMOD is doing its work.

So instead I had to increase two parameters: workspace_backend and workspace_client. For this I adapted Varnish's systemd unit file:

claudio@varnish02:~$ cat /etc/systemd/system/varnish.service  | grep ExecStart
ExecStart=/usr/sbin/varnishd -a :6081 -T localhost:6082 -f /etc/varnish/niobe.vcl -S /etc/varnish/secret -s malloc,20480m -p workspace_client=256k -p workspace_backend=256k

This worked fine on Varnish 4.1 and after several hours without any panic/crash I decided to try the same with Varnish 5.1, too.
Once I adapted these parameters there as well, followed by a systemctl daemon-reload and a restart of Varnish, I haven't seen any Varnish panics/crashes for several days.
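
To verify that the new values are actually effective on the running instance, the parameters can be queried through varnishadm (assuming the management interface and secret file are at their default locations):

root@varnish02:~# varnishadm param.show workspace_client
root@varnish02:~# varnishadm param.show workspace_backend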

 

Linux Mint Dual Monitor Setup: Screens dark after login in Cinnamon
Friday - Sep 15th 2017

Re-organized my workspace today and switched the order of the two monitors. After I attached my notebook, running Linux Mint 17.3, to the docking station connected to the two physical screens, everything looked as it should - at first.

The login screen appeared, I selected my user and logged in. Then both screens went dark. The funny part: when I plugged the screens in the old way (switched DVI port 1 with port 2), the screens re-appeared.

When I detached the notebook from the docking station and rebooted it, I was able to log on and saw my Cinnamon desktop. Then I plugged it into the docking station again and now at least the primary screen was enabled - but the right-hand screen stayed dark.

Finally I was able to get both screens working again AND keep the primary screen as I wanted it, by using xrandr.

xrandr without options shows what screens are detected and usable:

$ xrandr
Screen 0: minimum 8 x 8, current 3840 x 1200, maximum 32767 x 32767
eDP1 connected (normal left inverted right x axis y axis)
   1920x1080      60.0 +   59.9 
   1680x1050      60.0     59.9 
   1600x1024      60.2 
   1400x1050      60.0 
   1280x1024      60.0 
   1440x900       59.9 
   1280x960       60.0 
   1360x768       59.8     60.0 
   1152x864       60.0 
   1024x768       60.0 
   800x600        60.3     56.2 
   640x480        59.9 
DP1 disconnected (normal left inverted right x axis y axis)
DP1-1 connected primary 1920x1200+0+0 (normal left inverted right x axis y axis) 518mm x 324mm
   1920x1200      60.0*+
   1920x1080      60.0 
   1600x1200      60.0 
   1680x1050      59.9 
   1280x1024      60.0 
   1280x960       60.0 
   1024x768       60.0 
   800x600        60.3 
   640x480        60.0 
   720x400        70.1 
DP1-2 connected 1920x1200+1920+0 (normal left inverted right x axis y axis) 518mm x 324mm
   1920x1200      60.0 +
   1920x1080      60.0 
   1600x1200      60.0 
   1680x1050      59.9 
   1280x1024      60.0 
   1280x960       60.0 
   1024x768       60.0 
   800x600        60.3 
   640x480        60.0 
   720x400        70.1 
DP1-3 disconnected (normal left inverted right x axis y axis)
HDMI1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)

eDP1 is the notebook's own screen
DP1-1 is the first DVI port on which I plugged my left screen which I want to be primary. Note the asterisk (*) behind the 60 hertz rate. This means this screen is enabled.
DP1-2 is the second DVI port on which I plugged my right screen. The screen is obviously disabled (missing asterisk).

So now that I knew the "names" of the interfaces, I could tell xrandr which interface to enable or disable, and with which refresh rate, resolution, etc. This is the command I used:

$ xrandr --output DP1-1 --auto --primary --rotate normal --output DP1-2 --auto --right-of DP1-1 --output eDP1 --off

And the second (right-hand) screen switched on!
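
To avoid retyping this after every docking, the command can be wrapped in a small script that is run manually after plugging in. A sketch (the output names are specific to this setup):

$ cat ~/bin/dock-screens.sh
#!/bin/bash
# Enable both external screens, make the left one primary, turn off the internal panel
xrandr --output DP1-1 --auto --primary --rotate normal \
       --output DP1-2 --auto --right-of DP1-1 \
       --output eDP1 --off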

Now, in order for Linux Mint to use these display settings at the next login, I opened "Display" and clicked Apply, followed by "Keep this configuration".

[Screenshot: Linux Mint Cinnamon display settings]

After a reboot (the notebook's lid still closed and plugged into the docking station), both screens were working again as they should.

 

Increase JVM Heap Size Memory for ElasticSearch
Thursday - Sep 14th 2017

After increasing the usage of the ELK stack in the last few days, I noticed more and more lags and even timeouts in searches. I found the bottleneck in Elasticsearch itself: The available memory for ElasticSearch was limited to 2GB (probably the default).

[Screenshot: Elasticsearch memory]

To increase the JVM heap size for Elasticsearch (remember, ES is a Java application), the preferred way is to use the jvm.options file. This file is located in ELASTICSEARCH_CONF_DIR, which in a standard installation means /etc/elasticsearch/:

root@elkes01:~# cat /etc/elasticsearch/jvm.options |egrep "(Xms|Xmx)"
## -Xms4g
## -Xmx4g
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms12g
-Xmx12g

 Important: Both initial (Xms) and maximum (Xmx) sizes must be the same value, or you get an error like this and ES won't start:

[1]: initial heap size [4294967296] not equal to maximum heap size [12884901888]; this can cause resize pauses and prevents mlockall from locking the entire heap

After adapting the heap size values, Elasticsearch must be restarted. If you follow the log file, it will confirm the new values:

[2017-09-14T09:28:00,021][INFO ][o.e.n.Node               ] [elkes01] initializing ...
[2017-09-14T09:28:00,118][INFO ][o.e.e.NodeEnvironment    ] [elkes01] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/mapper/vges-lves)]], net usable_space [819.5gb], net total_space [1007.8gb], spins? [possibly], types [ext4]
[2017-09-14T09:28:00,118][INFO ][o.e.e.NodeEnvironment    ] [elkes01] heap size [11.9gb], compressed ordinary object pointers [true]
[2017-09-14T09:28:00,531][INFO ][o.e.n.Node               ] [elkes01] node name [elkes01], node ID [t3GAvhY1SS2xZkt4U389jw]
[2017-09-14T09:28:00,531][INFO ][o.e.n.Node               ] [elkes01] version[5.6.0], pid[9390], build[781a835/2017-09-07T03:09:58.087Z], OS[Linux/4.4.0-83-generic/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_144/25.144-b01]
[2017-09-14T09:28:00,531][INFO ][o.e.n.Node               ] [elkes01] JVM arguments [-Xms4g, -Xmx12g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/share/elasticsearch]
[2017-09-14T09:28:01,212][INFO ][o.e.p.PluginsService     ] [elkes01] loaded module [aggs-matrix-stats]
[2017-09-14T09:28:01,212][INFO ][o.e.p.PluginsService     ] [elkes01] loaded module [ingest-common]
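
The effective heap size can also be checked through the nodes API (assuming a standard installation listening on localhost:9200); the heap_max_in_bytes value should reflect the configured 12g:

root@elkes01:~# curl -s 'http://localhost:9200/_nodes/_local/jvm?pretty' | grep heap_max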

Searches and also visualizations (which in the end use searches, too) are now much faster in Kibana!

 

Analysing Varnish traffic with varnishtop and varnishlog
Wednesday - Sep 13th 2017

In order to analyze what traffic goes through Varnish and what Varnish exactly does with it, there are mainly two commands available to help you.

varnishtop

The first one is varnishtop which is, as the name suggests (like the "top" command), a real-time monitor of the Varnish traffic.

root@varnish:~# varnishtop

list length 4916

 28630.07 ReqHeader      Accept-Encoding: gzip
 21650.78 VCL_return     deliver
 18189.85 ReqProtocol    HTTP/1.1
 10715.35 ReqUnset       Accept-Encoding: gzip
 10682.41 RespProtocol   HTTP/1.1
 10657.09 VCL_call       HASH
 10657.09 VCL_call       RECV
 10657.09 VCL_call       DELIVER
 10657.09 VCL_return     lookup
 10657.09 ReqHeader      X-UA-Device: pc
 10657.09 RespHeader     Via: 1.1 varnish-v4
 10657.09 ReqHeader      Connection: close
 10655.04 RespHeader     Server: nginx/1.10.3 (Ubuntu)
 10649.17 ReqHeader      Host: www.example.com
 10648.83 VCL_return     hash
 10648.83 ReqHeader      Surrogate-Capability: abc=ESI/1.0
 10628.76 RespHeader     X-Forwarded-For:
 10628.76 RespHeader     X-Varnish-Hostname: varnish
 10628.76 RespHeader     Cache-Control: private, max-age=0
 10616.07 ReqMethod      GET
 10596.65 ReqHeader      X-Forwarded-Proto: http
 10563.52 RespHeader     X-Frame-Options: SAMEORIGIN
 10379.00 RespStatus     200
 10379.00 RespReason     OK
 10353.67 RespHeader     Accept-Ranges: bytes
 10219.76 RespHeader     Content-Type: text/html; charset=UTF-8
 10182.13 ReqHeader      Authorization: Basic bmVvOmtsRXdySV9mbzE=
 10180.63 RespHeader     Vary: Accept-Encoding
 10156.50 RespHeader     Content-Encoding: gzip
  9103.57 VCL_call       HIT
  9103.57 RespHeader     X-Cache: HIT
  8730.37 ReqHeader      X-Grace-Debug: hit
  8730.37 RespHeader     X-Grace-Debug: hit
  4739.76 ReqUnset       X-UA-Device: pc
  3984.83 ReqUnset       Accept-Encoding: gzip, deflate
  3597.83 RespHeader     Date: Wed, 13 Sep 2017 11:24:44 GMT
  3452.46 VCL_return     fetch
  2934.02 ReqHeader      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  2778.76 ReqHeader      Accept: */*
[...]

It basically counts the number of times a log tag (like ReqHeader) appeared and sorts the list accordingly.

Varnishtop is, like top, an interactive way to see what's going on. To use it in a non-interactive way (for example in a regular cron job), the output can be saved to a file:

root@varnish:~# varnishtop -1 -c > /tmp/xxx

Here I told varnishtop to run once (-1) and display only records concerning the client/requester (-c). This works a bit differently because here the full stats (Varnish keeps a kind of log file in memory) are shown (no counting up slowly as in interactive mode):

root@varnish:~# head /tmp/xxx
 50255.00 ReqHeader Accept-Encoding: gzip
 35081.00 VCL_return deliver
 31935.00 ReqProtocol HTTP/1.1
 19200.00 ReqUnset Accept-Encoding: gzip
 19020.00 RespProtocol HTTP/1.1
 18957.00 VCL_call HASH
 18957.00 VCL_call RECV
 18957.00 VCL_call DELIVER
 18957.00 VCL_return lookup
 18957.00 ReqHeader X-UA-Device: pc
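
For a regular cron job (as mentioned above), an entry in /etc/cron.d could look roughly like this (a sketch; the target path is arbitrary and the % sign must be escaped in crontabs):

*/5 * * * * root varnishtop -1 -c > /var/tmp/varnishtop.$(date +\%H\%M)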

Varnishtop can also use filtering. For example, if one wants to see a live view of the most requested URLs:

root@varnish:~# varnishtop -i ReqURL

list length 15
     8.17 ReqURL         /
     3.67 ReqURL         /rss.xml
     3.67 ReqURL         /international.rss
     2.67 ReqURL         /blog/733/solving-grokparsefailure-syslog-messages-entries-elk-logstash
     2.67 ReqURL         /monitoring-plugins/
[...]

Or a live view of the client browsers' language settings:

root@varnish:~# varnishtop -I ReqHeader:Accept-Language

list length 60

   324.39 ReqHeader      Accept-Language: de-ch
   189.76 ReqHeader      Accept-Language: de,en-US;q=0.7,en;q=0.3
   161.48 ReqHeader      Accept-Language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
   108.58 ReqHeader      Accept-Language: en-US
    95.21 ReqHeader      Accept-Language: de
    79.94 ReqHeader      Accept-Language: de-de
    74.94 ReqHeader      Accept-Language: en-us
    67.58 ReqHeader      Accept-Language: de-CH
    64.30 ReqHeader      Accept-Language: en-US,en;q=0.5
    52.21 ReqHeader      Accept-Language: en-US,en;q=0.8
    36.52 ReqHeader      Accept-Language: en-gb
    23.39 ReqHeader      Accept-Language: de-DE,de,en-US,en
    17.03 ReqHeader      Accept-Language: de-CH,de-DE;q=0.8,de;q=0.6,en-US;q=0.4,en;q=0.2
    16.85 ReqHeader      Accept-Language: en-US,en;q=0.8,de;q=0.6
    16.67 ReqHeader      Accept-Language: de-DE
    15.45 ReqHeader      Accept-Language: de-DE,en-US;q=0.8
    14.61 ReqHeader      Accept-Language: en
    13.67 ReqHeader      Accept-Language: de-CH,de;q=0.8,fr-CH;q=0.7,fr;q=0.5,it-CH;q=0.3,it;q=0.2
    13.00 ReqHeader      Accept-Language: de-CH,de-DE;q=0.8,de;q=0.6,fr-CH;q=0.4,fr;q=0.2
[...]

Note the difference between the lowercase -i and the capital -I. If a certain field (in this case Accept-Language) and not just the tag (ReqHeader) should be filtered, the regular expression option (-I) must be used.

Or to show User-Agents containing the string "Googlebot":

root@varnish:~# varnishtop -I ReqHeader:User-Agent.*Googlebot

list length 5

   190.59 ReqHeader  User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; startmebot/1.0; +https://start.me/bot)
   119.98 ReqHeader  User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
   107.32 ReqHeader  User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    21.12 ReqHeader  User-Agent: SentiBot www.sentibot.eu (compatible with Googlebot)
     1.77 ReqHeader  User-Agent: Googlebot-News

Full reference: https://varnish-cache.org/docs/trunk/reference/varnishtop.html

 

varnishlog

While varnishtop is pretty handy for a sorted "live" or "monitoring" view of what's going on, it is not very helpful for troubleshooting, such as analyzing specific traffic. For this purpose there's varnishlog. If you simply call varnishlog, it will show you a seemingly endless list (if you have high traffic) of the whole HTTP traffic going through Varnish. To be honest, that's not helpful for troubleshooting if you see the whole traffic from every client. But varnishlog (and varnishtop, too) supports so-called VSL queries. With such a query, specific traffic can be analyzed.

Here I analyze traffic coming from my own IP address (10.10.40.40):

root@varnish:~# varnishlog -q "ReqHeader ~ 'X-Forwarded-For: 10.10.40.40'"

Note 1: Varnish runs behind a Reverse Proxy, therefore I'm using the X-Forwarded-For header to find traffic for my client IP.

Note 2: Here I'm using the tilde (~) operator to make a regular expression query. As you can see in the result below, the full request header was "X-Forwarded-For: 10.10.40.40, 127.0.0.1".

On my machine I launch a curl command:

$ curl https://www.example.com/_version

And varnishlog shows me my request:

*   << Request  >> 1739776  
-   Begin          req 1739775 rxreq
-   Timestamp      Start: 1505304166.030111 0.000000 0.000000
-   Timestamp      Req: 1505304166.030111 0.000000 0.000000
-   ReqStart       127.0.0.1 42206
-   ReqMethod      GET
-   ReqURL         /_version
-   ReqProtocol    HTTP/1.0
-   ReqHeader      Host: www.example.com
-   ReqHeader      X-Real-IP: 10.10.40.40
-   ReqHeader      X-Forwarded-For: 10.10.40.40
-   ReqHeader      Connection: close
-   ReqHeader      User-Agent: curl/7.35.0
-   ReqHeader      Accept: */*
-   ReqUnset       X-Forwarded-For: 10.10.40.40
-   ReqHeader      X-Forwarded-For: 10.10.40.40, 127.0.0.1
-   VCL_call       RECV
-   ReqHeader      X-UA-Device: pc
-   ReqHeader      Surrogate-Capability: abc=ESI/1.0
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Debug          "XXXX HIT-FOR-PASS"
-   HitPass        237645
-   VCL_call       PASS
-   VCL_return     fetch
-   Link           bereq 1739777 pass
-   Timestamp      Fetch: 1505304166.052654 0.022543 0.022543
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     Server: nginx/1.10.3 (Ubuntu)
-   RespHeader     Content-Type: application/json
-   RespHeader     Vary: Accept-Encoding
-   RespHeader     Cache-Control: private, must-revalidate
-   RespHeader     Date: Wed, 13 Sep 2017 12:02:46 GMT
-   RespHeader     pragma: no-cache
-   RespHeader     expires: -1
-   RespHeader     X-Frame-Options: SAMEORIGIN
-   RespHeader     X-Varnish: 1739776
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespHeader     X-Grace-Debug:
-   VCL_return     deliver
-   Timestamp      Process: 1505304166.052666 0.022555 0.000012
-   RespHeader     Accept-Ranges: bytes
-   Debug          "RES_MODE 4"
-   RespHeader     Connection: close
-   Timestamp      Resp: 1505304166.052688 0.022577 0.000022
-   ReqAcct        163 0 163 365 135 500
-   End     

The full HTTP traffic is nicely presented from beginning to end, including detailed timestamps.

The query can also be adjusted to use several fields at the same time. In this example I want to see non-cached traffic (X-Cache: MISS) which doesn't receive an HTTP 200 status as response:

root@varnish:~# varnishlog -q "RespHeader eq 'X-Cache: MISS' and RespStatus != 200"
*   << Request  >> 2747891  
-   Begin          req 2747890 rxreq
-   Timestamp      Start: 1505305067.468814 0.000000 0.000000
-   Timestamp      Req: 1505305067.468814 0.000000 0.000000
-   ReqStart       127.0.0.1 37518
-   ReqMethod      GET
-   ReqURL         /app1/rest/api/v1/run?ID=1111
-   ReqProtocol    HTTP/1.0
-   ReqHeader      Host: www.example.com
-   ReqHeader      X-Real-IP: 10.10.55.55, 127.0.0.1
-   ReqHeader      X-Forwarded-For: 10.10.55.55, 127.0.0.1
-   ReqHeader      Connection: close
-   ReqHeader      X-Forwarded-Proto: http
-   ReqHeader      Authorization: Basic bmVvOmtsRXdySV9mbzE=
-   ReqHeader      C1-Client-Country-Code: US
-   ReqHeader      User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
-   ReqHeader      Accept-Encoding: gzip,deflate
-   ReqHeader      C1-Tracking-ID: 1578426766423744071
-   ReqUnset       X-Forwarded-For: 10.10.55.55, 127.0.0.1
-   ReqHeader      X-Forwarded-For: 10.10.55.55, 127.0.0.1, 127.0.0.1
-   VCL_call       RECV
-   ReqUnset       Accept-Encoding: gzip,deflate
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-UA-Device: pc
-   ReqHeader      Surrogate-Capability: abc=ESI/1.0
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 2747892 fetch
-   Timestamp      Fetch: 1505305067.494414 0.025599 0.025599
-   RespProtocol   HTTP/1.1
-   RespStatus     301
-   RespReason     Moved Permanently
-   RespHeader     Server: nginx/1.10.3 (Ubuntu)
-   RespHeader     Content-Type: text/html; charset=UTF-8
-   RespHeader     Cache-Control: no-cache
-   RespHeader     Date: Wed, 13 Sep 2017 12:17:47 GMT
-   RespHeader     Location: http://www.example.com/app1/rest/api/v1/run?ID=1111
-   RespHeader     X-Frame-Options: SAMEORIGIN
-   RespHeader     X-Varnish: 2747891
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespHeader     X-Grace-Debug:
-   VCL_return     deliver
-   Timestamp      Process: 1505305067.494428 0.025613 0.000014
-   RespHeader     Content-Length: 495
-   Debug          "RES_MODE 2"
-   RespHeader     Connection: close
-   Timestamp      Resp: 1505305067.494519 0.025704 0.000091
-   ReqAcct        500 0 500 404 495 899
-   End           

*   << Request  >> 4490790  
-   Begin          req 4490789 rxreq
-   Timestamp      Start: 1505305068.432932 0.000000 0.000000
-   Timestamp      Req: 1505305068.432932 0.000000 0.000000
-   ReqStart       127.0.0.1 37746
-   ReqMethod      POST
-   ReqURL         /autodiscover/autodiscover.xml
-   ReqProtocol    HTTP/1.0
-   ReqHeader      Host: www.example.com
-   ReqHeader      X-Real-IP: 10.128.50.142, 127.0.0.1
-   ReqHeader      X-Forwarded-For: 10.128.50.142, 127.0.0.1
-   ReqHeader      Connection: close
-   ReqHeader      Content-Length: 389
-   ReqHeader      X-Forwarded-Proto: http
-   ReqHeader      Authorization: Basic bmVvOmtsRXdySV9mbzE=
-   ReqHeader      Cache-Control: no-cache
-   ReqHeader      Pragma: no-cache
-   ReqHeader      Content-Type: text/xml
-   ReqHeader      User-Agent: Microsoft Office/16.0 (Windows NT 6.1; Microsoft Outlook 16.0.4549; Pro)
-   ReqHeader      X-MS-CookieUri-Requested: t
-   ReqHeader      X-FeatureVersion: 1
-   ReqHeader      Client-Request-Id: {9DB2AEBC-D17F-4640-8E3D-29035F67A0F3}
-   ReqHeader      X-MapiHttpCapability: 1
-   ReqHeader      Depth: 0
-   ReqHeader      Cookie: OutlookSession="{638F5EED-5A90-4688-8DE0-032340E18CEA}"; creid=1578407533870507445
-   ReqHeader      C1-Tracking-ID: 1578407533870507445
-   ReqUnset       X-Forwarded-For: 10.128.50.142, 127.0.0.1
-   ReqHeader      X-Forwarded-For: 10.128.50.142, 127.0.0.1, 127.0.0.1
-   VCL_call       RECV
-   ReqHeader      X-UA-Device: pc
-   VCL_return     pass
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       PASS
-   VCL_return     fetch
-   Link           bereq 4490791 pass
-   Timestamp      ReqBody: 1505305068.433025 0.000092 0.000092
-   Timestamp      Fetch: 1505305068.570216 0.137284 0.137192
-   RespProtocol   HTTP/1.1
-   RespStatus     404
-   RespReason     Not Found
-   RespHeader     Server: nginx/1.10.3 (Ubuntu)
-   RespHeader     Content-Type: text/html; charset=UTF-8
-   RespHeader     Vary: Accept-Encoding
-   RespHeader     Date: Wed, 13 Sep 2017 12:17:48 GMT
-   RespHeader     Cache-Control: private, max-age=0
-   RespHeader     X-Forwarded-For:
-   RespHeader     X-Varnish-Hostname: varnish
-   RespHeader     X-Varnish: 4490790
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   RespHeader     X-Cache: MISS
-   RespHeader     X-Grace-Debug:
-   VCL_return     deliver
-   Timestamp      Process: 1505305068.570234 0.137301 0.000017
-   Debug          "RES_MODE 4"
-   RespHeader     Connection: close
-   Timestamp      Resp: 1505305068.585268 0.152336 0.015034
-   ReqAcct        828 389 1217 343 178556 178899
-   End           

During the time I interactively ran the command, two results showed up (a 301 and a 404).

Full reference: https://varnish-cache.org/docs/4.1/reference/varnishlog.html?highlight=varnishlog

 

Using these two commands, especially with their filtering and querying capabilities, Varnish's traffic can be analyzed properly.
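
If a problem needs to be captured over a longer period, the (optionally filtered) log can also be written to a file with -w and replayed later with -r. A sketch (the file path is arbitrary):

root@varnish:~# varnishlog -q "RespStatus >= 500" -w /var/tmp/varnish-errors.vsl
root@varnish:~# varnishlog -r /var/tmp/varnish-errors.vsl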

 

Solving _grokparsefailure on syslog entries in ELK stack
Tuesday - Sep 12th 2017

In my ELK setup there are currently two ways of receiving and centralizing logs from clients:

- A Logstash listener on udp/10514 receiving syslog logs from Linux machines (in json format)
- A Logstash listener for Filebeat on tcp/5044 receiving logs already parsed and json-formatted by the Filebeat daemon running on Linux machines

In a recent post I wrote about adding HAProxy logs into the ELK stack (see Handling different timezones of HAProxy logs in ELK stack). I found out that since I created the grok filter for HAProxy in /etc/logstash/conf.d/13-haproxy-filter.conf, I got _grokparsefailure tags on all syslog entries. But why? Let's take a look at the file again:

root@logstash:~# cat /etc/logstash/conf.d/13-haproxy-filter.conf
filter {

    grok {
    match => [
    "message", "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_duration:int} %{NUMBER:bytes_read:int} %{NOTSPACE:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}" ,
    "message" , "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_client_req:int}/%{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_server_response:int}/%{NUMBER:time_duration:int} %{NUMBER:status_code:int} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state_with_cookie_status} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}?( \"%{GREEDYDATA:full_http_request}\")?( %{NOTSPACE:captured_response_headers})?"
    ]
    }

  if [type] == "rsyslog" {
    date {
      match => [ "haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS" ]
      timezone => "UTC"
    }
  }

  if [type] == "haproxy" {
    date {
      match => [ "haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS" ]
    }
  }

    grok {
    match => [
    "full_http_request", "%{WORD:http_verb} %{URIPATHPARAM:http_request}?( HTTP/%{NUMBER:http_version})" ,
    "full_http_request", "<%{WORD:http_request}>"
    ]
    remove_field => [ "full_http_request" ]
    }

}

After fiddling around with that filter I finally figured out what happens. This filter is applied to every entry arriving in Logstash, regardless of whether it came through the syslog or the Filebeat listener. So every time a syslog message arrived, it was tagged with "_grokparsefailure" because it ran through this filter even though it did not match the expected line format.

The solution to this is to create a condition before applying the grok filter: it should only apply to logs clearly coming from HAProxy. That's easy for the ones coming from Filebeat, because they're already tagged on the Filebeat daemon (the sender) as "haproxy". But how do I tell Logstash which syslog entries the filter should apply to? This is what I came up with:

root@elk01:~$ cat /etc/logstash/conf.d/13-haproxy.conf
filter {

  if [type] == "syslog" and [programname] == "haproxy" {

    grok {
      match => [
    "message", "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_duration:int} %{NUMBER:bytes_read:int} %{NOTSPACE:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}" ,
    "message" , "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_client_req:int}/%{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_server_response:int}/%{NUMBER:time_duration:int} %{NUMBER:status_code:int} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state_with_cookie_status} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}?( \"%{GREEDYDATA:full_http_request}\")?( %{NOTSPACE:captured_response_headers})?"
      ]
    }

    grok {
      match => [
      "full_http_request", "%{WORD:http_verb} %{URIPATHPARAM:http_request}?( HTTP/%{NUMBER:http_version})" ,
      "full_http_request", "<%{WORD:http_request}>"
      ]
      remove_field => [ "full_http_request" ]
    }
    date {
      match => [ "haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS" ]
      timezone => "UTC"
    }

  }


  if [type] == "haproxy" {

    grok {
      match => [
    "message", "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_duration:int} %{NUMBER:bytes_read:int} %{NOTSPACE:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}" ,
    "message" , "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_client_req:int}/%{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_server_response:int}/%{NUMBER:time_duration:int} %{NUMBER:status_code:int} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state_with_cookie_status} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}?( \"%{GREEDYDATA:full_http_request}\")?( %{NOTSPACE:captured_response_headers})?"
      ]
    }

    grok {
      match => [
      "full_http_request", "%{WORD:http_verb} %{URIPATHPARAM:http_request}?( HTTP/%{NUMBER:http_version})" ,
      "full_http_request", "<%{WORD:http_request}>"
      ]
      remove_field => [ "full_http_request" ]
    }
  }

}

The first if condition handles all entries coming in through the syslog listener, which are therefore tagged with the type "syslog". Because these entries contain fields like procid, programname and facility, these fields can be used to identify a certain type of (application) log. Let's analyze a very simple syslog entry arriving at Logstash:

{
  "_index": "logstash-2017.09.12",
  "_type": "syslog",
  "_id": "AV50py7qDSxp4hj7KGHi",
  "_version": 1,
  "_score": null,
  "_source": {
    "severity": "info",
    "@timestamp": "2017-09-12T05:54:27.151Z",
    "@version": "1",
    "programname": "sudo",
    "procid": "-",
    "host": "10.10.10.60",
    "sysloghost": "yetanotherlinux",
    "message": " pam_unix(sudo:session): session closed for user root",
    "type": "syslog",
    "facility": "authpriv"
  },
  "fields": {
    "@timestamp": [
      1505195667151
    ]
  },
  "sort": [
    1505195667151
  ]
}

That's a classical syslog message, split up into the different (known) fields. As you can see, programname is one of them. Therefore this field (and the others, too) can be used to create a very specific if condition.

The second if condition is self-explanatory. These entries are received by the listener for Filebeat logs. As I mentioned before, the HAProxy log files are already tagged as "type => haproxy" on the client side.
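
For reference, this client-side tagging in Filebeat (5.x) can look roughly like this (a sketch; the log path is an example):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/haproxy.log
  document_type: haproxy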

Note to self: If you fiddle around with Logstash grok filters, don't forget that without a condition they're applied to every single entry arriving in Logstash (ergo most likely resulting in a _grokparsefailure).

 

Elastic X-Pack error after updating Logstash
Monday - Sep 11th 2017

After I updated logstash from 1:5.5.1-1 to 1:5.5.2-1 today, a restart of Logstash failed with the following error message:

[2017-09-11T11:01:57,502][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<ArgumentError: Setting "xpack.monitoring.elasticsearch.url" hasn't been registered>, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/settings.rb:32:in `get_setting'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:64:in `set_value'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:83:in `merge'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:83:in `merge'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:135:in `validate_all'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:244:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:209:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}

The reason for this is that the X-Pack plugin for Logstash is now outdated and must be installed again, matching the new Logstash version:

root@elk01:/usr/share/logstash# bin/logstash-plugin install x-pack
Downloading file: https://artifacts.elastic.co/downloads/logstash-plugins/x-pack/x-pack-5.5.2.zip
Downloading [=============================================================] 100%
Installing file: /tmp/studtmp-18cf6972e1475343a4db37fe32c59de5722bf30f1014098f3ba75869d405/x-pack-5.5.2.zip
Install successful
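
Whether the installed plugin version now matches can be checked with the plugin list command:

root@elk01:/usr/share/logstash# bin/logstash-plugin list --verbose | grep x-pack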

A restart of Logstash then worked again:

root@elk01:/usr/share/logstash# service logstash restart


 

check_esxi_hardware new with --no-lcd parameter to ignore LCD alerts
Tuesday - Sep 5th 2017

It's been quiet around the monitoring plugin check_esxi_hardware in the past months. Yet there's still (always) room for improvement, and whenever I get the chance, I will work on it. To everyone who has sent a feature request or submitted code (as well as PRs on GitHub): I haven't forgotten about it.

Well, for starters there's a new version out. Version 20170905 adds a new option (--no-lcd). This option groups together the different kinds of CIM elements related to the front panel/LCD display and the alerts they generate. Previously some of these elements were hard-coded into the "ignore list", but this new option allows disabling them altogether.

Thanks to Keith B. Erekson for testing the new version and also to DELL community user "LiborKlepac" for the base idea of adding a "--no-lcd" parameter (see Github commit).

 

Handling different timezones of HAProxy logs in ELK stack
Monday - Aug 28th 2017

This is a follow-up article on my previous post (Docker container syslog logs not appearing in ELK Kibana (different timezone)). I had the problem that a Docker container, which runs HAProxy, sent its logs in UTC (because the container's timezone was set to UTC). But the receiving Logstash server (running in UTC+2) thought it had to subtract two hours again, which resulted in a time gap of 4 hours. The log was received at 13:38 UTC+2, but the entry was inserted into Elasticsearch as 09:38 UTC:

Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "@timestamp" => 2017-08-25T09:38:00.000Z,

To tackle this problem, I had to dig a bit into the Logstash filters and their possibilities. But first let me outline the setup so the configs make sense.

Usually my HAProxy machines run as VMs or inside a Linux container (LXC). HAProxy writes local log files which are then parsed by Filebeat and sent to Logstash (see the article Install and configure Elastic Filebeat through Ansible for additional information on that). As you can see in that article, Filebeat sets the "document_type" to the value "haproxy".
In the Docker environment, which runs on Rancher, we use the Rancher-internal load balancer service. As this is based on a container image we cannot interfere with, the only way to get the HAProxy logs from within the container is to modify the HAProxy config. In Rancher this can be done by using the "custom haproxy.cfg" field. Entries in there will be merged with the dynamically created configuration. I added the relevant parts for the logging:

Rancher custom haproxy.cfg

global
  log logstash.example.com:514 local0
  log logstash.example.com:514 local1 notice

defaults
  log global
  option httplog
  mode http

HAProxy allows logging to a central syslog server via IP address and UDP port. So in this case I configured rsyslog on the Logstash machine (logstash.example.com) to listen for and receive syslog messages.
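
For completeness: receiving syslog over UDP in rsyslog requires the imudp module to be loaded, roughly like this (a sketch, not the full config used here):

module(load="imudp")
input(type="imudp" port="514")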

On the Logstash machine, the central syslog server forwards into Logstash (which listens on port 10514):

root@logstash:~# cat /etc/rsyslog.d/01-json.conf
template(name="json-template"
  type="list") {
    constant(value="{")
      constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"@version\":\"1")
      constant(value="\",\"message\":\"")     property(name="msg" format="json")
      constant(value="\",\"sysloghost\":\"")  property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"programname\":\"") property(name="programname")
      constant(value="\",\"procid\":\"")      property(name="procid")
    constant(value="\"}\n")
}

root@logstash:~# cat /etc/rsyslog.d/99-remote.conf
*.*                       @localhost:10514;json-template

On the Logstash server, the listener port 10514 is defined as a listener (input) and received messages are labeled as type "rsyslog":

# This input block will listen on port 10514 for logs to come in.
# host should be an IP on the Logstash server.
# codec => "json" indicates that we expect the lines we're receiving to be in JSON format
# type => "rsyslog" is an optional identifier to help identify messaging streams in the pipeline.
input {
  udp {
    host => "0.0.0.0"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

Therefore in this ELK stack there are two ways HAProxy logs are being received and written into Elasticsearch:

  1. Through Filebeat using the type "haproxy"; used by manually installed HAProxy
  2. Through central syslog using the type "rsyslog"; used by HAProxy installations inside Docker containers, managed by Rancher

As I wrote at the beginning, the Docker containers run in UTC (and the image cannot be changed), while all other machines run in UTC+2. With that knowledge, the HAProxy filter config for Logstash can be adjusted (original grok filter for HAProxy found on logz.io):

root@logstash:~# cat /etc/logstash/conf.d/13-haproxy-filter.conf
filter {

    grok {
    match => [
    "message", "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_duration:int} %{NUMBER:bytes_read:int} %{NOTSPACE:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}" ,
    "message" , "%{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:time_client_req:int}/%{NUMBER:time_queue:int}/%{NUMBER:time_backend_connect:int}/%{NUMBER:time_server_response:int}/%{NUMBER:time_duration:int} %{NUMBER:status_code:int} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state_with_cookie_status} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int}?( \"%{GREEDYDATA:full_http_request}\")?( %{NOTSPACE:captured_response_headers})?"
    ]
    }

  if [type] == "rsyslog" {
    date {
      match => [ "haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS" ]
      timezone => "UTC"
    }
  }

  if [type] == "haproxy" {
    date {
      match => [ "haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS" ]
    }
  }

    grok {
    match => [
    "full_http_request", "%{WORD:http_verb} %{URIPATHPARAM:http_request}?( HTTP/%{NUMBER:http_version})" ,
    "full_http_request", "<%{WORD:http_request}>"
    ]
    remove_field => [ "full_http_request" ]
    }

}

As you can see I added two if conditions into the config file:

  • If the received document type is "rsyslog" (therefore coming from a Docker container) tell Logstash that the logged time is UTC.
  • If the received document type is "haproxy" (therefore coming through Filebeat) use system (default) timezone.

Also important to note here: the haproxy_timestamp field is used as the date/time field (not the time the message arrived).

Since that modification, both HAProxy log sources are now in the same time zone and I can compare them directly in graphs.

 

Docker container syslog logs not appearing in ELK Kibana (different timezone)
Friday - Aug 25th 2017

I was trying to get to the root cause why certain rsyslog lines were not showing up in Kibana.

HAProxy in a Docker container -> log to a central rsyslog server (runs on the ELK machine)
Central rsyslog server -> Sends received messages to local logstash (using json format)

    *.*                       @localhost:10514;json-template

Logstash receives this on UDP 10514 and sends it to output Elasticsearch:

    input {
      udp {
        host => "0.0.0.0"
        port => 10514
        codec => "json"
        type => "rsyslog"
      }
    }

    output {
      if [type] == "rsyslog" {
        elasticsearch {
          hosts => [ "http://localhost:9200" ]
        }
        #stdout { codec => rubydebug }
        stdout { codec => rubydebug { metadata => true } }
      }
    }

As you can see, I have enabled rubydebug to see if logstash really receives the messages (this is the "debug" mode to test logstash). Rsyslog receives the remote HAProxy logs:

Aug 25 11:38:00 192.168.40.15 haproxy[23540]: 10.10.10.11:52596 [25/Aug/2017:11:38:00.017] 80 80_enrico_test_/a1b8fdd53241fe9032c1b73ce55292db0e7fe082 0/0/1/2/3 200 389 - - ---- 11/11/0/0/0 0/0 "GET /_health HTTP/1.1"

And I can see with rubydebug in /var/log/syslog, that this entry went through logstash:

Aug 25 13:38:00 inf-elk01-t logstash[8657]: {
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "severity" => "info",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "@timestamp" => 2017-08-25T09:38:00.000Z,
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "@version" => "1",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:     "programname" => "haproxy",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:          "procid" => "23540",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:            "host" => "0:0:0:0:0:0:0:1",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "sysloghost" => "192.168.40.15",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:         "message" => " 10.10.10.11:52596 [25/Aug/2017:11:38:00.017] 80 80_enrico_test_/a1b8fdd53241fe9032c1b73ce55292db0e7fe082 0/0/1/2/3 200 389 - - ---- 11/11/0/0/0 0/0 \"GET /_health HTTP/1.1\"",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:            "type" => "rsyslog",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "facility" => "local0"
Aug 25 13:38:00 inf-elk01-t logstash[8657]: }

In ES I am able to find this entry:

# curl -s -XGET 'http://localhost:9200/logstash-2017.08.25/_search?pretty=true&q='programname':'haproxy'&q='message':'*52596*'&size=1'
    {
      "took" : 24,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "logstash-2017.08.25",
            "_type" : "rsyslog",
            "_id" : "AV4ZLz_7KQDa1cUYcoYw",
            "_score" : 1.0,
            "_source" : {
              "severity" : "info",
              "@timestamp" : "2017-08-25T09:38:00.000Z",
              "@version" : "1",
              "programname" : "haproxy",
              "procid" : "23540",
              "host" : "0:0:0:0:0:0:0:1",
              "sysloghost" : "192.168.40.15",
              "message" : " 10.10.10.11:52596 [25/Aug/2017:11:38:00.017] 80 80_enrico_test_/a1b8fdd53241fe9032c1b73ce55292db0e7fe082 0/0/1/2/3 200 389 - - ---- 11/11/0/0/0 0/0 \"GET /_health HTTP/1.1\"",
              "type" : "rsyslog",
              "facility" : "local0"
            }
          }
        ]
      }
    }

But in Kibana there's nothing:

[Screenshot: Docker container rsyslog logs not found in Kibana]

By chance I clicked on the time range above and set it to "Today" and wow, there it was:

[Screenshot: Kibana showing the Docker container HAProxy logs]

So I had to go much further back in time than expected. But why?
As you can see in the "message", the timestamp coming from the application is set to UTC:

Aug 25 13:38:00 inf-elk01-t logstash[8657]: {
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "severity" => "info",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "@timestamp" => 2017-08-25T09:38:00.000Z,
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "@version" => "1",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:     "programname" => "haproxy",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:          "procid" => "23540",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:            "host" => "0:0:0:0:0:0:0:1",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "sysloghost" => "192.168.40.15",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:         "message" => " 10.10.10.11:52596 [25/Aug/2017:11:38:00.017] 80 80_enrico_test_/a1b8fdd53241fe9032c1b73ce55292db0e7fe082 0/0/1/2/3 200 389 - - ---- 11/11/0/0/0 0/0 \"GET /_health HTTP/1.1\"",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:            "type" => "rsyslog",
Aug 25 13:38:00 inf-elk01-t logstash[8657]:        "facility" => "local0"
Aug 25 13:38:00 inf-elk01-t logstash[8657]: }

Note the difference between the local time (in the syslog prefix) and the @timestamp inserted into ES:

Aug 25 13:38:00 inf-elk01-t logstash[8657]:      "@timestamp" => 2017-08-25T09:38:00.000Z,

This is of course completely wrong. The source of this is the origin server on which HAProxy is running (a Docker container):

root@dockerhost:~# date
Fri Aug 25 14:10:04 CEST 2017

root@dockerhost:~# docker exec -it 44bddc8b9cb0 bash
root@44bddc8b9cb0:/# date 
Fri Aug 25 12:10:06 UTC 2017

This Docker container runs in the UTC timezone and is two hours behind the rest of the infrastructure. Ergo in Kibana it shows up "in the past".
Now I have to figure out how to tell a Docker container to use the host's timezone, but it's definitely not the fault of the ELK stack. To get some external ideas, I asked how to solve this on the Elastic forums (see https://discuss.elastic.co/t/wrong-time-zone-modification/98372).

 

