
Rename jpeg files to contain the date the picture was taken
Saturday - May 6th 2017

I have a ton of pictures on my mobile phone. From time to time I put them on my central backup machine. However, I came across a few issues when I wanted to sort them (for example to create a family album):

  • The filenames are often just a file counter: the camera simply counts the picture number up, e.g. DSC0001, DSC0002, DSC0003. Depending on the Android version, this can (sometimes) be changed in the camera settings so that the file name contains the date.
  • When I moved pictures from the internal memory to the external (SD card) memory, all pictures' file dates were changed to the day the files were moved. 
  • Manually finding the original date and then renaming the pictures takes hours.

Luckily I came across jhead, an open source tool written by Matthias Wandel which can be used on all platforms. For Windows, Linux and Mac OS X there are pre-compiled executables which are launched from the command line (no graphical interface).

jhead is able to read the Exif JPEG headers which (should) still contain the original date the picture was taken. For example:

D:\Downloads\Grafik\JHead>jhead.exe D:\tmp\DSC*
File name    : D:\tmp\DSC_0261.JPG
File size    : 3264489 bytes
File date    : 2016:03:24 12:50:06
Camera make  : Sony
Camera model : D6503
Date/Time    : 2016:02:13 12:17:55
Resolution   : 3840 x 2160
Flash used   : No
Focal length :  4.9mm
Exposure time: 0.0010 s  (1/1000)
Aperture     : f/2.0
ISO equiv.   : 50
Whitebalance : Auto
Metering Mode: pattern
GPS Latitude : ? ?
GPS Longitude: ? ?
JPEG Quality : 97

( Yes, for once I'm on a Windows OS )

As you can see above, the file date and the date/time differ. The "File date" represents the date the picture was moved from the phone's internal memory to the SD card. The "Date/Time" value represents the date and time when the picture was taken.

But jhead is not only able to read the Exif JPEG headers. It comes with a function (-n) to rename the JPEG file using certain variables. In the following example I renamed all files starting with DSC in the folder D:\tmp using the Exif JPEG Date/Time in the format Month-Day-Year-Filename (%m-%d-%Y-%f):

D:\Downloads\Grafik\JHead>jhead.exe -n%m-%d-%Y-%f D:\tmp\DSC*
D:\tmp\DSC_0261.JPG --> D:\tmp\02-13-2016-DSC_0261.jpg
D:\tmp\DSC_0262.JPG --> D:\tmp\02-13-2016-DSC_0262.jpg

Jpeg photos renamed to date picture was taken 

The result is exactly what I needed: The date the picture was taken is now the prefix of the filename, followed by the original filename. Finally I'm able to quickly and properly sort all kinds of pictures - even when they were taken by different cameras and had different filenames.
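
On Linux the very same rename works from a shell; a minimal sketch, assuming jhead is installed and the pictures sit in ~/pictures (path and date format are just examples):

# Rename all DSC*.JPG files using the Exif date as prefix (Year-Month-Day-OriginalName)
jhead -n%Y-%m-%d-%f ~/pictures/DSC*.JPG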

 

Ubuntu 16.04 reboot and shutdown hanging on LVM metadata daemon
Friday - May 5th 2017

I've been seeing this problem for a couple of months now, basically since we first started installing Ubuntu 16.04 Xenial machines. When a reboot or a shutdown was launched, the system kept hanging. In the VM console the last entry was either on

[ OK ] Stopped LVM2 metadata daemon.

Xenial Reboot hanging

or on

[ *** ] A start job is running for Unattended Upgrades Shutdown (1min 27s / no limit)

Xenial reboot hanging

At the beginning I suspected a problem in the LVM settings. All the affected machines were either clones of another VM or new VMs deployed from a template, and in /etc/lvm/backup/VGNAME the name of the template still appeared.

But further research showed that it is a problem in the unattended-upgrades package. In Ubuntu bug #1654600 it is reported that this shutdown hang happens when /var is a separate file system and LVM is being used, which is the case on almost all our machines. As of this writing (May 5th 2017) a fix for the unattended-upgrades package is in the pipeline (Fix Committed) but not yet released.

An immediate workaround was found in Ubuntu bug #1661611, which is marked as a duplicate of #1654600:

 - fix /usr/share/unattended-upgrades/unattended-upgrade-shutdown to expect "false" instead of "False"

# sed -n "120p" /usr/share/unattended-upgrades/unattended-upgrade-shutdown
    if apt_pkg.config.find_b("Unattended-Upgrade::InstallOnShutdown", False):

# sed -i "120s/False/false/" /usr/share/unattended-upgrades/unattended-upgrade-shutdown

# sed -n "120p" /usr/share/unattended-upgrades/unattended-upgrade-shutdown
    if apt_pkg.config.find_b("Unattended-Upgrade::InstallOnShutdown", false):

But this only "works" because the Python syntax is incorrect now (Python wants "False", not "false"):

So this is not advisable either (although it saves you a 10 minute wait until your machine reboots). The best solution is to wait for the definitive fix of bug #1654600 (or, in the meantime, uninstall the unattended-upgrades package).
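
If you go the uninstall route, this is basically a one-liner (shown here as a sketch; verify first that you really want to drop unattended upgrades on that machine):

# Remove the package until the fixed version is released
apt-get remove unattended-upgrades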

Update May 10th 2017:
I just checked the current status of the unattended-upgrades package: version 0.90ubuntu0.5 (which contains the fix) has been released and can now be installed.
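
To verify whether the fixed version has already reached your mirror and to pull it in, something along these lines should do:

# Show installed and candidate version of the package
apt-cache policy unattended-upgrades

# Install the fixed version (0.90ubuntu0.5 or later)
apt-get update && apt-get install --only-upgrade unattended-upgrades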

 

Monitor a PostgreSQL database in AWS RDS with check_postgres
Thursday - Apr 27th 2017

Monitoring an AWS RDS database (using PostgreSQL as database engine) is not particularly difficult. The most important points can be checked with check_postgres.pl. This monitoring plugin is already part of the Icinga 2 ITL definitions and can be used (in a standard setup) with the check command "postgres".

An example to monitor the number of connections on a DB:

# check postgres connections
object Service "Postgres Connections" {
  import "service-5m-normal"
  host_name = "aws-rds"
  check_command = "postgres"
  vars.postgres_host = "myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com"
  vars.postgres_dbuser = "mydbuser"
  vars.postgres_dbpass = "mydbpass"
  vars.postgres_dbname = "mydbname"
  vars.postgres_action = "backends"
}

This results in the following output:

POSTGRES_BACKENDS OK: DB "mydbname" (host:myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com) 4 of 648 connections (1%)

But certain checks require special attention. One of them is the plugin's action "database_size". This action checks all databases running on the target host for their current size and compares the sizes against thresholds:

# check postgres database size
object Service "Postgres Database Size" {
  import "service-5m-normal"
  host_name = "aws-rds"
  check_command = "postgres"
  vars.postgres_host = "myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com"
  vars.postgres_dbuser = "mydbuser"
  vars.postgres_dbpass = "mydbpass"
  vars.postgres_dbname = "mydbname"
  vars.postgres_action = "database_size"
  vars.postgres_warning = "500MB"
  vars.postgres_critical = "1000MB"
}

This resulted in the following error message:

Permission denied on database rdsadmin.

That's right. AWS creates a database called "rdsadmin" on each RDS instance. Its purpose? Most likely handling privileges coming from the AWS console/UI. But what else is in there? We'll never know, because only AWS knows the credentials to this database. However, this causes a problem with check_postgres.pl.
A way to handle this "hidden database" and exclude it is to only check for database objects belonging to the instance's master user. This can be achieved by using the plugin's option "includeuser". In Icinga 2's service object definition, this looks like:

# check postgres database size
object Service "Postgres Database Size" {
  import "service-5m-normal"
  host_name = "aws-rds"
  check_command = "postgres"
  vars.postgres_host = "myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com"
  vars.postgres_dbuser = "mydbuser"
  vars.postgres_dbpass = "mydbpass"
  vars.postgres_dbname = "mydbname"
  vars.postgres_action = "database_size"
  vars.postgres_includeuser = "masteruser"
  vars.postgres_warning = "500MB"
  vars.postgres_critical = "1000MB"
}

This tells the plugin to look only for objects (like databases) which belong to "masteruser" (database owner). Result:

POSTGRES_DATABASE_SIZE CRITICAL: DB "mydbname" (host:myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com) mydbname: 32215345336 (30 GB) postgres: 6744248 (6586 kB) template1: 6744248 (6586 kB)

As you can see, the database "rdsadmin" was excluded. All the checked databases are owned by the master user "masteruser".

Note: Another way would be to define "excludeuser", but at this moment I am not aware of which username owns "rdsadmin".
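
For debugging outside of Icinga 2, the same check can also be run manually with check_postgres.pl; a sketch using the values from above (the plugin path is an assumption and depends on your installation):

/usr/lib/nagios/plugins/check_postgres.pl --action=database_size \
  --host=myfancypgsql.XXXXXXXXXXXX.eu-west-1.rds.amazonaws.com \
  --dbuser=mydbuser --dbpass=mydbpass --dbname=mydbname \
  --includeuser=masteruser --warning=500MB --critical=1000MB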

 

Empty page returned from Nginx with fastcgi caching (HEAD vs GET)
Friday - Apr 21st 2017

A while ago I wrote about Enable caching in Nginx Reverse Proxy (and solve cache MISS only). The article back then focused on caching in a reverse proxy setup, meaning it used proxy_cache. But Nginx can also cache responses from FastCGI backends.

For a customer I wanted to enable this kind of caching (fastcgi_cache) for the PHP backend using PHP-FPM. The setup is pretty much the same as proxy_cache - just call it fastcgi_cache instead ;-)

In Nginx's http section:

    ##
    # Caching settings
    ##
    fastcgi_cache_path  /var/www/cache levels=1:2 keys_zone=cachecool:100m max_size=1000m inactive=60m;
    fastcgi_cache_key "$scheme$host$request_uri";

And in the vhost config file within the .php file extension location:

  location ~ \.php$ {
    try_files $uri =404;
    default_type text/html; charset utf-8;
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_pass unix:/run/php5-fpm.sock;
    fastcgi_next_upstream error timeout invalid_header http_500 http_503;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
    fastcgi_param HTTP_X_REAL_IP $remote_addr;

    # FastCGI Caching
    fastcgi_cache cachecool;
    fastcgi_cache_valid 200 60m;
    fastcgi_cache_methods GET;
    add_header X-Cache $upstream_cache_status;
  }

Then reload Nginx.
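
For example (a sketch; how you reload depends on your init system):

# Validate the configuration first, then reload
nginx -t && service nginx reload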

So far so good, caching seemed to work, which I was able to verify with curl:

$ curl -I https://app.example.com/phpinfo.php -v
* Hostname was NOT found in DNS cache
*   Trying 1.2.3.4...
* Connected to app.example.com (1.2.3.4) port 443 (#0)
[...ssl stuff...]
> HEAD /phpinfo.php HTTP/1.1
> User-Agent: curl/7.35.0
> Host: app.example.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
* Server nginx/1.10.3 is not blacklisted
< Server: nginx/1.10.3
Server: nginx/1.10.3
< Date: Fri, 21 Apr 2017 06:16:56 GMT
Date: Fri, 21 Apr 2017 06:16:56 GMT
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
Connection: keep-alive
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Cache: MISS
X-Cache: MISS

Second curl request:

$ curl -I https://app.example.com/phpinfo.php -v
* Hostname was NOT found in DNS cache
*   Trying 1.2.3.4...
* Connected to app.example.com (1.2.3.4) port 443 (#0)
[...ssl stuff...]
> HEAD /phpinfo.php HTTP/1.1
> User-Agent: curl/7.35.0
> Host: app.example.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
* Server nginx/1.10.3 is not blacklisted
< Server: nginx/1.10.3
Server: nginx/1.10.3
< Date: Fri, 21 Apr 2017 06:16:57 GMT
Date: Fri, 21 Apr 2017 06:16:57 GMT
< Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=UTF-8
< Connection: keep-alive
Connection: keep-alive
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Cache: HIT
X-Cache: HIT

The X-Cache header confirmed it: I hit the cache. So it's working.

But when I wanted to check the same URL in the browser, I got slapped in the face:

Cached php page empty in browser 

The page is empty?!

I manually removed the Nginx cache...

$ rm -rf /var/www/cache/*

... and then refreshed the browser. phpinfo.php was shown correctly now!
What's the difference between my curl and my browser request? Note the -I parameter of the curl command: it uses the HEAD request method, and HEAD only returns the HTTP headers, no body. Accessing a website in the browser (usually) uses the GET method. Because the cache key didn't differentiate between request methods, the bodyless HEAD response was cached first and then served to the browser's GET request, resulting in an empty page.

So how do I solve this? The solution (found on ServerFault.com) is actually pretty self-explanatory: The caching key also needs to contain the request method. Remember the setup in Nginx's http section?

    fastcgi_cache_key "$scheme$host$request_uri";

Here I only set $scheme (https), $host (app.example.com) and $request_uri (/phpinfo.php). The caching mechanism therefore doesn't differentiate between a GET, a HEAD or even another request method like POST. So let's add the request method:

    fastcgi_cache_key "$scheme$request_method$host$request_uri";

After a Nginx reload, I tried to reproduce the empty cache problem:

  • Removed the cache (rm -rf /var/www/cache/*)
  • Launched the curl command with HEAD request method (curl -I) from above - X-Cache was a MISS
  • Launched the same curl command again - X-Cache was a HIT
  • Opened the same URL in browser: phpinfo was shown correctly

Solved!
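
To double-check the GET path as well, the response headers of a plain GET request can be inspected with curl, for example:

# GET request: discard the body, print only the response headers and filter for X-Cache
curl -sD - -o /dev/null https://app.example.com/phpinfo.php | grep -i x-cache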

 

Nginx compile fails with fatal error: ld terminated with signal 9
Friday - Apr 21st 2017

If you ever experience a failed nginx compile process, which ends with the following error:

collect2: fatal error: ld terminated with signal 9 [Killed]
compilation terminated.
objs/Makefile:355: recipe for target 'objs/nginx' failed
make[2]: *** [objs/nginx] Error 1
make[2]: Leaving directory '/home/builder/build/nginx-1.12.0'
Makefile:8: recipe for target 'build' failed
make[1]: *** [build] Error 2
make[1]: Leaving directory '/home/builder/build/nginx-1.12.0'

Check your memory usage; dmesg is your friend:

[237469.710196] Out of memory: Kill process 685 (ld) score 716 or sacrifice child
[237469.712587] Killed process 685 (ld) total-vm:374356kB, anon-rss:357624kB, file-rss:0kB

After increasing the memory capacity of this VM, the nginx compile worked just fine.
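
A quick way to confirm that it was indeed the OOM killer which terminated the linker (the exact kernel log wording may vary):

dmesg | grep -iE "out of memory|killed process"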

 

Firmware-Upgrade of a Dahua IPC-HFW1320S network camera
Tuesday - Apr 18th 2017

Over a year ago, in March 2016, I bought a Dahua IPC-HFW1320S ip/network camera. I installed the camera outside at the door entry and it has been running 24/7 since May 2016. 

So far so good - but I always wondered why there were never any new firmware versions available. I also have two other network cameras from D-Link which get more or less frequent firmware updates.

Today I checked again and hey! There's indeed a new firmware version available for download:

Dahua IPC-HFW1320S new firmware version 

I downloaded the PAL version (I'm in Switzerland) and unzipped it, which results in the following file: DH_IPC-HX1X2X-Themis_Eng_P_V2.600.0005.0.R.20161219.bin.
Note: The NTSC version is for the USA, Japan and some countries in South America. Check out https://upload.wikimedia.org/wikipedia/commons/0/0d/PAL-NTSC-SECAM.svg if you're unsure.

I decided to give it a shot and use Dahua's ConfigTool for the firmware upgrade. Once opened, it quickly detected the running Dahua camera and also showed the current firmware version (2.400.x):

Dahua ConfigTool 

After a click on the operate icon, a new window opens for this camera. In the "Upgrade" tab a firmware upgrade can be launched. I selected the previously unzipped file and started the upgrade by clicking on the "Upgrade" button.

Dahua ConfigTool Firmware Upgrade 

The firmware file was first transferred, then the upgrading process started:

Dahua Firmware Upgrade

Dahua Firmware Upgrade Process (screenshots)

At the end a message "Upgrade successfully, device need to reboot" appeared. During this phase the camera indeed rebooted itself. The window can then be closed with the "Return" button at the bottom right.

Back in the ConfigTool I launched a refresh and the camera now showed the new firmware version (2.600.x).

Dahua Firmware Upgraded

 

Unable to delete application link in JIRA (NullPointerException: consumer)
Monday - Apr 3rd 2017

In a previous article (Tackling Confluence migration issues (Windows+MSSQL to Linux+MariaDB)) I wrote about some migration issues of an Atlassian Confluence installation. At the end of this mentioned migration a new test Confluence (a staging installation) was ready to be used.

Now that the test instance, let's call it confluence-test.example.com, was ready, we created an application link between confluence-test.example.com and the production JIRA instance, jira.example.com. The application link was created successfully, but we then wanted to remove it again in order to have a clean test/staging setup between a confluence-test and a jira-test instance. The application link between confluence-test and jira-prod was removed, and that's when the problems started: the production Confluence was no longer able to retrieve issue information from JIRA.
I realized that, because confluence-test was created using the backup/restore function, it reused the original Confluence server.id of confluence-prod. When the application link between confluence-test and jira-prod was created, it used the same confluence:id in JIRA's database. When this application link was deleted, it also deleted the OAuth exchange information of confluence-prod.
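
As a side note: the server id of a Confluence instance can be looked up in its confluence.cfg.xml; a sketch, assuming the default Confluence home path:

grep "confluence.setup.server.id" /var/atlassian/application-data/confluence/confluence.cfg.xml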

To solve this, the application link between confluence-prod and jira-prod had to be recreated, but the existing application link could not be deleted in jira-prod. The following error message appeared in the atlassian-jira log file:

2017-03-30 21:18:47,589 http-bio-8080-exec-24 ERROR sysadmin 1278x180x1 1h0gr1r 192.168.1.150 /rest/applinks/2.0/permission/reciprocate-application-delete/c7726e78-6f42-3f2b-9253-653cb5e44bf1 [common.error.jersey.ThrowableExceptionMapper] Uncaught exception thrown by REST service: java.lang.NullPointerException: consumer
com.atlassian.cache.CacheException: java.lang.NullPointerException: consumer
    at com.atlassian.cache.memory.DelegatingCache$DelegatingLoadingCache.get(DelegatingCache.java:322)
    at com.atlassian.jira.oauth.consumer.CachingConsumerTokenStore.get(CachingConsumerTokenStore.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
    at org.springframework.osgi.service.importer.support.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:58)
    at org.springframework.osgi.service.importer.support.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:62)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.osgi.service.util.internal.aop.ServiceTCCLInterceptor.invokeUnprivileged(ServiceTCCLInterceptor.java:56)
    at org.springframework.osgi.service.util.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:39)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy1738.get(Unknown Source)
    at com.atlassian.applinks.core.auth.oauth.DefaultConsumerTokenStoreService.getConsumerToken(DefaultConsumerTokenStoreService.java:102)
    at com.atlassian.applinks.core.auth.oauth.ThreeLeggedOAuthRequestFactoryImpl.retrieveConsumerToken(ThreeLeggedOAuthRequestFactoryImpl.java:89)
    at com.atlassian.applinks.core.auth.oauth.ThreeLeggedOAuthRequestFactoryImpl.createRequest(ThreeLeggedOAuthRequestFactoryImpl.java:84)
    at com.atlassian.applinks.core.auth.ApplicationLinkRequestFactoryFactoryImpl$AbsoluteURLRequestFactory.createRequest(ApplicationLinkRequestFactoryFactoryImpl.java:201)
    at com.atlassian.applinks.core.rest.PermissionResource.checkPermissionFor(PermissionResource.java:241)
    at com.atlassian.applinks.core.rest.PermissionResource.canDeleteReciprocalApplicationLink(PermissionResource.java:93)  <+3>
    at java.lang.reflect.Method.invoke(Unknown Source)  <+19> (DispatchProviderHelper.java:234) (DispatchProviderHelper.java:100) (DefaultMethodInvocation.java:61) (ExpandInterceptor.java:38) (DefaultMethodInvocation.java:61) (DispatchProviderHelper.java:132) (DispatchProviderHelper.java:230) (ResourceJavaMethodDispatcher.java:75) (HttpMethodRule.java:288) (RightHandPathRule.java:147) (ResourceClassRule.java:108) (RightHandPathRule.java:147) (RootResourceClassesRule.java:84) (WebApplicationImpl.java:1469) (WebApplicationImpl.java:1400) (WebApplicationImpl.java:1349) (WebApplicationImpl.java:1339) (WebComponent.java:416) (ServletContainer.java:537)
    at com.atlassian.plugins.rest.module.RestDelegatingServletFilter$JerseyOsgiServletContainer.doFilter(RestDelegatingServletFilter.java:178)  <+1> (ServletContainer.java:795)
    at com.atlassian.plugins.rest.module.RestDelegatingServletFilter.doFilter(RestDelegatingServletFilter.java:73)  <+16> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (ServletFilterModuleContainerFilter.java:77) (ServletFilterModuleContainerFilter.java:63) (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70) (RestServletUtilsUpdaterFilter.java:26) (RestServletUtilsUpdaterFilter.java:40) (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70) (ContextFilter.java:25) (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.servicedesk.internal.web.RedirectExternalCustomerToPortalFilter.doFilter(RedirectExternalCustomerToPortalFilter.scala:18)  <+4> (AbstractHttpFilter.java:31) (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.mywork.client.filter.ServingRequestsFilter.doFilter(ServingRequestsFilter.java:37)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.plugins.cors.CorsFilter.doFilter(CorsFilter.java:65)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsSiteMeshFixupFilter.doFilter(PrettyUrlsSiteMeshFixupFilter.java:36)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsDispatcherFilter.doFilter(PrettyUrlsDispatcherFilter.java:60)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsSiteMeshFilter.doFilter(PrettyUrlsSiteMeshFilter.java:92)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsMatcherFilter.doFilter(PrettyUrlsMatcherFilter.java:56)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.labs.botkiller.BotKillerFilter.doFilter(BotKillerFilter.java:36)  <+21> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (ServletFilterModuleContainerFilter.java:77) (ServletFilterModuleContainerFilter.java:63) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (AccessLogFilter.java:107) (AccessLogFilter.java:91) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (XsrfTokenAdditionRequestFilter.java:54) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (PageBuilderFilter.java:90) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (CommittedResponseHtmlErrorRecoveryFilter.java:63) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (IteratingFilterChain.java:46) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsCombinedMatchDispatcherFilter.doFilter(PrettyUrlsCombinedMatchDispatcherFilter.java:61)  <+22> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (ServletFilterModuleContainerFilter.java:77) (ServletFilterModuleContainerFilter.java:63) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (SecurityFilter.java:239) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (TrustedApplicationsFilter.java:100) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (BaseLoginFilter.java:172) (JiraLoginFilter.java:70) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (IteratingFilterChain.java:46) (DelegatingPluginFilter.java:70) (OAuthFilter.java:79) (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.plugins.rest.module.servlet.RestSeraphFilter.doFilter(RestSeraphFilter.java:40)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsCombinedMatchDispatcherFilter.doFilter(PrettyUrlsCombinedMatchDispatcherFilter.java:61)  <+9> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (ServletFilterModuleContainerFilter.java:77) (ServletFilterModuleContainerFilter.java:63) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (AbstractJohnsonFilter.java:71) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208)
    at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
    at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)  <+10> (UrlRewriteFilter.java:394) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (GzipFilter.java:88) (GzipFilter.java:59) (JiraGzipFilter.java:56) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (IteratingFilterChain.java:46) (DelegatingPluginFilter.java:70)
    at com.atlassian.analytics.client.filter.JiraAnalyticsFilter.doFilter(JiraAnalyticsFilter.java:40)
    at com.atlassian.analytics.client.filter.AbstractHttpFilter.doFilter(AbstractHttpFilter.java:32)  <+3> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (DelegatingPluginFilter.java:70)
    at com.atlassian.prettyurls.filter.PrettyUrlsCombinedMatchDispatcherFilter.doFilter(PrettyUrlsCombinedMatchDispatcherFilter.java:61)  <+40> (DelegatingPluginFilter.java:78) (IteratingFilterChain.java:42) (ServletFilterModuleContainerFilter.java:77) (ServletFilterModuleContainerFilter.java:63) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (ChainedFilterStepRunner.java:87) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (AbstractCachingFilter.java:33) (AbstractHttpFilter.java:31) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (AbstractEncodingFilter.java:41) (AbstractHttpFilter.java:31) (PathMatchingEncodingFilter.java:45) (AbstractHttpFilter.java:31) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (JiraStartupChecklistFilter.java:79) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (MultipartBoundaryCheckFilter.java:41) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (ChainedFilterStepRunner.java:87) (JiraFirstFilter.java:61) (ApplicationFilterChain.java:241) (ApplicationFilterChain.java:208) (StandardWrapperValve.java:220) (StandardContextValve.java:122) (AuthenticatorBase.java:501) (StandardHostValve.java:171) (ErrorReportValve.java:103) (StandardEngineValve.java:116) (AccessLogValve.java:950) (CoyoteAdapter.java:408) (AbstractHttp11Processor.java:1070) (AbstractProtocol.java:611) (JIoEndpoint.java:314)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException: consumer
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at com.atlassian.oauth.Token$TokenBuilder.consumer(Token.java:179)
    at com.atlassian.jira.oauth.consumer.OfBizConsumerTokenStore.createTokenFromGV(OfBizConsumerTokenStore.java:169)
    at com.atlassian.jira.oauth.consumer.OfBizConsumerTokenStore.get(OfBizConsumerTokenStore.java:73)
    at com.atlassian.jira.oauth.consumer.CachingConsumerTokenStore$ConsumerTokenCacheLoader.load(CachingConsumerTokenStore.java:129)
    at com.atlassian.cache.memory.MemoryCacheManager$3$1.load(MemoryCacheManager.java:137)
    at com.atlassian.cache.memory.BlockingCacheLoader.load(BlockingCacheLoader.java:51)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3573)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3970)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4834)
    at com.atlassian.cache.memory.DelegatingCache$DelegatingLoadingCache.get(DelegatingCache.java:316)
    ... 221 more

It turned out that the oauth consumer key, needed for the application link between jira-prod and confluence-prod, was deleted when the application link jira-prod to confluence-test was deleted (because of the same confluence:id).

Since the application link from jira-prod to confluence-prod could not be deleted in the user interface, this had to be done directly in the database. Luckily there's an Atlassian KB article for this. So here we go!

The first action was to stop JIRA. The database queries were done using Microsoft SQL Server Management Studio on the JIRA database (an MSSQL db). In addition to the SQL statements from the KB article above, I also ran a SELECT before each DELETE, just to verify what I was about to delete.

SELECT substring(a.property_key,16,36) as "Application Key", b.propertyvalue as "Application Name" FROM propertyentry a join propertystring b on a.id=b.id where a.property_key like 'applinks.admin%name';
Application Key                            Application Name
bc4856d3-a7d7-3da1-ab24-76deb75bbd3a    External App
c7726e78-6f42-3f2b-9253-653cb5e44bf1    Confluence
 
 
SELECT * FROM oauthspconsumer WHERE consumer_key IN (SELECT CONVERT(NVARCHAR(MAX), propertyvalue) FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%'));
ID    CREATED    CONSUMER_KEY    consumername    PUBLIC_KEY    DESCRIPTION    CALLBACK    TWO_L_O_ALLOWED    EXECUTING_TWO_L_O_USER    TWO_L_O_IMPERSONATION_ALLOWED    THREE_L_O_ALLOWED
12003    2017-03-30 13:09:57.430    Confluence:4117472381    Confluence    MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCCNfUDzSXbadyMPoUEJrsFnpNMxFVrjFcl1gKaibSoTe/mNk2Si6r/Ui30cphAPaxWwwmZpnOuQoeCoMlKwm/VAo5b+6BCdb0rYBxU4FMEu6qvJAdmofrF4rhY5H82G4kdcf+8Gd7wl0HvikFgvAs4IVQcnbvcyLKC32hrrSrvpwIDAQAB    Atlassian Confluence http://localhost:8080    NULL    Y    NULL    N    Y
 
 
SELECT * FROM oauthconsumer;
ID    CREATED    consumername    CONSUMER_KEY    consumerservice    PUBLIC_KEY    PRIVATE_KEY    DESCRIPTION    CALLBACK    SIGNATURE_METHOD    SHARED_SECRET
10100    2015-05-07 17:09:05.047    JIRA    jira:f8a757db-6e19-41e2-887c-43c79dbbeab3    __HOST_SERVICE__    MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCc+L9ULOP6Q6tIgtmAAb+9wgwkabGhWb365aCLfoup0Yx3FdF3GvUns1+NS7MKwCxT+FgMKeNQWrsuxE9XQvMnNk1lz7t628N1S+GSIj8q6BQlZ7Cj3pbGhwTaSPTxVqbLkUh4ny1egZDoO4Ef6s8d7RG4VBQCe0Q9PMtFy2Eb3wIDAQAB    MIICdwIBADANBgkqhkiG9w0BAQEFAASCAmEwggJdAgEAAoGBAJz4v1Qs4/pDq0iC2YABv73CDCRpsaFZvfrloIt+i6nRjHcV0Xca9SezX41LswrALFP4WAwp41Bauy7ET1dC8yc2TWXPu3rbw3VL4ZIiPyroFCVnsKPelsaHBNpI9PFWpsuRSHifLV6BkOg7gR/qzx3tEbhUFAJ7RD08y0XLYRvfAgMBAAECgYBopj2INNVYh7ytfu7aNFBMhg3qSLyvwVCy8Fv0I82gzZ9D1TvsmABRE3jkjk8zH6Nt5+drVJHDhHm+WAo9Ntk7fpXFE2si/LPuSW8Be4WO7DkCTJ8Z9dnKBxQbSCNl3UBlVbAwu65DQ1E81hgRfl1yMuiwZMHNpVVHS3T3cajyoQJBAPSL54JT3u+/M+eZ1lWVlpLdul1d6fPkzZbk3q03k/6wj/qSMBWEcmRLq3hj6/FTd9/lL6tXesXzuu225AXQirsCQQCkUtOs9Is2Y0wqJj+v1z5PNpFBlDGIVR3YiPc/aBEBlYR3BD5L7ukm7Gk5xOtIZp9esZCqgTiROmeKL4c1VRstAkBfk6aB62HAdgxFQnzgqjxVXk+OWjwr1Ir9pKnbLVOsQTmkhyo/3qoOxPxZa0Q155d8bEi7QM99OoseiVKJaz4tAkEAnikWkkcB40Rp37vW2sG6uHWWE9+2WTlLs8oI6F+a1sJ5sDBGYcG/aXZsxWyy5YnY5UTVMDeWIbpsRAYBBqU2tQJBAIUsX9pfzPF4CCA6QrzGQ/efPlds4m8yNgbRljE/iVg8hTDomaG9JJRnVsrKZJYNb/YkZSFZp1h+DIwqBxNEhe0=    Atlassian JIRA at https://jira.example.com    NULL    RSA_SHA1    NULL
 
 
SELECT * FROM oauthconsumertoken WHERE token_key LIKE '%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
ID    CREATED    TOKEN_KEY    TOKEN    TOKEN_SECRET    TOKEN_TYPE    CONSUMER_KEY
10100    2013-03-18 07:07:06.797    c7726e78-6f42-3f2b-9253-653cb5e44bf1:XXXXXXXXXXXXXXX    D3WbhepfmVPjWY65w4NDYVge2LKuo7jb    09eaV2CTK6I4byMm5plyZ9Eh6Geuuvov    ACCESS    XXXXXXXXXXXXXXX
10102    2013-03-18 14:13:00.450    c7726e78-6f42-3f2b-9253-653cb5e44bf1:XXXXXXXXXXXXXXX   YGU4Hl4QTgMxLMMSjPlcHigTqHicdE3A    mlmrRRtE6rp5gkX9FsjGlqekxidcUkn0    ACCESS    XXXXXXXXXXXXXXX
10103    2013-03-18 18:08:29.593    c7726e78-6f42-3f2b-9253-653cb5e44bf1:XXXXXXXXXXXXXXX    dxZ9nv2kXUSAF3V6rfllnDbTsmusknd9    GnmKAKUOjXBXkbqMipD54ZEzV3gslbO3    ACCESS    XXXXXXXXXXXXXXX
[...]
 
 
DELETE FROM oauthspconsumer WHERE consumer_key IN (SELECT CONVERT(NVARCHAR(MAX), propertyvalue) FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%'));
(1 row(s) affected)
 
 
SELECT * FROM oauthspconsumer WHERE consumer_key IN (SELECT CONVERT(NVARCHAR(MAX), propertyvalue) FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%'));
no results
 
 
DELETE FROM oauthconsumer;
(1 row(s) affected)
 
 
SELECT * FROM oauthconsumer;
no results
 
 
DELETE FROM oauthconsumertoken WHERE token_key LIKE '%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
(97 row(s) affected)
 
 
SELECT * FROM oauthconsumertoken WHERE token_key LIKE '%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
no results
 
 
SELECT * FROM trustedapp WHERE application_id IN (SELECT CONVERT(NVARCHAR(MAX), propertyvalue) FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%'));
no results
 
 
DELETE FROM trustedapp WHERE application_id IN (SELECT CONVERT(NVARCHAR(MAX), propertyvalue) FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%'));
(0 row(s) affected)
 
 
SELECT * FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%');
ID            propertyvalue
490595        #java.util.Properties #Thu Mar 30 13:10:25 CEST 2017
490590        #java.util.Properties #Thu Mar 30 13:10:24 CEST 2017
490619        https://confluence.example.com
490618        Confluence
490587        true
490589        #java.util.List oauth.incoming.consumerkey
490596        #java.util.List com.atlassian.applinks.api.auth.types.OAuthAuthenticationProvider com.atlassian.applinks.api.auth.types.TwoLeggedOAuthAuthenticationProvider
490620        http://confluence.example.com:8091
490582        confluence
490588        Confluence:4117472381
 
 
DELETE FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%');
(10 row(s) affected)
 
 
SELECT * FROM propertystring WHERE id IN (SELECT id FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%');
no results
 
 
SELECT * FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
ID        ENTITY_NAME        ENTITY_ID    PROPERTY_KEY                                                                                propertytype
490595    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.auth.4e751871359fc86ed973d2a58b40c2b7    5
490590    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.auth.52e4d6463f02700eeefbefc74233de4f    5
490619    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.display.url                                5
490618    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.name                                    5
490587    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.primary                                    5
490589    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.propertyKeys                            5
490596    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.providerKeys                            5
490620    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.rpc.url                                    5
490582    jira.properties    1            applinks.admin.c7726e78-6f42-3f2b-9253-653cb5e44bf1.type                                    5
490588    jira.properties    1            applinks.application.c7726e78-6f42-3f2b-9253-653cb5e44bf1.oauth.incoming.consumerkey        5
 
 
DELETE FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
(10 row(s) affected)
 
 
SELECT * FROM propertyentry WHERE property_key LIKE 'applinks.%c7726e78-6f42-3f2b-9253-653cb5e44bf1%';
no results
 
 
SELECT * FROM propertystring where id in (select id from propertyentry where property_key like 'applinks.global%');
ID        propertyvalue
490586    #java.util.List bc4856d3-a7d7-3da1-ab24-76deb75bbd3a c7726e78-6f42-3f2b-9253-653cb5e44bf1
 
 
UPDATE propertystring SET propertyvalue = CAST(REPLACE(CAST(propertyvalue as NVarchar(MAX)), CHAR(10)+'c7726e78-6f42-3f2b-9253-653cb5e44bf1','') AS NText) where id in (select id from propertyentry where property_key like 'applinks.global%');
(1 row(s) affected)
 
 
SELECT * FROM propertystring where id in (select id from propertyentry where property_key like 'applinks.global%');
ID        propertyvalue
490586    #java.util.List bc4856d3-a7d7-3da1-ab24-76deb75bbd3a

After these steps, JIRA was started again. The application link to confluence-prod was now gone and it could be recreated in the JIRA admin interface.

Thanks to Sharon for her in-depth JIRA knowledge and Gojko from Atlassian support.

 

Galera Arbitrator garbd not starting: Exception in creating receive loop
Wednesday - Mar 29th 2017

For a new setup requiring a HA database I decided to create a Galera cluster, as I have already installed a couple of Galera clusters so far (see MySQL Galera cluster not starting (failed to open channel)). But this time I decided to create a two node cluster with an Arbitrator service to handle split-brain situations.

The Galera Arbitrator service is a daemon process (garbd) which simply connects to the Galera cluster and from then on is part of the cluster. However, no databases are synced to its disk - it's a pure quorum member, not a data node. That works great for this scenario because we have two data centers anyway and I don't need the same data three times across them.
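
On the Arbitrator machine only garbd itself is needed. The package name below is an assumption and depends on your distribution and Galera version:

# Install the Galera Arbitrator daemon (garbd) on Ubuntu/Debian
apt-get install galera-arbitrator-3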

I created a config file for garbd, according to the Galera Arbitrator documentation:

root@garb:~# cat /etc/garbd.conf
# arbitrator.config
group = ATLDB
address = gcomm://10.161.206.45,10.161.206.46

But when I tried to start garbd, it failed:

root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:47:01.480  INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:47:01.480  INFO: Read config:
    daemon:  0
    name:    garb
    address: gcomm://10.161.206.45,10.161.206.46
    group:   ATLDB
    sst:     trivial
    donor:  
    options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
    cfg:     /etc/garbd.conf
    log:  

I came across a github issue which stated that ports are required:

"garbd" consistently failed to start unless the configuration [...] explicitly provided the port number.

It is important to note that we're talking about Galera ports here, not the MySQL/MariaDB port (3306).
The default Galera port is 4567, which can be verified on one of the Galera data nodes:

root@galera-node1:~# netstat -lntup | grep mysql
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      2971/mysqld    
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      2971/mysqld 

Using port 4567, I adapted /etc/garbd.conf:

root@garb:~#  cat /etc/garbd.conf
# arbitrator.config
group = ATLDB
address = gcomm://10.161.206.45:4567,10.161.206.46:4567

Start test:

root@garb:~# garbd --cfg /etc/garbd.conf
2017-03-29 15:48:31.289  INFO: CRC-32C: using hardware acceleration.
2017-03-29 15:48:31.289  INFO: Read config:
    daemon:  0
    name:    garb
    address: gcomm://10.161.206.45:4567,10.161.206.46:4567
    group:   ATLDB
    sst:     trivial
    donor:  
    options: gcs.fc_limit=9999999; gcs.fc_factor=1.0; gcs.fc_master_slave=yes
    cfg:     /etc/garbd.conf
    log:    

2017-03-29 15:48:31.290  INFO: protonet asio version 0
2017-03-29 15:48:31.290  INFO: Using CRC-32C for message checksums.
2017-03-29 15:48:31.290  INFO: backend: asio
2017-03-29 15:48:31.290  INFO: gcomm thread scheduling priority set to other:0
2017-03-29 15:48:31.290  WARN: access file(./gvwstate.dat) failed(No such file or directory)
2017-03-29 15:48:31.290  INFO: restore pc from disk failed
2017-03-29 15:48:31.291  INFO: GMCast version 0
2017-03-29 15:48:31.291  INFO: (6520b85a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-03-29 15:48:31.291  INFO: (6520b85a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-03-29 15:48:31.291  INFO: EVS version 0
2017-03-29 15:48:31.291  INFO: gcomm: connecting to group 'ATLDB', peer '10.161.206.45:4567,10.161.206.46:4567'
2017-03-29 15:48:31.293  INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 6a1ea4ef tcp://10.161.206.45:4567
2017-03-29 15:48:31.293  INFO: (6520b85a, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-03-29 15:48:31.296  INFO: (6520b85a, 'tcp://0.0.0.0:4567') connection established to 5d311f46 tcp://10.161.206.46:4567
2017-03-29 15:48:31.585  INFO: declaring 5d311f46 at tcp://10.161.206.46:4567 stable
2017-03-29 15:48:31.585  INFO: declaring 6a1ea4ef at tcp://10.161.206.45:4567 stable
2017-03-29 15:48:31.586  INFO: Node 5d311f46 state prim
2017-03-29 15:48:31.587  INFO: view(view_id(PRIM,5d311f46,5) memb {
    5d311f46,0
    6520b85a,0
    6a1ea4ef,0
} joined {
} left {
} partitioned {
})
2017-03-29 15:48:31.587  INFO: save pc into disk
2017-03-29 15:48:31.792  INFO: gcomm: connected
2017-03-29 15:48:31.792  INFO: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-29 15:48:31.792  INFO: Shifting CLOSED -> OPEN (TO: 0)
2017-03-29 15:48:31.792  INFO: Opened channel 'ATLDB'
2017-03-29 15:48:31.792  INFO: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: Waiting for state UUID.
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: sent state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 0 (inf-atldb02-p)
2017-03-29 15:48:31.792  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 2 (inf-atldb01-p)
2017-03-29 15:48:31.793  INFO: STATE EXCHANGE: got state msg: 64f868a2-1486-11e7-b76e-b64f3ece7e23 from 1 (garb)
2017-03-29 15:48:31.793  INFO: Quorum results:
    version    = 4,
    component  = PRIMARY,
    conf_id    = 4,
    members    = 2/3 (joined/total),
    act_id     = 0,
    last_appl. = -1,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 6a1f102a-13a3-11e7-b710-b2876418a643
2017-03-29 15:48:31.793  INFO: Flow-control interval: [9999999, 9999999]
2017-03-29 15:48:31.793  INFO: Shifting OPEN -> PRIMARY (TO: 0)
2017-03-29 15:48:31.793  INFO: Sending state transfer request: 'trivial', size: 7
2017-03-29 15:48:31.795  INFO: Member 1.0 (garb) requested state transfer from '*any*'. Selected 0.0 (inf-atldb02-p)(SYNCED) as donor.
2017-03-29 15:48:31.795  INFO: Shifting PRIMARY -> JOINER (TO: 0)
2017-03-29 15:48:31.796  INFO: 0.0 (inf-atldb02-p): State transfer to 1.0 (garb) complete.
2017-03-29 15:48:31.796  INFO: 1.0 (garb): State transfer from 0.0 (inf-atldb02-p) complete.
2017-03-29 15:48:31.796  INFO: Shifting JOINER -> JOINED (TO: 0)
2017-03-29 15:48:31.797  INFO: Member 0.0 (inf-atldb02-p) synced with group.
2017-03-29 15:48:31.797  INFO: Member 1.0 (garb) synced with group.
2017-03-29 15:48:31.797  INFO: Shifting JOINED -> SYNCED (TO: 0)

It does indeed look better now! A verification on data node 1 confirmed that the cluster size increased from 2 to 3:

root@galera-node1:~#  mysql -B -e "SHOW STATUS WHERE variable_name ='wsrep_local_state_comment' \
OR variable_name ='wsrep_cluster_size' \
OR variable_name ='wsrep_incoming_addresses' \
OR variable_name ='wsrep_cluster_status' \
OR variable_name ='wsrep_connected' \
OR variable_name ='wsrep_ready' \
OR variable_name ='wsrep_local_state_uuid' \
OR variable_name ='wsrep_cluster_state_uuid';"

Variable_name    Value
wsrep_cluster_size    3
wsrep_cluster_state_uuid    6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_cluster_status    Primary
wsrep_connected    ON
wsrep_incoming_addresses    ,10.161.206.46:3306,10.161.206.45:3306
wsrep_local_state_comment    Synced
wsrep_local_state_uuid    6a1f102a-13a3-11e7-b710-b2876418a643
wsrep_ready    ON

Note that the garbd machine doesn't show up with an address in the row "wsrep_incoming_addresses"; it merely shows up "empty" (note the leading comma). That makes sense, because there is no MySQL running on the Arbitrator machine, ergo no 3306 listener.
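
Once the foreground test is successful, garbd can be started in the background; a minimal sketch using its daemon option (check garbd --help for the exact flags of your version):

garbd --daemon --cfg /etc/garbd.conf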

 

Tackling Confluence migration issues (Windows+MSSQL to Linux+MariaDB)
Friday - Mar 24th 2017

One of my latest tasks is a migration of Atlassian Confluence 5.7 from a pure Windows environment to a Linux environment.

Status quo: The Confluence application (Tomcat) runs on Windows Server 2012. Its database runs on Microsoft SQL Server 2008 R2.

Target installation: The Confluence application (Tomcat) runs on an Ubuntu 16.04 machine (probably an LXC container). The database runs on a MariaDB 10.x Galera cluster. For the test environment, a single MariaDB instance running on localhost on the same server as the application was chosen.

But already during the first test migration I ran into a couple of issues; after one was solved, the next one appeared, and so on. The following issues were all successfully tackled.

 

Confluence database installation does not complete

Before migrating anything, I wanted to have a clean and empty Confluence installation. But already right after selecting the database, the (browser) setup failed with the following error message:

Hibernate operation: Could not execute query; bad SQL grammar

This turned out to be a problem with the MariaDB default settings. The default character set and collation were defined as utf8mb4 in /etc/mysql/mariadb.conf.d/50-server.cnf. Had I read the Atlassian documentation for Confluence Database Setup For MySQL, I would have known that this must be set to utf8 - not utf8mb4!

# MySQL/MariaDB default is Latin1, but in Debian we rather default to the full
# utf8 4-byte character set. See also client.cnf
#
# 2017-03-16 Claudio: Do not use utf8mb4 for Atlassian products!
#character-set-server  = utf8mb4
#collation-server      = utf8mb4_general_ci
character-set-server=utf8
collation-server=utf8_bin
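
After restarting MariaDB, the effective defaults can be verified quickly; both values should now report utf8/utf8_bin:

mysql -e "SHOW VARIABLES LIKE 'character_set_server'; SHOW VARIABLES LIKE 'collation_server';"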

There are other important MySQL settings mentioned in the Atlassian documentation. These were also applied.

After the character set and collation were adjusted and the database dropped and created from scratch, the setup ran through and it created all necessary tables.

 

Database cannot be migrated

At first I thought the easiest (and fastest) way would be to take a dump from the MSSQL database and import it into the MariaDB/MySQL DB. But I got errors with all conversion tools. I tried MySQL Workbench's Migration Wizard, DBConvert for MSSQL & MySQL and HeidiSQL using database export/import - all of them failed.

MySQL Workbench failed MSSQL migration

DBConvert was able to show me why:

DBConvert Foreign Key Cyclic Reference Error 

In the CONTENT table there's a foreign key pointing to the SPACES table. On the other hand, in the SPACES table there's a foreign key pointing to the CONTENT table. See the dilemma? That's what's called a cyclic reference: one table cannot be created without the other. It results in "foreign key constraint fails" errors:

DBConvert Migration fails foreign key constraint

I tried to set MariaDB's GLOBAL VARIABLE to ignore the foreign key constraints (SET GLOBAL FOREIGN_KEY_CHECKS=0;) but this didn't help either.

Even a pure data import (using a data dump from a CSV file) didn't work.

At the end it turned out to be the wrong choice anyway, because the migration should be done with the Confluence backup/restore method. See below.

 

Restore process fails with invalid XML character

Initially only considered as an alternative migration process, I also tried Confluence's backup and restore method. In the old Confluence application a backup without attachments was created. This took around 30 minutes and produced a zip file containing the whole database in XML format (entities.xml within the zip file). Once this zip file was transferred to the new Confluence server, the restore was launched from the browser (right after the database setup). Unfortunately the import failed with the following error:

Confluence Restore Progress Import Failed

Import failed. Check your server logs for more information.
com.atlassian.confluence.importexport.ImportExportException:  Unable to complete import: An invalid XML character (Unicode: 0xffff) was found in the CDATA section.

I came across some similar problems on the net and found a Perl replace command which removes this invalid character. In order to do that, the zip file must be unzipped and the command run directly on entities.xml. After this, the zip file needs to be recreated:

unzip xmlexport-20170316-145220-91.zip
perl -i -pe 's/\xef\xbf\xbf//g' entities.xml
rm xmlexport-20170316-145220-91.zip
zip xmlexport-20170316-145220-91.zip entities.xml exportDescriptor.properties plugin-data

With the newly created zip file, I started the restore again and this time it started importing... until 4%.

 

Restore process fails with could not insert error

So once the invalid XML character error was gone, the next restore failed at 4% with the following error:

Confluence Restore Progress Import Failed 

Import failed. Check your server logs for more information.
com.atlassian.confluence.importexport.ImportExportException:  Unable to complete import: Error while importing backup: could not insert: [com.atlassian.confluence.pages.Page#59705648]

This error message shown in the browser is not really helpful though. The Confluence application log ($CONFLUENCE_HOME/logs/atlassian-confluence.log) gives a better idea about the real problem:

Caused by: net.sf.hibernate.HibernateException: could not insert: [com.atlassian.confluence.pages.Page#59705648]
    at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:475)
    at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:436)
    at net.sf.hibernate.impl.ScheduledInsertion.execute(ScheduledInsertion.java:37)
    at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2476)
    at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2462)
    at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2419)
    at net.sf.hibernate.impl.SessionImpl.flush(SessionImpl.java:2288)
    at com.atlassian.confluence.importexport.xmlimport.ImportProcessorContext.flushIfNeeded(ImportProcessorContext.java:246)
    at com.atlassian.confluence.importexport.xmlimport.ImportProcessorContext.objectImported(ImportProcessorContext.java:81)
    at com.atlassian.confluence.importexport.xmlimport.DefaultImportProcessor.persist(DefaultImportProcessor.java:50)
    at com.atlassian.confluence.importexport.xmlimport.DefaultImportProcessor.processObject(DefaultImportProcessor.java:37)
    at com.atlassian.confluence.importexport.xmlimport.parser.BackupParser.endElement(BackupParser.java:51)
    ... 44 more
Caused by: net.sf.hibernate.exception.GenericJDBCException: Could not execute JDBC batch update
    at net.sf.hibernate.exception.ErrorCodeConverter.handledNonSpecificException(ErrorCodeConverter.java:90)
    at net.sf.hibernate.exception.ErrorCodeConverter.convert(ErrorCodeConverter.java:79)
    at net.sf.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:29)
    at net.sf.hibernate.impl.BatcherImpl.convert(BatcherImpl.java:328)
    at net.sf.hibernate.impl.BatcherImpl.executeBatch(BatcherImpl.java:135)
    at net.sf.hibernate.impl.BatcherImpl.prepareStatement(BatcherImpl.java:61)
    at net.sf.hibernate.impl.BatcherImpl.prepareStatement(BatcherImpl.java:58)
    at net.sf.hibernate.impl.BatcherImpl.prepareBatchStatement(BatcherImpl.java:111)
    at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:454)
    ... 55 more
Caused by: java.sql.BatchUpdateException: Data truncation: Incorrect string value: '\xF0\x9F\x93\xBAF&...' for column 'BODY' at row 1
    at com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1167)
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1778)
    at com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1262)
    at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:958)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
    at net.sf.hibernate.impl.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:54)
    at net.sf.hibernate.impl.BatcherImpl.executeBatch(BatcherImpl.java:128)
    ... 59 more
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect string value: '\xF0\x9F\x93\xBAF&...' for column 'BODY' at row 1
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3971)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2501)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1756)
    ... 64 more

Again problems with some characters (incorrect string value)!

After this we contacted Atlassian's support and a few hours later we got an answer with a very helpful link: https://confluence.atlassian.com/confkb/incorrect-string-value-error-thrown-when-restoring-xml-backup-in-confluence-777009599.html

On this page, three different commands are used to fix the entities.xml file:

  1. A Java program (atlassian-xml-cleaner-0.1.jar) which is run on entities.xml and the results are saved in a second file entities-clean.xml
  2. A Perl command to remove 0xffff characters (the same command as I already ran above!)
  3. A Perl command to remove 0xfffe characters

For the sake of completeness, here is the whole procedure again, this time working on the zip file and the zipped entities.xml:

# unzip backup file
unzip xmlexport-20170316-145220-91.zip

# Download Atlassian xml cleaner
wget https://confluence.atlassian.com/confkb/files/777009599/777009596/1/1439839413460/atlassian-xml-cleaner-0.1.jar

# Fix xml character issues
/opt/atlassian/confluence/jre/bin/java -jar atlassian-xml-cleaner-0.1.jar entities.xml > entities-clean.xml
perl -i -pe 's/\xef\xbf\xbf//g' entities-clean.xml
perl -i -pe 's/\xef\xbf\xbe//g' entities-clean.xml
 
# Recreate zip file
rm entities.xml
mv entities-clean.xml entities.xml
zip xmlexport-20170316-145220-91new.zip entities.xml exportDescriptor.properties plugin-data
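
As an optional sanity check (my own addition, not part of the Atlassian instructions), you can verify that the cleaned entities.xml really no longer contains the problematic byte sequences; both counts should be 0 (assuming bash and GNU grep):

# look for the raw bytes 0xEF 0xBF 0xBF and 0xEF 0xBF 0xBE (the UTF-8 encodings of U+FFFF and U+FFFE)
LC_ALL=C grep -c $'\xef\xbf\xbf' entities.xml
LC_ALL=C grep -c $'\xef\xbf\xbe' entities.xml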

Then the restore was launched again with the newly created zip file xmlexport-20170316-145220-91new.zip.


Restore process fails with out of memory error

The restore process was a bit over 20% complete when I spotted errors in the atlassian-confluence.log logfile (which I was tailing while the restore was running). The following errors appeared:

org.apache.velocity.exception.MethodInvocationException: Invocation of method 'getPrettyElapsedTime' in  class com.atlassian.confluence.importexport.actions.ImportLongRunningTask threw exception java.lang.OutOfMemoryError: Java heap space at /admin/longrunningtask-xml.vm[line 5, column 19]
        at org.apache.velocity.runtime.parser.node.ASTIdentifier.execute(ASTIdentifier.java:237)
        at org.apache.velocity.runtime.parser.node.ASTReference.execute(ASTReference.java:262)
        at org.apache.velocity.runtime.parser.node.ASTReference.render(ASTReference.java:342)
        at org.apache.velocity.runtime.parser.node.SimpleNode.render(SimpleNode.java:336)
        at org.apache.velocity.Template.merge(Template.java:328)
        at org.apache.velocity.Template.merge(Template.java:235)
        at com.opensymphony.webwork.dispatcher.VelocityResult.doExecute(VelocityResult.java:91)
[...]
Caused by: java.lang.OutOfMemoryError: Java heap space

The restore process loads a lot of data into Tomcat's memory. The default maximum heap size of 1GB was simply too small and needed to be increased in Tomcat's setenv.sh:

grep Xmx /opt/atlassian/confluence/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -Xms1024m -Xmx4096m -XX:MaxPermSize=384m -XX:+UseG1GC"

As you can see, I set a new maximum heap size of 4GB (-Xmx4096m).
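
Note that the new heap settings only take effect after Confluence (and with it Tomcat) has been restarted. A sketch using the scripts shipped with the Confluence installation (paths may differ on your setup):

# stop and start Confluence so the new -Xmx value from setenv.sh is picked up
/opt/atlassian/confluence/bin/stop-confluence.sh
/opt/atlassian/confluence/bin/start-confluence.sh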

I restarted the restore process and followed the memory usage of the Tomcat process closely. It went up to ~3500MB, until the import finally completed without any additional errors!
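
One simple way to keep an eye on the memory consumption of the Confluence java process during the import (just a sketch, any process monitor will do):

# show the resident set size (RSS, in KB) of all java processes, refreshed every 5 seconds
watch -n 5 'ps -C java -o pid,rss,etime,args --sort=-rss'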

Confluence Restore Progress Complete 

Hurray!

 

What looks very easy in retrospect was a very time-consuming effort by several minds, with a lot of trial and error. Especially the failed database migration caused a lot of headaches.
Special thanks go to Atlassian support for providing the link we never stumbled upon ourselves (bad Google luck, I guess) and also to DBConvert, who helped us try to get around the foreign key constraint errors (even though we hadn't bought the full software at that point)!

 

Unbound DNS not serving reverse lookup for internal addresses
Monday - Mar 20th 2017 - by - (0 comments)

If you've read some recent posts (Get Unbound DNS lookups working in Ubuntu 16.04 Xenial, understand SERVFAIL and Unbound DNS server behind a VIP - solving reply from unexpected source), you know that I've set up a highly available Unbound DNS resolver/cacher for internal networks.

But just a few days before the definitive rollout to all internal servers, a problem came up: the reverse DNS lookups didn't work.

With the Unbound DNS resolver defined as nameserver in /etc/resolv.conf:

$ host 192.168.253.153
Host 153.253.168.192.in-addr.arpa. not found: 3(NXDOMAIN)

$ host 10.161.206.153
Host 153.206.161.10.in-addr.arpa. not found: 3(NXDOMAIN)

The same commands (and the reverse lookups) worked fine on a server still using the old standalone DNS servers. But why?

The reason can be found in the official documentation of unbound.conf:

The default zones are localhost, reverse 127.0.0.1 and ::1,  the  onion and  the AS112 zones. The AS112 zones are reverse DNS zones for private use and reserved IP addresses for which the  servers  on  the  internet cannot  provide correct answers. They are configured by default to give nxdomain (no reverse information) answers. The defaults can  be  turned off by specifying your own local-zone of that name, or using the 'nodefault' type.

So there we have the explanation: Unbound is configured by default to answer with NXDOMAIN instead of serving the reverse DNS information for these zones.
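
This can be verified by querying Unbound and one of the old DNS servers directly: the NXDOMAIN is generated locally by Unbound, while the old servers still return the PTR record (a sketch, assuming dig is installed and Unbound listens on localhost):

# ask Unbound directly -> no answer (NXDOMAIN from the built-in AS112 local-zones)
dig +short -x 192.168.253.153 @127.0.0.1

# ask one of the old DNS servers directly -> the PTR record is returned
dig +short -x 192.168.253.153 @domaincontroller1.example.com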

In order to change that behaviour, the reverse zones of the internal IP ranges (see RFC 1918) need to be declared in unbound.conf as local-zones with the nodefault type in the server: section:

server:
        interface: 0.0.0.0
        interface-automatic: yes
        access-control: 10.0.0.0/16 allow
        access-control: 127.0.0.0/8 allow
        access-control: 172.16.0.0/12 allow
        access-control: 192.168.0.0/16 allow
        verbosity: 1
        domain-insecure: *
        root-hints: /var/lib/unbound/root.hints
        local-zone: "10.in-addr.arpa." nodefault
        local-zone: "16.172.in-addr.arpa." nodefault
        local-zone: "17.172.in-addr.arpa." nodefault
        local-zone: "18.172.in-addr.arpa." nodefault
        local-zone: "19.172.in-addr.arpa." nodefault
        local-zone: "20.172.in-addr.arpa." nodefault
        local-zone: "21.172.in-addr.arpa." nodefault
        local-zone: "22.172.in-addr.arpa." nodefault
        local-zone: "23.172.in-addr.arpa." nodefault
        local-zone: "24.172.in-addr.arpa." nodefault
        local-zone: "25.172.in-addr.arpa." nodefault
        local-zone: "26.172.in-addr.arpa." nodefault
        local-zone: "27.172.in-addr.arpa." nodefault
        local-zone: "28.172.in-addr.arpa." nodefault
        local-zone: "29.172.in-addr.arpa." nodefault
        local-zone: "30.172.in-addr.arpa." nodefault
        local-zone: "31.172.in-addr.arpa." nodefault
        local-zone: "168.192.in-addr.arpa." nodefault

forward-zone:
      name: "."
      forward-addr: domaincontroller1.example.com
      forward-addr: domaincontroller2.example.com
      forward-addr: domaincontroller3.example.com
      #forward-addr: 8.8.4.4        # Google
      #forward-addr: 8.8.8.8        # Google
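
After changing unbound.conf, it is worth validating the syntax and restarting Unbound so the new local-zone entries are picked up (a sketch for Ubuntu 16.04 with systemd; the config path may differ):

# validate the configuration file before restarting
unbound-checkconf /etc/unbound/unbound.conf

# restart Unbound to activate the nodefault local-zones
sudo systemctl restart unbound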

Now that the RFC 1918 networks are defined as local zones (and Unbound was restarted), I checked the reverse DNS lookups again on the same host, using Unbound as the DNS resolver:

$ host 192.168.253.153
153.253.168.192.in-addr.arpa domain name pointer olymp.localdomain.local.

$ host 10.161.206.153
153.206.161.10.in-addr.arpa domain name pointer domaincontroller1.example.com.


 

