Elasticsearch: Shards fail to allocate due to maximum number of retries exceeded

Written by - 0 comments

Published on April 5th 2019 - Listed in ELK Elasticsearch Linux


A recent problem in our ELK stack this week had quite an (unexpected) impact. We experienced problems in our Docker environments (see Docker logging with GELF using tcp: The good, the bad, the ugly) and also forced myself to update the monitoring plugin check_es_system.

But once I got the Elasticsearch cluster running again, one problem remained: 5 shards were unable to be allocated/assigned:

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass  | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03               4 r UNASSIGNED
docker-2019.04.03               0 r UNASSIGNED
datateam-2019.03.25             1 r UNASSIGNED
haproxy-2019.03.23              2 r UNASSIGNED
.security-6                     0 r UNASSIGNED

I already came across such a problem with shards not able to be assigned and was able to solve it by manually assigning these shards using the commands "allocate_replica" and "allocate_stale_primary" (see ElasticSearch cluster stays red, stuck unassigned shards not being assigned). But this time this didn't work:

admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ]  }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [.security-6][0] on node {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-03T08:23:28.094Z], failed_attempts[5], delayed=false, details[failed shard on node [0o0eQXxcSJuWIFG2ohjwUg]: failed recovery, failure RecoveryFailedException[[.security-6][0]: Recovery failed from {es02}{t3GAvhY1SS2xZkt4U389jw}{jBEFGCmBQeOiYpLUF9zrgw}{192.168.66.12}{192.168.66.12:9300}{ml.machine_memory=31624278016, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {es01}{0o0eQXxcSJuWIFG2ohjwUg}{SGXKCUwLR--8rV4yXEUZcQ}{192.168.66.11}{192.168.66.11:9300}{ml.machine_memory=31624278016, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[es02][192.168.66.12:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ReceiveTimeoutTransportException[[es01][192.168.66.11:9300][internal:index/shard/recovery/prepare_translog] request_id [3025546] timed out after [900098ms]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.5.4] since this is equal-or-newer than the primary version [6.5.4])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [1.8tb], shard size: [0b], free after allocating shard: [1.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)]"},"status":400}

Yes, it's quite a big error message. The important part is the following though:

reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry

As you can see in the command before, I already tried it with the "retry_failed" parameter, with the same error as a result.

That's one of these moment when you need to research and read the documentation. The relevant information can be found on the documentation's page about Cluster Reroute:

The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5), before giving up and leaving the shard unallocated.

 Let's check the current settings of that index (.security-6):

admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
  ".security-6" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-all",
        "provided_name" : ".security-6",
        "format" : "6",
        "creation_date" : "1533818873742",
        "analysis" : {
          "filter" : {
            "email" : {
              "type" : "pattern_capture",
              "preserve_original" : "true",
              "patterns" : [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)"
              ]
            }
          },
          "analyzer" : {
            "email" : {
              "filter" : [
                "email",
                "lowercase",
                "unique"
              ],
              "tokenizer" : "uax_url_email"
            }
          }
        },
        "priority" : "1000",
        "number_of_replicas" : "1",
        "uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
        "version" : {
          "created" : "6030199",
          "upgraded" : "6050499"
        }
      }
    }
  }
}

The index's settings does not contain a "index.allocation.max_retries" value. This means the default (5) is used.

We can update the index's settings and set "index.allocation.max_retries" to a higher level:

admin@es01:~$ curl -u elastic:pass -H "Content-Type: application/json" -X PUT "http://localhost:9200/.security-6/_settings" -d "{ \"index.allocation.max_retries\": 6}"
{"acknowledged":true}

Verify that the new setting appears:

admin@es01:~$ curl -u elastic:pass "http://localhost:9200/.security-6/_settings?pretty"
{
  ".security-6" : {
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "6"
        },

        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-all",
        "provided_name" : ".security-6",
        "format" : "6",
        "creation_date" : "1533818873742",
        "analysis" : {
          "filter" : {
            "email" : {
              "type" : "pattern_capture",
              "preserve_original" : "true",
              "patterns" : [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)"
              ]
            }
          },
          "analyzer" : {
            "email" : {
              "filter" : [
                "email",
                "lowercase",
                "unique"
              ],
              "tokenizer" : "uax_url_email"
            }
          }
        },
        "priority" : "1000",
        "number_of_replicas" : "1",
        "uuid" : "gCuCQStxS-e9dKbuJqH-QQ",
        "version" : {
          "created" : "6030199",
          "upgraded" : "6050499"
        }
      }
    }
  }
}

Yep, it's there!

And now try to allocate the replica shard again:

admin@es01:~$ curl -u elastic:pass -X POST -H "Content-Type: application/json" "http://localhost:9200/_cluster/reroute?retry_failed=true" -d "{ \"commands\" : [ { \"allocate_replica\": { \"index\": \".security-6\", \"shard\": 0, \"node\": \"es01\" } } ]  }"
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"}],"type":"illegal_argument_exception","reason":"[allocate_replica] all copies of [.security-6][0] are already assigned. Use the move allocation command instead"},"status":400}

Oh, it's already assigned? What happened?

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep ".security-6"
.security-6    0 p STARTED     15  55.3kb 192.168.66.12 es02
.security-6    0 r STARTED     15  39.6kb 192.168.66.11 es01

Seems that the index was automatically allocated, once the "index.allocation.max_retries" value was set to 6.

After I updated the index settings of the remaining unassigned replica shards, they were all automatically allocated:

admin@es01:~$ curl -s http://localhost:9200/_cat/shards?pretty -u elastic:pass | egrep "(UNASSIGNED|INIT)"
docker-2019.04.03   4 r INITIALIZING   192.168.66.11 es01
docker-2019.04.03   0 r INITIALIZING   192.168.66.11 es01

The monitoring plugin check_es_system showed the initializing shards during that time (no more unassigned shards discovered):

admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM WARNING - Elasticsearch Cluster is yellow (2 nodes, 2 data nodes, 1752 shards, 0 relocating shards, 2 initializing shards, 0 unassigned shards, 6771000328 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1752;;;; relocating_shards=0;;;; initializing_shards=2;;;; unassigned_shards=0;;;; docs=6771000328;;;;

It took a few moments until the cluster was finally green again:

admin@es01:~$ /usr/lib/nagios/plugins/check_es_system.sh -H "192.168.66.12" -t status -u elastic -p pass
ES SYSTEM OK - Elasticsearch Cluster is green (2 nodes, 2 data nodes, 1754 shards, 6771208539 docs)|total_nodes=2;;;; data_nodes=2;;;; total_shards=1754;;;; relocating_shards=0;;;; initializing_shards=0;;;; unassigned_shards=0;;;; docs=6771208539;;;;


More recent articles: