Googlebot freezes Apache and server load increases

Published on February 3rd 2011 - Listed in Linux Internet


Arrrrgghh!
This pretty much sums up my research over the last couple of days. For several days now I have been seeing weird behavior from Apache: the load suddenly increases and some Apache child processes use up to 100% of the CPU.

top shows the 3 Apache processes that use the most CPU:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5439 www-data  20   0  626m 170m  64m S   76  4.3  51:43.32 apache2
 4355 www-data  20   0  680m 173m  58m S   43  4.4  34:08.04 apache2
 3522 www-data  20   0  630m 205m  64m S   39  5.2  39:03.98 apache2

If we take a closer look at the open connections with the lsof command, we see the following:

# lsof -i :80
COMMAND   PID     USER   FD   TYPE     DEVICE SIZE NODE NAME
apache2  3522 www-data   26u  IPv6 1210315466       TCP server:www->crawl-66-249-71-78.googlebot.com:54107 (CLOSE_WAIT)
apache2  4355 www-data   39u  IPv6 1210322237       TCP server:www->crawl-66-249-66-136.googlebot.com:36722 (CLOSE_WAIT)
apache2  5439 www-data   26u  IPv6 1210335205       TCP server:www->crawl-66-249-66-136.googlebot.com:62305 (CLOSE_WAIT)
apache2  5439 www-data   30u  IPv6 1210345350       TCP server:www->crawl-66-249-66-136.googlebot.com:40885 (CLOSE_WAIT)
apache2 13904 www-data    3u  IPv6 1210044289       TCP *:www (LISTEN)
apache2 14633 www-data    3u  IPv6 1210044289       TCP *:www (LISTEN)
apache2 14633 www-data   28u  IPv6 1210440119       TCP server:www->195.188.250.137:17518 (ESTABLISHED)
apache2 16314     root    3u  IPv6 1210044289       TCP *:www (LISTEN)

Surprise, surprise. The same processes from the top output show up again. We also see that they are no longer listening for new HTTP connections (meanwhile 3 new child processes were spawned). The old processes stay around because their connections to Googlebot are stuck in the CLOSE_WAIT state.
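
The cross-check between top and lsof can be scripted. The following is only a minimal sketch: the function name close_wait_pids is made up for this example, and it assumes the input looks like the lsof output above (PID in the second column, state in parentheses at the end of the line).

```shell
#!/bin/sh
# Print the unique PIDs of processes holding a socket in CLOSE_WAIT.
# Reads `lsof -i` style output on stdin; column 2 is assumed to be the PID.
close_wait_pids() {
    awk '/\(CLOSE_WAIT\)/ { print $2 }' | sort -un
}

# Possible usage (needs lsof and usually root privileges):
#   lsof -i :80 | close_wait_pids
```

The resulting PIDs can then be compared with the ones at the top of the top output.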

The question now is: what can I (and anyone else who experiences this problem) do? By definition, CLOSE_WAIT means that the remote side has closed the connection, but the local process has not yet closed its own end of the socket. Why does it only happen with Googlebot (which could point to an improper close from the remote side)?
If anyone has a solution for this problem, please let me know. And no, blocking Googlebot is not an option.
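
To check whether the CLOSE_WAIT connections really only involve Googlebot, the lsof output can be grouped by remote host. This is a rough sketch under the assumption that the NAME column has the form local:port->remote:port, as in the output above; close_wait_by_remote is a hypothetical helper name.

```shell
#!/bin/sh
# Count CLOSE_WAIT sockets per remote host.
# Reads `lsof -i` style output on stdin; the field before the final
# "(CLOSE_WAIT)" is assumed to be "local:port->remote:port".
close_wait_by_remote() {
    awk '/\(CLOSE_WAIT\)/ {
        split($(NF-1), ends, "->")     # ends[2] is "remote:port"
        remote = ends[2]
        sub(/:[^:]*$/, "", remote)     # drop the remote port
        count[remote]++
    }
    END { for (r in count) print count[r], r }' | sort -k1,1nr -k2,2
}

# Possible usage (needs lsof and usually root privileges):
#   lsof -i :80 | close_wait_by_remote
```

If hosts other than the Googlebot crawlers show up in the list, the remote side is probably not the only factor.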

As of now, the only temporary workaround is to kill the affected child processes. This is not dangerous, since all other HTTP connections are handled by the newly spawned processes, but it is not nice (remember, killing is not nice).
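
The manual kill could be prepared with a small helper. The sketch below is an assumption on my side, not a tested fix: stuck_child_kill_commands is a hypothetical name, and it only prints kill commands for non-root apache2 processes that hold a CLOSE_WAIT socket and have no LISTEN or ESTABLISHED socket left, so nothing is killed until the output has been reviewed.

```shell
#!/bin/sh
# Print (do not run) a kill command for each stuck apache2 child.
# Reads `lsof -i :80` style output on stdin. A PID counts as stuck when it
# has a CLOSE_WAIT socket and no LISTEN or ESTABLISHED socket.
# Processes owned by root (the Apache parent) are skipped.
stuck_child_kill_commands() {
    awk '$1 == "apache2" && $3 != "root" {
        if ($NF == "(CLOSE_WAIT)") stuck[$2] = 1
        else if ($NF == "(LISTEN)" || $NF == "(ESTABLISHED)") busy[$2] = 1
    }
    END { for (pid in stuck) if (!(pid in busy)) print "kill", pid }' |
        sort -k2,2n
}

# Possible usage (needs lsof and usually root privileges):
#   lsof -i :80 | stuck_child_kill_commands          # review first
#   lsof -i :80 | stuck_child_kill_commands | sh     # then execute
```

Printing the commands instead of running them directly keeps the decision with the admin, which seems sensible as long as the root cause is unknown.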

