Freeing up disk space from CouchDB - do not forget the views!


Published on July 19th 2018 - Listed in CouchDB Database DB Linux


In the past few weeks I've seen a steady increase in disk usage on a CouchDB cluster I manage:

CouchDB Disk Usage steadily increases

Time to free up some disk space! I already knew about CouchDB's "compaction" mechanism (comparable to the "vacuum" process in PostgreSQL), which frees disk space by removing old revisions of data. But when I ran "compact" on the database using the most disk space, it didn't really help.

Before compacting, the database reported a disk size of 10846902016 bytes (about 10.1 GiB):

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559532-g1AAAAHTeJzLYWBg4...",
 "sizes": {
  "file": 10846902016,
  "external": 3690681900,
  "active": 5355688382
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690681900
 },
 "doc_del_count": 46,
 "doc_count": 13559486,
 "disk_size": 10846902016,
 "disk_format_version": 6,
 "data_size": 5355688382,
 "compact_running": true,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}
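As a rough cross-check, a fragmentation percentage can be derived from the "sizes" object above. The sketch below assumes the formula described in CouchDB's compaction documentation, (file size - data size) * 100 / file size, with "sizes.active" taken as the data size; frag_pct is a hypothetical helper. For bigdb it comes out at about 50%, below the db_fragmentation threshold of 70% in the configuration shown later in this post. Since these are cluster-wide totals, individual shard files may be above or below that figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer fragmentation percentage of a CouchDB file.
# Assumes fragmentation = (file_size - active_size) * 100 / file_size,
# with active_size taken from "sizes.active" in the database info.
frag_pct() {
  local file_size=$1 active_size=$2
  if [ "$file_size" -le 0 ]; then
    echo 0
    return
  fi
  # Bash arithmetic is 64-bit, so byte counts of this magnitude are safe.
  echo $(( (file_size - active_size) * 100 / file_size ))
}

# Values from the bigdb output above (before compaction).
frag_pct 10846902016 5355688382   # prints 50
```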

Running the compaction:

# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact
{"ok":true}

After this, I was able to see the status of the database compaction processes in the Fauxton UI:

CouchDB Database Compaction Progress
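The same progress information is also available from the command line through CouchDB's /_active_tasks endpoint, where compaction tasks carry a "type" of "database_compaction" or "view_compaction". The sketch below counts them with grep; it assumes CouchDB's compact JSON output (no whitespace after colons), COUCH_URL defaults to the localhost URL used in this post, and count_compactions and wait_for_compactions are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around GET /_active_tasks.
COUCH_URL=${COUCH_URL:-http://localhost:5984}

# Read /_active_tasks JSON on stdin and print the number of compaction tasks.
# Assumes compact JSON such as {"type":"view_compaction",...}.
count_compactions() {
  grep -o '"type":"[a-z_]*_compaction"' | wc -l | tr -d ' '
}

# Poll every 30 seconds until no compaction task is reported any more.
# Note: if the server is unreachable, the count is 0 and the loop stops.
wait_for_compactions() {
  local n
  while :; do
    n=$(curl -s "$COUCH_URL/_active_tasks" | count_compactions)
    [ "$n" -eq 0 ] && break
    echo "$n compaction task(s) still running..."
    sleep 30
  done
}
```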

But once the compaction had completed, I found that the disk size had not changed:

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559692-g1AAA...",
 "sizes": {
  "file": 10851612416,
  "external": 3690734442,
  "active": 5355768390
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690734442
 },
 "doc_del_count": 46,
 "doc_count": 13559646,
 "disk_size": 10851612416,
 "disk_format_version": 6,
 "data_size": 5355768390,
 "compact_running": false,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Even worse: the file had actually grown slightly, by 4,710,400 bytes (160 new documents had also been written in the meantime). The opposite of what I expected!

I then checked at the file system level where most of the disk space was being used and came across the following directories:

root@st-cdb01-p:/var/lib/couchdb# du -ksh shards/
15G    shards/

root@st-cdb01-p:/var/lib/couchdb# du -ksh .shards/
30G    .shards/
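To see which index files inside .shards are the largest, a small helper like the following can be used. It assumes that CouchDB 2.x view index files end in ".view" (typically below .shards/<range>/<db>.<suffix>_design/mrview/) and relies only on find, du, sort and head; largest_view_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest view index files (size in KiB, then path).
# Assumes CouchDB 2.x stores view indexes as *.view files below .shards/.
largest_view_files() {
  local dir=${1:-/var/lib/couchdb/.shards} count=${2:-10}
  find "$dir" -type f -name '*.view' -exec du -k {} + | sort -rn | head -n "$count"
}

# Example: largest_view_files /var/lib/couchdb/.shards 5
```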

Note the leading dot in the second directory (.shards). According to the documentation, ".shards" contains the view indexes, whereas "shards" contains the databases. So I checked the size of a view index in the Fauxton UI:

CouchDB View Size

Woah! Comparing "Actual data size (bytes): 1,444,291,661" with "Data size on disk (bytes): 19,648,534,600", the index file was more than 13 times larger than the data it actually holds. I was pretty sure I had found the culprit.
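The same arithmetic shows how far off this index was: (19,648,534,600 - 1,444,291,661) * 100 / 19,648,534,600 is roughly 92% fragmentation, well above the view_fragmentation threshold of 50% configured later in this post. The sketch below compares a file against such a threshold; should_compact is a hypothetical name, and the formula is an assumption modeled on the fragmentation check described in CouchDB's compaction documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a file reaches a fragmentation threshold.
# Usage: should_compact <size_on_disk> <actual_data_size> <threshold_percent>
# Prints "yes" or "no" and returns 0 only when compaction is warranted.
should_compact() {
  local disk=$1 data=$2 threshold=$3 frag
  if [ "$disk" -le 0 ]; then
    echo no
    return 1
  fi
  frag=$(( (disk - data) * 100 / disk ))
  echo "fragmentation: ${frag}%" >&2
  if [ "$frag" -ge "$threshold" ]; then
    echo yes
    return 0
  fi
  echo no
  return 1
}

# View index from the Fauxton screenshot vs. view_fragmentation = 50%.
should_compact 19648534600 1444291661 50   # prints "yes" (fragmentation: 92% on stderr)
```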

Compaction can also be run on view indexes. The URL takes the name of the design document that holds the views, without the "_design/" prefix (in this case "stats", which can also be seen in the UI screenshot above):

root@couchdb:~# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact/stats
{"ok":true}
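For scripting, a small wrapper can take care of two details: the endpoint expects the bare design document name ("stats" rather than "_design/stats"), and a "/" inside a database name has to be URL-encoded as %2F. This is a sketch assuming COUCH_URL points at the node (add credentials if your setup requires them); compact_views is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: compact the view indexes of one design document.
COUCH_URL=${COUCH_URL:-http://localhost:5984}

compact_views() {
  local db=$1 ddoc=${2#_design/}   # accept "stats" or "_design/stats"
  db=${db//\//%2F}                 # "/" in database names must be URL-encoded
  curl -s -H "Content-Type: application/json" -X POST \
    "$COUCH_URL/$db/_compact/$ddoc"
}

# Example (requires a running CouchDB): compact_views bigdb stats
```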

The compaction processes and their current progress can also be checked in the UI:

CouchDB View Compaction Progress

Once all of these processes had completed, 20 GB of disk space had been freed!

CouchDB Disk Usage after Views Compaction

The change can also be seen in Fauxton:

CouchDB View Size after compaction

Below are some additional questions about compaction, with their answers:

How did the compaction affect the cluster?
I ran the compaction on node 1 of a two-node cluster. I could not see an immediate change in disk usage on the second node. I had to run the same compaction commands on node 2 to free disk space there, too.

Shouldn't auto compaction do this job?
That's what I thought, too. I verified that automatic compaction is enabled, which appears to be the default (Ubuntu 16.04, CouchDB 2.1):

root@couchdb:~# grep "\[daemons\]" -A 10 /opt/couchdb/etc/default.ini
[daemons]
index_server={couch_index_server, start_link, []}
external_manager={couch_external_manager, start_link, []}
query_servers={couch_proc_manager, start_link, []}
vhosts={couch_httpd_vhost, start_link, []}
httpd={couch_httpd, start_link, []}
uuids={couch_uuids, start, []}
auth_cache={couch_auth_cache, start_link, []}
os_daemons={couch_os_daemons, start_link, []}
compaction_daemon={couch_compaction_daemon, start_link, []}

The compaction_daemon is enabled, and its settings are configured:

root@couchdb:~# grep "\[compaction_daemon\]" -A 8 /opt/couchdb/etc/default.ini
[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072

root@couchdb:~# grep "\[compactions\]" -A 78 /opt/couchdb/etc/default.ini | grep -v "^;"
[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "50%"}, {from, "00:00"}, {to, "04:00"}, {parallel_view_compaction, true}]

Note: I changed view_fragmentation from its default of 60% to 50% and added the "from"/"to" time window.
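Because "grep -A 78" depends on how many lines follow the section header, a section-aware alternative can be sketched with awk: it prints a single INI section and skips comment and blank lines. The function name ini_section is hypothetical. Since CouchDB reads local.ini after default.ini, settings there take precedence, so both files are worth checking.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one [section] of an INI file, skipping comments
# (lines starting with ";" or "#") and blank lines.
ini_section() {
  local file=$1 section=$2
  awk -v want="[$section]" '
    /^\[/ { in_section = ($0 == want) }   # every header starts or ends the section
    in_section && NF && !/^[;#]/ { print }
  ' "$file"
}

# Example: compare the shipped defaults with local overrides.
# ini_section /opt/couchdb/etc/default.ini compactions
# ini_section /opt/couchdb/etc/local.ini compactions
```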

So automatic compaction should have been freeing up disk space. According to the logs, the compaction daemon did run (on both databases and views), but no space was freed.

TL;DR of this article?
Do not forget to compact your database views, too! Check their sizes (either in the UI or via the CLI) to find out where your disk space is being wasted.

How can I make sure to run compact on all relevant databases and views?
For this purpose I wrote a script called compact_couchdb.sh. It iterates over all databases of the given CouchDB instance, detects the views in each database, and compacts every database and every view it finds.
The script is available on GitHub.
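As an illustration only, the same idea can be sketched in a few lines of bash. This is not compact_couchdb.sh: list_dbs, list_ddocs, post_compact and compact_all are hypothetical names, COUCH_URL is an assumption, and the grep/sed parsing assumes CouchDB's compact JSON output (a JSON tool such as jq would be more robust).

```shell
#!/usr/bin/env bash
# Sketch only (not compact_couchdb.sh): compact every database and the view
# indexes of every design document on one CouchDB node.
# Assumptions: COUCH_URL reaches the node (add credentials if required) and
# responses are compact JSON, which the grep/sed parsing below relies on.
COUCH_URL=${COUCH_URL:-http://localhost:5984}

# Print one database name per line from GET /_all_dbs (a JSON array of strings).
list_dbs() {
  curl -s "$COUCH_URL/_all_dbs" | grep -o '"[^"]*"' | tr -d '"'
}

# Print the design document names (without "_design/") of one database.
list_ddocs() {
  local db=${1//\//%2F}
  curl -s "$COUCH_URL/$db/_all_docs?startkey=%22_design%2F%22&endkey=%22_design0%22" |
    grep -o '"id":"_design/[^"]*"' | sed -e 's/^"id":"_design\///' -e 's/"$//'
}

# POST to a _compact endpoint with the JSON content type it expects.
post_compact() {
  curl -s -H "Content-Type: application/json" -X POST "$COUCH_URL/$1"
}

compact_all() {
  local db ddoc
  list_dbs | while read -r db; do
    echo "compacting database $db"
    post_compact "${db//\//%2F}/_compact"
    list_ddocs "$db" | while read -r ddoc; do
      echo "compacting views of $db/_design/$ddoc"
      post_compact "${db//\//%2F}/_compact/$ddoc"
    done
  done
}

# Usage (requires a running CouchDB): source this file, then run compact_all
```

Each POST returns {"ok":true} right away while the compaction continues in the background, so progress still has to be followed in Fauxton or via /_active_tasks.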



Comments (newest first)

ck from Switzerland wrote on Feb 5th, 2019:

Hello Carlos. The CouchDB instances I manage are currently running 2.2. I'm sure there will be another patch day soon, so I'll keep an eye on whether auto compaction works as it should in 2.3.
When I wrote the post, I 'think' auto compaction kind of worked, but only for the databases themselves, not for the views.


Carlos from Barcelona wrote on Feb 4th, 2019:

Supposedly, automatic compaction works fine in CouchDB 2.3, but I cannot make it work, so I did the same as you and wrote a script.

If you know how to make it work, I would appreciate your help.