Freeing up disk space from CouchDB - do not forget the views!

Written by - 3 comments

Published on - Listed in CouchDB Database DB Linux


In the past few weeks I've seen a steady increase of disk usage of a CouchDB cluster I'm managing:

CouchDB Disk Usage steadily increases

Time to free up some disk space! I already knew there was a "compaction" mechanism (comparable to the "vacuum" process in PostgreSQL) which will free up the used disk space by removing old revisions of data. But when I ran "compact" on the database using most disk space, it wasn't really helping.

Before I ran compact on the DB, there was a disk size of 10846902016 (Bytes):

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559532-g1AAAAHTeJzLYWBg4...",
 "sizes": {
  "file": 10846902016,
  "external": 3690681900,
  "active": 5355688382
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690681900
 },
 "doc_del_count": 46,
 "doc_count": 13559486,
 "disk_size": 10846902016,
 "disk_format_version": 6,
 "data_size": 5355688382,
 "compact_running": true,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Running compact:

# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact
{"ok":true}

After this, I was able to see the status of the database compaction processes in the Fauxton UI:

CouchDB Database Compaction Progress

But once the compaction was completed, I found the disk size didn't change:

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559692-g1AAA...",
 "sizes": {
  "file": 10851612416,
  "external": 3690734442,
  "active": 5355768390
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690734442
 },
 "doc_del_count": 46,
 "doc_count": 13559646,
 "disk_size": 10851612416,
 "disk_format_version": 6,
 "data_size": 5355768390,
 "compact_running": false,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Even worse: The compaction process used even more disk space. The opposite of what I expected!

I then checked on the file system level, where most disk space is being used and came across the following folders:

root@st-cdb01-p:/var/lib/couchdb# du -ksh shards/
15G    shards/

root@st-cdb01-p:/var/lib/couchdb# du -ksh .shards/
30G    .shards/

Note the dot in the second folder (.shards). According to the documentation, the ".shards" folder contains "views" and not "databases". So I manually checked the size of a view using the Fauxton UI:

CouchDB View Size

Woah! Taking a look and comparing "Actual data size (bytes): 1,444,291,661" and "Data size on disk (bytes): 19,648,534,600" I was pretty sure I found the bad guy.

A compaction can also be run on a view (in this case "stats" is the view, can also be seen in the UI screenshot above):

root@couchdb:~# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact/stats
{"ok":true}

The compaction processes and their current progress can also be checked in the UI:

CouchDB View Compaction Progress

Once all of these processes were completed, 20GB of disk space were freed!

CouchDB Disk Usage after Views Compaction

The change can also be seen in Fauxton:

CouchDB View Size after compaction

Some additional questions related to compaction and their answers below:

How did the compaction affect the cluster?
I ran the compaction on node 1 of a two node cluster. I could not see an immediate change of disk usage on the second node. I had to run the same compaction commands on node 2 to free disk space there, too.

Shouldn't auto compaction do this job?
That's what I thought, too. I verified that automatic compaction is enabled and this seems to be the case by default (Ubuntu 16.04, CouchDB 2.1):

root@couchdb:~# grep "\[daemons\]" -A 10 /opt/couchdb/etc/default.ini
[daemons]
index_server={couch_index_server, start_link, []}
external_manager={couch_external_manager, start_link, []}
query_servers={couch_proc_manager, start_link, []}
vhosts={couch_httpd_vhost, start_link, []}
httpd={couch_httpd, start_link, []}
uuids={couch_uuids, start, []}
auth_cache={couch_auth_cache, start_link, []}
os_daemons={couch_os_daemons, start_link, []}
compaction_daemon={couch_compaction_daemon, start_link, []}

The compaction_daemon is enabled and so are the settings:

root@couchdb:~# grep "\[compaction_daemon\]" -A 8 /opt/couchdb/etc/default.ini
[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072

root@couchdb:~# grep "\[compactions\]" -A 78 /opt/couchdb/etc/default.ini  | egrep -v "^;"
[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "50%"}, {from, "00:00"}, {to, "04:00"}, {parallel_view_compaction, true}]

Note: I changed view_fragmentation from the default 60% to 50% and added the "from" and "to" timeslot.

So auto compaction should have been doing its job to free up disk space. According to the logs the compaction daemon did indeed run (on databases and views) but nothing was freed up.

TL;DR of this article?
Do not forget to compact your db views, too! Check their sizes (either in the UI or via CLI) and you should be able to determine where your disk space is getting wasted.

How can I make sure to run compact on all relevant databases and views?
For this purpose I created a script called compact_couchdb.sh. It runs through all the databases found in the addressed CouchDB. In each database, the views are detected. And the script compacts each database and each view of each database found.
The script can be found here (on Github).

CouchDB as a Service?

You don't want to worry about privileges, backups, file system growth, memory allocation etc and just want to focus using CouchDB? Check out the dedicated CouchDB server hosting at Infiniroot!


Add a comment

Show form to leave a comment

Comments (newest first)

Josh from California wrote on Feb 20th, 2020:

Thank you so much for this tutorial! The view shards were filling up the disk on my VPS hosting my game server and causing issues because of it. This helped free up much needed space.

Thanks again!!!


ck from Switzerland wrote on Feb 5th, 2019:

Hello Carlos. The CouchDB instances I manage are currently running on 2.2. I'm sure there will be another patch day soon so I'll keep an eye on the CouchDB if auto compact works as it should in 2.3.
When I wrote the post, I 'think' auto compact kind of worked but only for the db itself, not for the views.


Carlos from Barcelona wrote on Feb 4th, 2019:

Supposedly in version 2.3 of CouchDB the automatic compaction works fine, but I can not make it work so I did the same as you, a script.

If you know how to make it work, I would appreciate it if you can help me.


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Icingaweb   Icingaweb2   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder   


Update cookies preferences