Freeing up disk space from CouchDB - do not forget the views!
Thursday - Jul 19th 2018

Over the past few weeks I've seen a steady increase in disk usage on a CouchDB cluster I manage:

[Image: CouchDB disk usage steadily increases]

Time to free up some disk space! I already knew there was a "compaction" mechanism (comparable to the "vacuum" process in PostgreSQL) which frees up disk space by removing old revisions of data. But when I ran "compact" on the database using the most disk space, it didn't really help.

Before I ran compact on the DB, its disk size was 10846902016 bytes (roughly 10.8 GB):

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559532-g1AAAAHTeJzLYWBg4...",
 "sizes": {
  "file": 10846902016,
  "external": 3690681900,
  "active": 5355688382
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690681900
 },
 "doc_del_count": 46,
 "doc_count": 13559486,
 "disk_size": 10846902016,
 "disk_format_version": 6,
 "data_size": 5355688382,
 "compact_running": true,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Running compact:

# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact
{"ok":true}

After this, I was able to see the status of the database compaction processes in the Fauxton UI:

[Image: CouchDB database compaction progress]
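
The same status can also be checked from the command line, because the database info shown earlier contains a "compact_running" field. What follows is a minimal sketch, not a definitive implementation: the function names compaction_running and wait_for_compaction are made up for this example, and the 10 second polling interval is an arbitrary assumption.

```shell
# Reads database info JSON on stdin and succeeds (exit code 0)
# while the "compact_running" field is true.
compaction_running() {
  grep -Eq '"compact_running"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: poll the database info until compaction has finished.
# Assumes CouchDB listens on localhost:5984, as in the commands above.
wait_for_compaction() {
  db=$1
  while curl -s "localhost:5984/$db" | compaction_running; do
    sleep 10
  done
}

# Usage (needs a running CouchDB):
#   wait_for_compaction bigdb && echo "compaction of bigdb finished"
```

Note that this only looks at the database itself; as the rest of this article shows, the view indexes are a separate matter.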
 

But once the compaction had completed, I found that the disk size had not shrunk:

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559692-g1AAA...",
 "sizes": {
  "file": 10851612416,
  "external": 3690734442,
  "active": 5355768390
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690734442
 },
 "doc_del_count": 46,
 "doc_count": 13559646,
 "disk_size": 10851612416,
 "disk_format_version": 6,
 "data_size": 5355768390,
 "compact_running": false,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

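Even worse: after the compaction the database used slightly more disk space than before (10851612416 vs. 10846902016 bytes), although the document count had also grown by 160 in the meantime. The opposite of what I expected!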

I then checked at the file system level where most of the disk space was being used, and came across the following folders:

root@st-cdb01-p:/var/lib/couchdb# du -ksh shards/
15G    shards/

root@st-cdb01-p:/var/lib/couchdb# du -ksh .shards/
30G    .shards/

Note the leading dot in the second folder (.shards). According to the documentation, the ".shards" folder contains the "views" and not the "databases". So I manually checked the size of a view using the Fauxton UI:

[Image: CouchDB view size]

Whoa! Comparing "Actual data size (bytes): 1,444,291,661" with "Data size on disk (bytes): 19,648,534,600", I was pretty sure I had found the bad guy.
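
To put a number on it, the difference between the two values is the upper bound of what a compaction could give back. A minimal sketch: the function name reclaimable_bytes is made up for this example, and the curl call in the comment assumes that "stats" is the name of the design document and that the design document info endpoint reports the index sizes on this CouchDB version.

```shell
# Hypothetical helper: bytes a compaction could reclaim at most,
# given the size on disk and the actual (live) data size, both in bytes.
reclaimable_bytes() {
  disk_size=$1
  data_size=$2
  echo $((disk_size - data_size))
}

# Usage with the values from the Fauxton screenshot:
#   reclaimable_bytes 19648534600 1444291661
#
# The raw figures should also be retrievable without the UI (assumption):
#   curl -s localhost:5984/bigdb/_design/stats/_info
```

With the numbers above, this works out to a bit more than 18 GB of dead weight in a single view index.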

A compaction can also be run on the views. The endpoint takes the name of the design document that contains the views (in this case "stats", which can also be seen in the UI screenshot above):

root@couchdb:~# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact/stats
{"ok":true}

The compaction processes and their current progress can also be checked in the UI:

[Image: CouchDB view compaction progress]
 

Once all of these processes had completed, 20 GB of disk space had been freed!

[Image: CouchDB disk usage after views compaction]

The change can also be seen in Fauxton:

[Image: CouchDB view size after compaction]

Below are some additional questions related to compaction, with their answers:

How did the compaction affect the cluster?
I ran the compaction on node 1 of a two-node cluster. I could not see an immediate change in disk usage on the second node. I had to run the same compaction commands on node 2 to free up disk space there, too.

Shouldn't auto compaction do this job?
That's what I thought, too. I verified that automatic compaction is enabled, which seems to be the default (Ubuntu 16.04, CouchDB 2.1):

root@couchdb:~# grep "\[daemons\]" -A 10 /opt/couchdb/etc/default.ini
[daemons]
index_server={couch_index_server, start_link, []}
external_manager={couch_external_manager, start_link, []}
query_servers={couch_proc_manager, start_link, []}
vhosts={couch_httpd_vhost, start_link, []}
httpd={couch_httpd, start_link, []}
uuids={couch_uuids, start, []}
auth_cache={couch_auth_cache, start_link, []}
os_daemons={couch_os_daemons, start_link, []}
compaction_daemon={couch_compaction_daemon, start_link, []}

The compaction_daemon is enabled, and its settings are in place as well:

root@couchdb:~# grep "\[compaction_daemon\]" -A 8 /opt/couchdb/etc/default.ini  
[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072

root@couchdb:~# grep "\[compactions\]" -A 78 /opt/couchdb/etc/default.ini  | egrep -v "^;"
[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "50%"}, {from, "00:00"}, {to, "04:00"}, {parallel_view_compaction, true}]

Note: I changed view_fragmentation from the default 60% to 50% and added the "from" and "to" time window.

So auto compaction should have been doing its job to free up disk space. According to the logs, the compaction daemon did indeed run (on databases and views), but nothing was freed up.
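
To see how the sizes relate to the thresholds in the [compactions] rule, the fragmentation can be calculated by hand. This is a minimal sketch, assuming the formula given in the CouchDB configuration comments, (file_size - data_size) / file_size * 100; the function names fragmentation_pct and exceeds_threshold are made up for this example. By this formula, the bigdb figures shown earlier work out to about 50%, which is below the 70% db_fragmentation threshold, while the view figures from the Fauxton screenshot work out to about 92%.

```shell
# Fragmentation in percent (integer, rounded down), assuming the formula
# (file_size - data_size) / file_size * 100. Sizes are in bytes.
fragmentation_pct() {
  file_size=$1
  data_size=$2
  if [ "$file_size" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( (file_size - data_size) * 100 / file_size ))
}

# Hypothetical check: succeeds if the fragmentation reaches the threshold
# (third argument, in percent without the % sign).
exceeds_threshold() {
  [ "$(fragmentation_pct "$1" "$2")" -ge "$3" ]
}

# Usage with the values from this article:
#   fragmentation_pct 10851612416 5355768390    # database: file vs. active size
#   fragmentation_pct 19648534600 1444291661    # view: size on disk vs. actual data size
#   exceeds_threshold 10851612416 5355768390 70 || echo "below db_fragmentation"
```

This only covers the fragmentation part of the rule; the "from" and "to" time window has to match as well before the daemon starts a compaction.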

TL;DR of this article?
Do not forget to compact your database views, too! Check their sizes (either in the UI or via CLI) and you should be able to determine where your disk space is being wasted.

How can I make sure to run compact on all relevant databases and views?
For this purpose I created a script called compact_couchdb.sh. It iterates over all databases found on the addressed CouchDB, detects the views in each database, and then compacts each database and each of its views.
The script can be found here (on GitHub): https://github.com/Napsty/scripts/blob/master/couchdb/compact_couchdb.sh
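
The general idea can be sketched in a few lines. To be clear, this is not the script linked above but an independent, simplified illustration: the variable COUCH_URL and all function names are assumptions, the JSON parsing with tr, grep and sed is deliberately naive, and database names containing special characters (such as a slash) would need URL encoding first.

```shell
# Base URL of the CouchDB node to address (assumption for this sketch).
COUCH_URL="${COUCH_URL:-http://localhost:5984}"

# Turn the JSON array returned by GET /_all_dbs into one name per line.
parse_db_names() {
  tr -d '[]" \n' | tr ',' '\n'
  echo
}

# Extract design document names from the output of GET /{db}/_all_docs.
parse_ddoc_names() {
  grep -o '"id":[[:space:]]*"_design/[^"]*"' | sed 's|.*"_design/\([^"]*\)"|\1|'
}

# Compact every database and the view indexes of every design document.
compact_all() {
  curl -s "$COUCH_URL/_all_dbs" | parse_db_names | while read -r db; do
    [ -n "$db" ] || continue
    echo "Compacting database $db"
    curl -s -H "Content-Type: application/json" -X POST "$COUCH_URL/$db/_compact"
    # Only list documents whose ID starts with "_design/".
    curl -s "$COUCH_URL/$db/_all_docs?startkey=%22_design/%22&endkey=%22_design0%22" |
      parse_ddoc_names | while read -r ddoc; do
        echo "Compacting views of $db/_design/$ddoc"
        curl -s -H "Content-Type: application/json" -X POST "$COUCH_URL/$db/_compact/$ddoc"
      done
  done
}

# Usage (needs a running CouchDB; remember to run it on every node):
#   compact_all
```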
