Tuesday, May 31, 2016

Shrinking your Couchbase memory footprint with compression

Hi,

TL;DR: Compression will decrease your memory footprint and increase your memory residency.

An important and integral part of Couchbase Server that many people forget is that Couchbase is not only a fast, general-purpose document database with the advanced querying of N1QL; it is also a key/value store.

If that is the kind of operation you need, and you can "pay" by only being able to fetch the data by its key and not being able to index it, then you might consider compressing your JSON (or object) and saving it as a binary document in the database instead of plain JSON.

For instance, each bucket contains exactly 1 million documents, each of which looks similar to this one:

The underscores are there in order to guarantee a value size of 278 bytes.
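The original document was shown in a screenshot; as a rough sketch, generating such a document might look something like this (the field names, padding lengths and the loop index i are made up for illustration - the only thing that matters is that the JSON value ends up around 278 bytes and the key at 15 bytes):

// Assumes Couchbase Java SDK 2.x imports:
// com.couchbase.client.java.document.JsonDocument and
// com.couchbase.client.java.document.json.JsonObject
JsonObject content = JsonObject.create()
        .put("firstName", "John____________________________________")
        .put("lastName",  "Doe_____________________________________")
        .put("email",     "john.doe@example.com____________________")
        .put("city",      "Tel-Aviv________________________________")
        .put("country",   "Israel__________________________________");
// "person::" (8 chars) + 7 digits gives a 15-byte key, e.g. "person::0000001"
JsonDocument doc = JsonDocument.create(String.format("person::%07d", i), content);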

Each document contains 56 bytes of metadata, about 15 bytes of key and 278 bytes of value in JSON format - 349 bytes per document, so about 349,000,000 bytes of RAM, which is 332.8 MB.
We can check the number of bytes in memory for the active vbuckets with the following cbstats command:

./cbstats localhost:11210 -b compressed all | grep active_itm_memory 
vb_active_itm_memory:                        349000000

A compressed value takes about 107-108 bytes, so each document takes roughly 178 bytes including metadata and key - about 178,000,000 bytes in total, or about 169 MB.
In cbstats:

./cbstats localhost:11210 -b compressed all | grep active_itm_memory 
vb_active_itm_memory:                        178139161

The figures in the following screenshot are slightly different, as there is some extra overhead of the Couchbase engine.
The figures there are the actual volume the bucket takes in memory, not only the data.



We can see here that the data is compressed by almost a factor of 2 (349 MB vs. 178 MB),
meaning we can reduce the number of machines (or the amount of memory) needed by almost 50%.
And if you are not at a 100% residency ratio, this method will surely increase it.
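If you want to see where you stand, cbstats exposes the residency ratio as well; on the builds I've used the stat is named roughly like this (the exact stat name can vary between versions):

./cbstats localhost:11210 -b compressed all | grep vb_active_perc_mem_resident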

So wait! If I only need half the machines (in this use case), where is the catch?

Three things you must note here:
1) As described before, you cannot index compressed documents.
2) Creating the document you want to insert takes more time.
3) Reading the document takes longer as you need to decompress.

Creating the documents takes roughly 6 times longer.
I've used the best compression setting, to show the worst-case scenario.
From the tests I've run, changing the compression level in Java's gzip library doesn't change much for this data, either in time or in footprint.
The test machine is my laptop, so not server-grade hardware.

Uncompressed
Generating 1M documents took: 5002ms

Compressed
Generating 1M documents took: 31731ms
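A comparison like this needs nothing fancier than wall-clock timing around the generation loop; here is a minimal sketch of the idea (generateJsonDocument and compressToBinaryDocument are placeholders for the code shown in the rest of this post):

long start = System.currentTimeMillis();
for (int i = 0; i < 1_000_000; i++) {
    // Placeholder calls: generateJsonDocument(i) builds the padded JSON document,
    // compressToBinaryDocument(doc) runs the gzip code shown in the next section.
    docsToInsertZipped.add(compressToBinaryDocument(generateJsonDocument(i)));
}
System.out.println("Generating 1M documents took: " + (System.currentTimeMillis() - start) + "ms");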

So how do you insert compressed documents?

  1. Create the stream
  2. Wrap it with BinaryDocument
  3. Insert it to Couchbase (observable)
Here is a snippet that creates a compressed binary document and adds it to a collection:

// Assumed imports: java.io.*, java.util.zip.*, java.nio.charset.StandardCharsets,
// com.couchbase.client.java.document.BinaryDocument, rx.Observable and the SDK's
// shaded Netty buffer classes (ByteBuf / Unpooled).
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Anonymous subclass trick: "def" is the protected Deflater inside GZIPOutputStream,
// which lets us pick the compression level.
OutputStream gzipOut = new GZIPOutputStream(baos) {{ def.setLevel(Deflater.BEST_SPEED); }};

// Write the JSON content as UTF-8 text straight into the gzip stream,
// so the read path below can read it back as plain text.
gzipOut.write(doc.content().toString().getBytes(StandardCharsets.UTF_8));
gzipOut.close();

byte[] bytes = baos.toByteArray();
ByteBuf toWrite = Unpooled.copiedBuffer(bytes);
BinaryDocument binDoc = BinaryDocument.create(key, toWrite);
docsToInsertZipped.add(binDoc);

Observable.from(docsToInsertZipped)
        .flatMap(docBinary -> compressedBucket.async().upsert(docBinary))
        .toBlocking()
        .subscribe();

In order to read the data:
  1. Read (get) the document
  2. Uncompress the content
  3. Convert the byte buffer to string

// Fetch the raw binary document (the SDK does not decompress it for us)
BinaryDocument binaryDocument = compressedBucket.get("person::0000001", BinaryDocument.class);
byte[] data = new byte[binaryDocument.content().readableBytes()];
binaryDocument.content().readBytes(data);
binaryDocument.content().release(); // release the ByteBuf to avoid leaking it

// Decompress and read the JSON back as text
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data));
InputStreamReader reader = new InputStreamReader(gis);
BufferedReader buffered = new BufferedReader(reader);

String read;
while ((read = buffered.readLine()) != null) {
    System.out.println(read);
}


This is a much more complicated process than just getting the document:


JsonDocument document = uncompressedBucket.get("person::0000001");
System.out.println(document.content().toString());

but it is faster, because with that code the SDK doesn't need to deserialize the bytes into a JsonDocument.
Take into account that you might have to deserialize it into an object anyway, or hide the compression behind wrappers.
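For example, a thin wrapper can keep the gzip handling out of the application code; here is a rough sketch of that idea (the class and method names are made up, and it assumes the same SDK 2.x imports as the snippets above plus java.io, java.util.zip and java.nio.charset):

// Hypothetical helper - not part of the Couchbase SDK - that hides the compression.
public class CompressedBucketWrapper {

    private final Bucket bucket;

    public CompressedBucketWrapper(Bucket bucket) {
        this.bucket = bucket;
    }

    // Compress the JSON text and store it as a binary document.
    public void upsertJson(String key, JsonObject content) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(baos);
        gzip.write(content.toString().getBytes(StandardCharsets.UTF_8));
        gzip.close();
        bucket.upsert(BinaryDocument.create(key, Unpooled.copiedBuffer(baos.toByteArray())));
    }

    // Fetch the binary document, decompress it and parse it back into a JsonObject.
    public JsonObject getJson(String key) throws IOException {
        BinaryDocument doc = bucket.get(key, BinaryDocument.class);
        byte[] data = new byte[doc.content().readableBytes()];
        doc.content().readBytes(data);
        doc.content().release(); // release the ByteBuf so it doesn't leak
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gis.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        gis.close();
        return JsonObject.fromJson(out.toString("UTF-8"));
    }
}

That way the rest of the code only ever sees a JsonObject going in and out.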

That's it.
Now you have another tool you might use in your toolbox.
