Project Description

This third release improves the performance of the well-known BeIT distributed memcached client (http://code.google.com/p/beitmemcached/) through several techniques: asynchronous writes, a producer-consumer pattern built on a circular (bounded) buffer, an LRU (least-recently-used) cache, and a merging write buffer. Write merging combines a new write with the pending data already held in an existing buffer block, which reduces stalls caused by a full write (write-back) buffer and improves memory efficiency. The LRU cache manages a buffer of limited, fixed size, governing how items are stored in and retrieved from it.
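As an illustration, here is a minimal sketch of the write-merging idea. The class and member names (MergingWriteBuffer, Put, Drain) are hypothetical, not the project's actual code: a write for a key that already has a pending entry simply overwrites that entry instead of consuming a new buffer slot.

    using System.Collections.Generic;

    // Hypothetical sketch of write merging; not the project's actual types.
    class MergingWriteBuffer
    {
        private readonly object sync = new object();
        private readonly Dictionary<string, object> pending =
            new Dictionary<string, object>();

        // Returns true when the value was merged into an existing entry.
        public bool Put(string key, object value)
        {
            lock (sync)
            {
                bool merged = pending.ContainsKey(key);
                pending[key] = value;  // last write wins; no new slot consumed
                return merged;
            }
        }

        // Called by the consumer thread that flushes pending writes to memcached.
        public KeyValuePair<string, object>[] Drain()
        {
            lock (sync)
            {
                var snapshot = new List<KeyValuePair<string, object>>(pending);
                pending.Clear();
                return snapshot.ToArray();
            }
        }
    }

Because repeated writes to a hot key collapse into one pending entry, the buffer fills more slowly and producers stall less often while waiting for free slots.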

 

Introduction

While working with distributed caches (particularly with the BeIT memcached client), I have often noticed latencies proportional to the speed of the network: the slower the network, the more time passes between two consecutive writes/reads.
A first approach to improving write latency is to issue the commands asynchronously. The problem with this technique is guaranteeing that the object actually exists in the cache: if a read is issued immediately after a write, the result is likely a cache miss (and the slower the network, the higher the probability).

The solution to this problem is to look the object up directly in the write buffer rather than wait for the write operation to complete (a quick but more complicated solution), as sketched below.
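The sketch below illustrates the approach in simplified form. The names (AsyncCacheClient, RemoteGet, RemoteSet) are hypothetical placeholders, not the real MemcachedClient internals: Add returns immediately after buffering, a background thread performs the network write, and Get consults the in-flight buffer first, so a read issued right after a write cannot miss just because the network is slow.

    using System.Collections.Generic;
    using System.Threading;

    // Hypothetical sketch of asynchronous writes with a local read-through buffer.
    class AsyncCacheClient
    {
        private readonly object sync = new object();
        private readonly Dictionary<string, object> inFlight = new Dictionary<string, object>();
        private readonly Queue<string> queue = new Queue<string>();

        public AsyncCacheClient()
        {
            var worker = new Thread(FlushLoop) { IsBackground = true };
            worker.Start();
        }

        // Returns immediately; the network write happens on the worker thread.
        public void Add(string key, object value)
        {
            lock (sync)
            {
                inFlight[key] = value;  // latest value wins for local reads
                queue.Enqueue(key);
                Monitor.Pulse(sync);    // wake the flushing thread
            }
        }

        public object Get(string key)
        {
            lock (sync)
            {
                object v;
                if (inFlight.TryGetValue(key, out v))
                    return v;           // fast path: write not yet on the wire
            }
            return RemoteGet(key);      // slow path: ask the memcached server
        }

        public bool KeyExists(string key)
        {
            lock (sync)
            {
                if (inFlight.ContainsKey(key)) return true;
            }
            return RemoteGet(key) != null;
        }

        private void FlushLoop()
        {
            while (true)
            {
                string key; object value;
                lock (sync)
                {
                    while (queue.Count == 0) Monitor.Wait(sync);
                    key = queue.Dequeue();
                    if (!inFlight.TryGetValue(key, out value))
                        continue;       // already flushed by an earlier entry
                }
                RemoteSet(key, value);  // the actual (slow) network write
                lock (sync)
                {
                    object current;
                    // keep the entry if it was overwritten while we were
                    // flushing; its newer queue entry will flush it again
                    if (inFlight.TryGetValue(key, out current) && ReferenceEquals(current, value))
                        inFlight.Remove(key);
                }
            }
        }

        // Placeholders standing in for the real BeIT memcached calls.
        private void RemoteSet(string key, object value) { }
        private object RemoteGet(string key) { return null; }
    }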

  
What's in the project?

MemcachedClient.cs - Improvements limited to the Add, Get and KeyExists methods.

BlockingQueue.cs - One-lock bounded blocking queue. This queue is internally synchronized (thread-safe) and designed for one or many producer threads and one or many consumer threads, which makes it ideal for pipelining and other producer/consumer needs. It is fast and thread-safe on single- and multi-CPU machines.
Consumer threads block on Dequeue operations until another thread performs an Enqueue operation, at which point the first scheduled consumer thread is unblocked and gets the current object. Producer threads block on Enqueue operations until a consumer thread calls Dequeue to free a queue slot, at which point the first scheduled producer thread is unblocked to finish its Enqueue operation. No user code is needed to handle this "ping-pong" between locking and unlocking consumers and producers. (http://www.codeproject.com/KB/recipes/boundedblockingqueue.aspx)
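For illustration only, here is a condensed sketch of such a single-lock bounded blocking queue (the project's BlockingQueue.cs follows the CodeProject article linked above and differs in detail):

    using System.Collections.Generic;
    using System.Threading;

    class BoundedBlockingQueue<T>
    {
        private readonly Queue<T> items = new Queue<T>();
        private readonly object sync = new object();
        private readonly int capacity;

        public BoundedBlockingQueue(int capacity)
        {
            this.capacity = capacity;
        }

        public void Enqueue(T item)
        {
            lock (sync)
            {
                while (items.Count == capacity)
                    Monitor.Wait(sync);  // producer blocks until a slot frees up
                items.Enqueue(item);
                // PulseAll because producers and consumers wait on the same lock
                Monitor.PulseAll(sync);
            }
        }

        public T Dequeue()
        {
            lock (sync)
            {
                while (items.Count == 0)
                    Monitor.Wait(sync);  // consumer blocks until an item arrives
                T item = items.Dequeue();
                Monitor.PulseAll(sync);  // unblock any waiting producers
                return item;
            }
        }
    }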

Cache.cs, SimpleMapCache.cs, SimpleLRUCache.cs - Caching is an extremely useful technique for improving performance. There are many commercial and open-source caching packages available, but sometimes we just want a quick and simple solution: the kind that involves only a couple of classes which we can drop directly into our own code.

First, we define a caching interface so that later on, if we change our mind, we can use a different caching strategy or hook in to a more sophisticated caching package.
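Such an interface might look like the following sketch (the member names are assumptions; the project's actual Cache.cs may differ):

    // Hypothetical caching interface; Get returns default(TValue) on a miss.
    public interface ICache<TKey, TValue>
    {
        TValue Get(TKey key);
        void Put(TKey key, TValue value);
        bool ContainsKey(TKey key);
        int Count { get; }
    }

A plain dictionary-backed class (in the spirit of SimpleMapCache) can implement it directly, while an LRU implementation adds the bounded capacity and eviction described below.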

For example, if we are retrieving content from a database to display in a web page, it is often not necessary for users to see the absolute latest data every time. If we receive 100 queries per second without caching, then caching for even one second reduces the number of queries by a factor of 100. We have implemented the least recently used (LRU) algorithm for caching, which discards the least recently used items first; this requires keeping track of what was used and when. (http://lucene.apache.org/lucene.net/)
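The following is a minimal LRU sketch (illustrative only, not SimpleLRUCache.cs verbatim): a dictionary gives O(1) lookup, a linked list keeps entries ordered by recency, and the least recently used entry is evicted in O(1) when the fixed capacity is exceeded.

    using System.Collections.Generic;

    // Hypothetical LRU cache sketch; not synchronized, so callers sharing it
    // across threads must lock externally.
    public class LruCache<TKey, TValue>
    {
        private readonly int capacity;
        private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map;
        private readonly LinkedList<KeyValuePair<TKey, TValue>> order;

        public LruCache(int capacity)
        {
            this.capacity = capacity;
            map = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
            order = new LinkedList<KeyValuePair<TKey, TValue>>();
        }

        public bool TryGet(TKey key, out TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (map.TryGetValue(key, out node))
            {
                order.Remove(node);   // move to front: most recently used
                order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }

        public void Put(TKey key, TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (map.TryGetValue(key, out node))
            {
                order.Remove(node);   // replacing an existing entry
                map.Remove(key);
            }
            else if (map.Count >= capacity)
            {
                // evict the least recently used entry (tail of the list)
                map.Remove(order.Last.Value.Key);
                order.RemoveLast();
            }
            map[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        }
    }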

 
Benchmark on an Intel Core 2 Duo @ 1.60 GHz with 2 MB L2 cache and 3 GB RAM

 

Test                Time
Insert (Ideal)      00:00:00
Insert (DCache)     00:00:02.3437500  (100x slower)
Fetch (Ideal)       00:00:00.0156250
Fetch (DCache)      00:02:13.5312500  (100x slower)
RemoveAt (Ideal)    00:00:00.0468750
RemoveAt (DCache)   00:00:01.5156250  (99x slower)

and after the improvement...

Insert (DCache)     00:00:00.0156250  (99x faster)
Fetch (DCache)      00:00:03.2968750  (97x faster)

and with the merging write buffer...

Fetch (DCache)      00:00:00.2968750  (99x faster)


The tests were performed on 10,000 objects for Insert and RemoveAt, and on 100,000 objects for Fetch.

Without optimization, Insert and Fetch against the distributed cache are respectively 100x and 99x slower!
After the optimization, the Fetch (99x) and Insert (99x) methods show a corresponding improvement in speed.
