diff mbox

[5/6] qcow2: use a hash to look for entries in the L2 cache

Message ID fd912c784d51fbe9949de5cd1ee95e600c25ccc7.1430388393.git.berto@igalia.com
State New
Headers show

Commit Message

Alberto Garcia April 30, 2015, 10:11 a.m. UTC
The current cache algorithm traverses the array starting always from
the beginning, so the average number of comparisons needed to perform
a lookup is proportional to the size of the array.

By using a hash of the offset as the starting point, lookups are
faster and independent from the array size.

The hash is computed using the cluster number of the table, multiplied
by 4 to make it perform better when there are collisions.

In my tests, using a cache with 2048 entries, this reduces the average
number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Stefan Hajnoczi May 6, 2015, 4:42 p.m. UTC | #1
On Thu, Apr 30, 2015 at 01:11:44PM +0300, Alberto Garcia wrote:
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
> 
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
...
> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

The hash function is very weak.  It's better than always starting with
i=0 but mixes input bits very poorly.  Have you considered an integer
hash function so that more input bits affect the output?
http://burtleburtle.net/bob/hash/integer.html

Regarding collisions, the multiply-by-4 effectively reduces the hash
space by a factor of 4.  This *increases* the chance of collisions in
the hope that the extra slots will prevent chains from forming.

Using an integer hash function ought to be more resistant to input value
patterns and eliminate the need for multiply-by-4, because chains only
form when the table load is high.

Finally, the hash optimizes for the sparsely occupied table scenario.
If the table is loaded (dense) then there will be chains in any case.  A
full search will be necessary.

Feel free to leave the hash function as-is, I don't think it's a huge
issue either way.

Stefan
Alberto Garcia May 7, 2015, 8:23 a.m. UTC | #2
On Wed 06 May 2015 06:42:30 PM CEST, Stefan Hajnoczi wrote:

>> By using a hash of the offset as the starting point, lookups are
>> faster and independent from the array size.
>> 
>> The hash is computed using the cluster number of the table, multiplied
>> by 4 to make it perform better when there are collisions.
> ...
>> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

> The hash function is very weak.  It's better than always starting with
> i=0 but mixes input bits very poorly.  Have you considered an integer
> hash function so that more input bits affect the output?
> http://burtleburtle.net/bob/hash/integer.html

I didn't try very hard to optimize the hash function because it doesn't
have a big impact, I just wanted something simple and a bit better than
i=0.

I did my tests with a disk image with preallocated metadata and a cache
size big enough for the whole disk. In that scenario I managed an
average of 2.5 comparisons per lookup with this function.

> Regarding collisions, the multiply-by-4 effectively reduces the hash
> space by a factor of 4.  This *increases* the chance of collisions in
> the hope that the extra slots will prevent chains from forming.

It does, but it keeps the average much lower (in my tests at least), it
went down from >100 to 2.5.

> Feel free to leave the hash function as-is, I don't think it's a huge
> issue either way.

I think I'll just leave it as-is for the time being, but thanks a lot
for your comments.

Berto
diff mbox

Patch

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index e1bba20..c0e0278 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -237,6 +237,7 @@  static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
 
@@ -244,7 +245,8 @@  static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / c->table_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -253,7 +255,10 @@  static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);
 
     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check