Message ID: fd912c784d51fbe9949de5cd1ee95e600c25ccc7.1430388393.git.berto@igalia.com
State: New
On Thu, Apr 30, 2015 at 01:11:44PM +0300, Alberto Garcia wrote:
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
...
> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

The hash function is very weak. It's better than always starting with
i=0, but it mixes input bits very poorly. Have you considered an
integer hash function, so that more input bits affect the output?
http://burtleburtle.net/bob/hash/integer.html

Regarding collisions, the multiply-by-4 effectively reduces the hash
space by a factor of 4. This *increases* the chance of collisions in
the hope that the extra slots will prevent chains from forming. Using
an integer hash function ought to be more resistant to input value
patterns and eliminate the need for multiply-by-4, because chains only
form when the table load is high.

Finally, the hash optimizes for the sparsely occupied table scenario.
If the table is loaded (dense), there will be chains in any case and a
full search will be necessary.

Feel free to leave the hash function as-is, I don't think it's a huge
issue either way.

Stefan
On Wed 06 May 2015 06:42:30 PM CEST, Stefan Hajnoczi wrote:

>> By using a hash of the offset as the starting point, lookups are
>> faster and independent from the array size.
>>
>> The hash is computed using the cluster number of the table, multiplied
>> by 4 to make it perform better when there are collisions.
> ...
>> +    i = lookup_index = (offset / c->table_size * 4) % c->size;

> The hash function is very weak. It's better than always starting with
> i=0 but mixes input bits very poorly. Have you considered an integer
> hash function so that more input bits affect the output?
> http://burtleburtle.net/bob/hash/integer.html

I didn't try very hard to optimize the hash function because it
doesn't have a big impact; I just wanted something simple and a bit
better than i=0.

I did my tests with a disk image with preallocated metadata and a
cache size big enough for the whole disk. In that scenario I managed
an average of 2.5 comparisons per lookup with this function.

> Regarding collisions, the multiply-by-4 effectively reduces the hash
> space by a factor of 4. This *increases* the chance of collisions in
> the hope that the extra slots will prevent chains from forming.

It does, but it keeps the average much lower (in my tests at least):
it went down from >100 to 2.5.

> Feel free to leave the hash function as-is, I don't think it's a huge
> issue either way.

I think I'll just leave it as-is for the time being, but thanks a lot
for your comments.

Berto
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index e1bba20..c0e0278 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -237,6 +237,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
     BDRVQcowState *s = bs->opaque;
     int i;
     int ret;
+    int lookup_index;
     uint64_t min_lru_counter = UINT64_MAX;
     int min_lru_index = -1;
@@ -244,7 +245,8 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
                           offset, read_from_disk);
 
     /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
+    i = lookup_index = (offset / c->table_size * 4) % c->size;
+    do {
         const Qcow2CachedTable *t = &c->entries[i];
         if (t->offset == offset) {
             goto found;
@@ -253,7 +255,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             min_lru_counter = t->lru_counter;
             min_lru_index = i;
         }
-    }
+        if (++i == c->size) {
+            i = 0;
+        }
+    } while (i != lookup_index);
 
     if (min_lru_index == -1) {
         /* This can't happen in current synchronous code, but leave the check
The current cache algorithm always traverses the array from the
beginning, so the average number of comparisons needed to perform a
lookup is proportional to the size of the array.

By using a hash of the offset as the starting point, lookups are
faster and independent from the array size.

The hash is computed using the cluster number of the table, multiplied
by 4 to make it perform better when there are collisions.

In my tests, using a cache with 2048 entries, this reduces the average
number of comparisons per lookup from 430 to 2.5.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)